MongoDB Aggregation Framework and MapReduce

saurabh kharkate
4 min read · Aug 31, 2021

Hello folks! Here I am again with another article, this time about the MongoDB Aggregation Framework and MapReduce operations. I hope it helps you in your future projects.

MongoDB

MongoDB is often the first database that comes to mind when you work with schema-less, unstructured data whose shape needs to change quickly and efficiently. For this kind of data, MongoDB provides a powerful tool, the Aggregation Framework, which processes and transforms data records on the server itself and returns the computed results.

Aggregation Framework

The MongoDB Aggregation Framework is a way to query data in MongoDB. It helps us break complex logic into a simple sequence of operations. The framework exists because, once you start working with and manipulating data, you often need to combine collections, modify them, pluck out fields, rename fields, concatenate them, group documents by a field, and so on.

Simple queries in MongoDB only let you retrieve whole documents or parts of individual documents. They don’t really let you manipulate the documents on the server before returning them to your application. This is where the Aggregation Framework comes in.

What is a Pipeline in the Aggregation Framework?

The Aggregation Framework is built on the concept of a data processing pipeline. The pipeline consists of stages, and in each stage an operator transforms the documents coming from the collection or from the previous stage. Finally, the output is returned to the application that issued the query.

Each stage transforms the documents as they pass through the pipeline. Stages do not need to produce one output document for every input document: some stages generate new documents, while others filter documents out.
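To make the idea of stages concrete, here is a minimal sketch of a two-stage pipeline. The `status` and `items` fields are assumptions made up for illustration (they are not part of the example used later in this article). `$match` is a stage that filters documents out, and `$unwind` is a stage that can produce several output documents from a single input document.

```javascript
// A pipeline is just an ordered array of stage documents.
// NOTE: the `status` and `items` fields are hypothetical.
const pipeline = [
  // Stage 1: keep only documents whose status is "A" (fewer docs out than in).
  { $match: { status: "A" } },
  // Stage 2: emit one document per element of the `items` array
  // (possibly more docs out than in).
  { $unwind: "$items" },
];

// Only runs inside the mongo shell, where the global `db` object exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(pipeline).toArray());
}
```

The order of the array is the order of execution, so the output of `$match` is what `$unwind` receives.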

Another way to perform aggregation is with the MapReduce operation.

What is MapReduce?

MapReduce is a programming paradigm for processing big data across a distributed system. It analyzes data and produces aggregated results. The map function emits key/value pairs, and the values emitted for each key are collected together. The reduce function then takes the values collected for each key and combines them into the aggregated result.
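The two phases can be sketched in plain JavaScript, without a database. This is only an in-memory illustration of the idea, not how MongoDB implements it; the sample documents are made up, and the helper names `mapPhase` and `reducePhase` are hypothetical. It borrows the `cust_id` and `price` fields from the orders example used in the rest of this article.

```javascript
// Hypothetical sample documents (assumed for illustration only).
const orders = [
  { cust_id: "A123", price: 25 },
  { cust_id: "B212", price: 70 },
  { cust_id: "A123", price: 50 },
];

// Map phase: every document emits a (key, value) pair;
// values emitted under the same key are collected together.
function mapPhase(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    if (!emitted.has(doc.cust_id)) emitted.set(doc.cust_id, []);
    emitted.get(doc.cust_id).push(doc.price);
  }
  return emitted;
}

// Reduce phase: collapse the values collected for each key into one result.
function reducePhase(emitted) {
  const results = [];
  for (const [key, values] of emitted) {
    results.push({ _id: key, value: values.reduce((sum, v) => sum + v, 0) });
  }
  return results;
}

const totals = reducePhase(mapPhase(orders));
console.log(totals);
// prints [ { _id: 'A123', value: 75 }, { _id: 'B212', value: 70 } ]
```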

Let’s Perform a Task Using MapReduce First

Here we perform a map-reduce operation on the orders collection to group the documents by cust_id and calculate the sum of price for each cust_id:

Declaring the Map Function:

  • Define the map function that processes each input document.
  • In the function below, this refers to the document being processed. The function maps the price to the cust_id and emits the pair (cust_id, price) for each document.
> var mapFunction1 = function() {
    emit(this.cust_id, this.price);
};

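Declaring the Reduce Function:

  • Define the corresponding reduce function with two arguments, keyCustId and valuesPrices. Here valuesPrices is an array of the price values emitted by the map function for a given keyCustId, and the function reduces that array to the sum of its elements:
> var reduceFunction1 = function(keyCustId, valuesPrices) {
    return Array.sum(valuesPrices);
};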

Running the Map-Reduce Operation:

Perform map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function. The results are written to the map_reduce_example collection:

> db.orders.mapReduce(
    mapFunction1,
    reduceFunction1,
    { out: "map_reduce_example" }
)
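To check the result of the operation above, you can read back the output collection. The snippet below is a small sketch; the helper name `readMapReduceOutput` is hypothetical. Each output document has the shape `{ _id: <cust_id>, value: <sum of prices> }`.

```javascript
// Reads the map-reduce output collection, ordered by customer id.
// `database` is expected to be the shell's `db` object.
function readMapReduceOutput(database) {
  return database.map_reduce_example.find().sort({ _id: 1 }).toArray();
}

// Only runs inside the mongo shell, where the global `db` object exists.
if (typeof db !== "undefined") {
  printjson(readMapReduceOutput(db));
}
```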

Let’s Do the Same Task Using the Aggregation Framework

  • Using the available aggregation pipeline operators, you can rewrite the map-reduce operation without defining custom functions.
  • The pipeline first groups all the documents by cust_id and then sums the price values in each group. In the next stage, it saves the results in the “agg_alternative_1” collection.
  • To see the collection, use the command below.
db.agg_alternative_1.find().sort({ _id: 1 })
  • An aggregation pipeline provides better performance and usability than a map-reduce operation.
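The pipeline described above can be sketched as follows. It is reconstructed from the description in this section (group by cust_id, sum the prices, write to agg_alternative_1); naming the summed field `value` is an assumption, chosen so that the output has the same shape as the map-reduce result.

```javascript
const pipeline = [
  // Stage 1: one output document per cust_id, with the summed price.
  // The field name `value` is an assumption mirroring the map-reduce output.
  { $group: { _id: "$cust_id", value: { $sum: "$price" } } },
  // Stage 2: write the grouped documents to the agg_alternative_1 collection.
  { $out: "agg_alternative_1" },
];

// Only runs inside the mongo shell, where the global `db` object exists.
if (typeof db !== "undefined") {
  // toArray() forces the pipeline to execute; with $out it returns no documents.
  db.orders.aggregate(pipeline).toArray();
}
```

Compared with the map-reduce version, no JavaScript functions are needed here: `$group` does the work of both the map and the reduce function, and `$out` replaces the `out` option.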

☘☘ Keep Sharing!!! , Keep Learning!!! ☘☘

🙏🙏Thanks for Reading 🙏🙏
