MongoDB Series - In-depth understanding of MongoDB Aggregation

Aggregation operations in MongoDB combine values from multiple documents, perform various operations on the grouped data, and return the computed results. They are mainly used for data processing, such as computing averages and sums. MongoDB provides three ways to perform aggregation: the aggregation pipeline, the map-reduce function, and single-purpose aggregation commands (count, distinct, and group).

1. Aggregation Pipeline

1.1 Aggregation pipeline

In the aggregation pipeline, the aggregation framework feeds documents into a pipeline consisting of multiple stages. Each stage can group, filter, or otherwise transform the documents it receives, and the aggregated result is output once the documents have passed through the whole series of stages. As shown in the figure:

Aggregation pipeline operation:

db.orders.aggregate([
      { $match: { status: "A" } },
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])

$match stage: filters the documents by the status field, keeping only the documents whose status equals "A";
$group stage: groups the documents by the cust_id field and calculates the sum of amount for each unique cust_id.
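To make the two stages concrete, here is a minimal in-memory sketch in plain JavaScript of what $match and $group compute. It does not call MongoDB; the sample orders are made up for illustration, and the insertion-ordered Map is only a convenience of the sketch (MongoDB does not promise any output order for $group).

```javascript
// In-memory sketch of the $match and $group stages; no MongoDB server is used.
// The sample orders below are hypothetical.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// $match: keep only documents whose status equals "A".
const matched = orders.filter((doc) => doc.status === "A");

// $group: one output document per unique cust_id, with total = sum of amount.
const totalsByCustomer = new Map();
for (const doc of matched) {
  const runningTotal = totalsByCustomer.get(doc.cust_id) || 0;
  totalsByCustomer.set(doc.cust_id, runningTotal + doc.amount);
}
const grouped = [...totalsByCustomer].map(([id, total]) => ({ _id: id, total }));

console.log(grouped);
// [ { _id: 'A123', total: 750 }, { _id: 'B212', total: 200 } ]
```

The order with status "D" never reaches the grouping step, which is why the total for A123 is 750 rather than 1050.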

1.2 Pipeline

Pipes in Unix and Linux generally pass the output of the current command to the next command as its input. In the same way, MongoDB's aggregation pipeline passes the documents on to the next stage once the current stage has finished processing them. Pipeline stages can be repeated. The most basic stages provide filters that operate like queries and document transformations that modify the form of the output documents. Other stages provide tools for grouping and sorting documents by a specific field or fields, as well as tools for aggregating the contents of arrays, including arrays of documents. In addition, pipeline stages can use operators to perform tasks such as calculating averages or concatenating strings. The summary is as follows:

Pipeline operators

Common stage	Description
$group	Groups the documents in the collection so that results can be computed per group
$match	Filters the data and outputs only the documents that match the condition
$project	Modifies the structure of the input documents (renaming, adding, or removing fields, creating computed results, etc.)
$sort	Sorts the documents and outputs them in sorted order
$limit	Limits the number of documents the pipeline outputs
$skip	Skips the specified number of documents and returns the rest
$unwind	Splits an array field into one output document per array element
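The less familiar stages in this table can be pictured with a small sketch. The plain JavaScript below imitates $unwind, $sort, $skip, and $limit on an in-memory array; the documents and field names are hypothetical, and the array methods only stand in for the stages rather than reproducing MongoDB's implementation.

```javascript
// Hypothetical sample documents; plain JavaScript stands in for the stages.
const posts = [
  { title: "intro", tags: ["mongodb", "nosql"] },
  { title: "tips", tags: ["mongodb"] },
];

// $unwind: "$tags" -> one output document per array element.
const unwound = posts.flatMap((doc) => doc.tags.map((tag) => ({ ...doc, tags: tag })));

// $sort: { tags: 1, title: 1 } -> ascending by tag, then by title.
const sorted = [...unwound].sort(
  (a, b) => a.tags.localeCompare(b.tags) || a.title.localeCompare(b.title)
);

// $skip: 1 followed by $limit: 1 -> drop the first document, keep the next one.
const page = sorted.slice(1).slice(0, 1);

console.log(unwound.length); // 3
console.log(page); // [ { title: 'tips', tags: 'mongodb' } ]
```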

Expression operators

Common expression	Meaning
$sum	Computes the sum; { $sum: 1 } counts the documents in each group
$avg	Computes the average
$min	Gets the minimum value
$max	Gets the maximum value
$push	Appends a value from each document in the group to an array
$first	Gets the value from the first document, according to the order of the documents
$last	Similarly, gets the value from the last document
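As a rough illustration of these accumulators, the sketch below computes each one by hand for a single group of documents. It is plain JavaScript over hypothetical data, with the corresponding $group expression noted in a comment beside each line; it is not MongoDB's own implementation.

```javascript
// Hypothetical documents that all belong to one group (same _id).
const group = [
  { item: "pen", price: 4 },
  { item: "ink", price: 10 },
  { item: "pad", price: 7 },
];
const prices = group.map((doc) => doc.price);
const priceSum = prices.reduce((acc, p) => acc + p, 0);

const accumulated = {
  count: group.length,                    // { $sum: 1 }
  total: priceSum,                        // { $sum: "$price" }
  average: priceSum / prices.length,      // { $avg: "$price" }
  cheapest: Math.min(...prices),          // { $min: "$price" }
  priciest: Math.max(...prices),          // { $max: "$price" }
  items: group.map((doc) => doc.item),    // { $push: "$item" }
  firstItem: group[0].item,               // { $first: "$item" }
  lastItem: group[group.length - 1].item, // { $last: "$item" }
};

console.log(accumulated);
```

Note that $first and $last only give meaningful results when the documents arrive in a defined order, for example after a $sort stage.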

For ease of understanding, the table below compares common MongoDB aggregation operations with their MySQL counterparts:

MongoDB aggregation	MySQL operations/functions
$match (before $group)	where
$group	group by
$match (after $group)	having
$project	select
$sort	order by
$limit	limit
$sum	sum()
$lookup	join
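The mapping can be read off a single query. The sketch below writes a SQL statement as a comment and the corresponding pipeline as a JavaScript array; the table, collection, and column names follow the orders example from section 1.1, while the threshold and limit values are made up for illustration.

```javascript
// SQL being mirrored (names and constants are illustrative):
//   SELECT cust_id, SUM(amount) AS total
//   FROM orders
//   WHERE status = 'A'
//   GROUP BY cust_id
//   HAVING SUM(amount) > 100
//   ORDER BY total DESC
//   LIMIT 5;
const pipeline = [
  { $match: { status: "A" } },                                 // WHERE
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }, // GROUP BY + SUM()
  { $match: { total: { $gt: 100 } } },                         // HAVING
  { $sort: { total: -1 } },                                    // ORDER BY ... DESC
  { $limit: 5 },                                               // LIMIT
];

// In the mongo shell this array would be passed to db.orders.aggregate(pipeline).
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
// $match -> $group -> $match -> $sort -> $limit
```

The same $match operator plays both roles: placed before $group it filters input rows like WHERE, and placed after $group it filters the grouped results like HAVING.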

1.3 Aggregation Pipeline optimization

The aggregation pipeline can determine whether it needs only a subset of the fields in the documents to obtain the results. If so, the pipeline uses only those required fields, reducing the amount of data passing through the pipeline.

Pipeline sequence optimization

1). $project/$addFields + $match sequence optimization: when the aggregation pipeline contains $project/$addFields stages followed by a $match stage, any filter in the $match stage that does not depend on values computed in those projection stages is moved into a new $match stage placed before them. When there are multiple $project/$addFields and $match stages, each filter is moved ahead of all the projection stages it does not depend on. For example:

    { $addFields: {
        maxTime: { $max: "$times" },
        minTime: { $min: "$times" }
    } },
    { $project: {
        _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
        avgTime: { $avg: ["$maxTime", "$minTime"] }
    } },
    { $match: {
        name: "Joe Schmoe",
        maxTime: { $lt: 20 },
        minTime: { $gt: 5 },
        avgTime: { $gt: 7 }
    } }

Optimized execution:

    { $match: { name: "Joe Schmoe" } },
    { $addFields: {
        maxTime: { $max: "$times" },
        minTime: { $min: "$times" }
    } },
    { $match: { maxTime: { $lt: 20 }, minTime: { $gt: 5 } } },
    { $project: {
        _id: 1, name: 1, times: 1, maxTime: 1, minTime: 1,
        avgTime: { $avg: ["$maxTime", "$minTime"] }
    } },
    { $match: { avgTime: { $gt: 7 } } }

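2). $sort + $match and $project + $skip sequence optimization: when a $sort stage is followed by a $match stage, the $match moves before the $sort to minimize the number of objects to be sorted; when a $project stage is followed by a $skip stage, the $skip moves before the $project. For example: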

    { $sort: { age : -1 } },
    { $match: { score: 'A' } },
    { $project: { status: 1, name: 1 } },
    { $skip: 5 }

Optimized execution:

    { $match: { score: 'A' } },
    { $sort: { age : -1 } },
    { $skip: 5 },
    { $project: { status: 1, name: 1 } }
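Why moving $match ahead of $sort is safe can be checked on a small example. The sketch below is plain JavaScript over hypothetical documents, not MongoDB's optimizer: it runs the filter and the sort in both orders and compares the results.

```javascript
// Hypothetical documents; plain JavaScript stands in for $sort and $match.
const people = [
  { name: "Ann", age: 31, score: "A" },
  { name: "Bob", age: 45, score: "B" },
  { name: "Cid", age: 27, score: "A" },
  { name: "Dee", age: 52, score: "C" },
];

const byAgeDesc = (a, b) => b.age - a.age;    // { $sort: { age: -1 } }
const hasScoreA = (doc) => doc.score === "A"; // { $match: { score: "A" } }

// Written order: sort all four documents, then filter.
const sortThenMatch = [...people].sort(byAgeDesc).filter(hasScoreA);

// Reordered: filter first, so only two documents reach the sort.
const matchedFirst = people.filter(hasScoreA);
const matchThenSort = [...matchedFirst].sort(byAgeDesc);

const sameResult = JSON.stringify(sortThenMatch) === JSON.stringify(matchThenSort);
console.log(matchedFirst.length); // 2
console.log(sameResult); // true
```

Both orders return the same documents in the same order, but the reordered version sorts two documents instead of four, which is the saving the optimization is after.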

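3). $redact + $match sequence optimization: when a $redact stage is immediately followed by a $match stage, the aggregation can sometimes add a portion of the $match stage before the $redact stage as a new $match stage. For example: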

    { $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
    { $match: { year: 2014, category: { $ne: "Z" } } }

Optimized execution:

    { $match: { year: 2014 } },
    { $redact: { $cond: { if: { $eq: [ "$level", 5 ] }, then: "$$PRUNE", else: "$$DESCEND" } } },
    { $match: { year: 2014, category: { $ne: "Z" } } }

There are many other pipeline sequence optimizations; they are described in the official documentation under Aggregation Pipeline Optimization.

1.4 Aggregation Pipeline and sharded collections

If the pipeline starts with an exact $match on the shard key, the entire pipeline runs only on the matching shard. Aggregation operations that must run on multiple shards, and that do not need to run on the database's primary shard, route their results to a random shard for merging, which avoids overloading the primary shard of that database. The $out and $lookup stages must run on the database's primary shard.

2. Map-Reduce function

MongoDB also provides map-reduce operations to perform aggregation. In general, a map-reduce operation has two phases: a map phase, which processes each document and emits one or more objects for each input document, and a reduce phase, which combines the output of the map operation. Optionally, map-reduce can have a finalize phase to make final changes to the results. Like other aggregation operations, map-reduce can specify a query condition to select the input documents, as well as sort and limit the results.

Map-reduce uses custom JavaScript functions to perform the map and reduce operations, as well as the optional finalize operation. While custom JavaScript offers great flexibility compared with the aggregation pipeline, map-reduce is generally less efficient and more complex than the aggregation pipeline. The model is as follows:
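As a rough illustration of the query, map, reduce, and finalize steps, the sketch below runs them over an in-memory array in plain JavaScript. The helper runMapReduce and the sample data are hypothetical and exist only for this sketch. In MongoDB itself the map function reads the current document through `this` and calls a built-in emit(); here both are passed as explicit arguments so the code runs without a server.

```javascript
// Hypothetical input documents.
const sales = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "B212", amount: 90, status: "D" },
];

// Map phase: emit zero or more (key, value) pairs per input document.
function mapFn(doc, emit) {
  emit(doc.cust_id, doc.amount);
}

// Reduce phase: combine all values emitted for one key.
function reduceFn(key, values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Optional finalize phase: make a last change to each reduced value.
function finalizeFn(key, reduced) {
  return { total: reduced };
}

// Hypothetical driver: select input with a query, map, group by key, reduce, finalize.
function runMapReduce(docs, query) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.filter(query).forEach((doc) => mapFn(doc, emit));
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: finalizeFn(key, reduceFn(key, values)),
  }));
}

const mapReduceResult = runMapReduce(sales, (doc) => doc.status === "A");
console.log(JSON.stringify(mapReduceResult));
// [{"_id":"A123","value":{"total":750}},{"_id":"B212","value":{"total":200}}]
```

The same result could be expressed as a $match stage followed by a $group stage, which is the kind of case where the aggregation pipeline is usually the simpler choice.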

3. Single-purpose aggregation commands

MongoDB also provides single-purpose aggregation commands such as db.collection.estimatedDocumentCount(), db.collection.count(), and db.collection.distinct(). While these operations provide easy access to common aggregation processes, they lack the flexibility and functionality of the aggregation pipeline and map-reduce. The model is as follows:
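What a filtered count and a distinct boil down to can be sketched in a few lines of plain JavaScript. The documents and field names below are hypothetical, and the array operations only mirror the results of the commands, not how MongoDB executes them.

```javascript
// Hypothetical documents standing in for a collection.
const inventory = [
  { item: "pen", dept: "A" },
  { item: "ink", dept: "B" },
  { item: "pad", dept: "A" },
];

// count with a query filter: number of documents where dept equals "A".
const countDeptA = inventory.filter((doc) => doc.dept === "A").length;

// distinct on the dept field: the unique values of that field.
const distinctDepts = [...new Set(inventory.map((doc) => doc.dept))];

console.log(countDeptA); // 2
console.log(distinctDepts); // [ 'A', 'B' ]
```

Each command answers exactly one question about one collection, which is what makes them convenient and also why they cannot be chained the way pipeline stages can.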

Conclusion

Aggregation operations in MongoDB are used for data processing and are well suited to certain data analysis tasks. Typical applications include business reports on sales data, such as calculating total sales and producing financial statements after grouping the data by region. Finally, to gain a deeper understanding, you need to practice.
