In information science, aggregation refers to screening, processing, and classifying relevant data and outputting the results. In MongoDB, aggregation processes, filters, and groups data from multiple documents at once and outputs the result. During an aggregation operation, data moves through a series of stages like water flowing through sections of pipe, which is why aggregation in MongoDB is also called streaming aggregation.

MongoDB provides several aggregations:

Aggregation Pipeline
Map-Reduce
Simple aggregation

Next, we’ll take a full look at aggregation in MongoDB.

Aggregation Pipeline

With the Aggregation Pipeline, a developer passes multiple documents into a pipeline made up of multiple stages. The result of each stage is passed to the next stage, and the result of the last stage is the output of the entire pipeline.
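The hand-off from stage to stage can be pictured in plain JavaScript. The sketch below is not MongoDB code: `runPipeline`, `matchStage`, and `countStage` are hypothetical helpers, and the sample documents are invented for illustration. Each stage is modelled as a function from an array of documents to a new array.

```javascript
// Illustrative only: an in-memory model of a pipeline, not the MongoDB API.
// A stage is a function that takes an array of documents and returns a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const countStage = (fieldName) => (docs) => [{ [fieldName]: docs.length }];

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical input documents.
const docs = [
  { _id: 1, kind: "a" },
  { _id: 2, kind: "b" },
  { _id: 3, kind: "a" },
];

const result = runPipeline(docs, [
  matchStage((doc) => doc.kind === "a"), // stage 1: keep matching documents
  countStage("total"),                   // stage 2: count what stage 1 passed on
]);
console.log(result); // [ { total: 2 } ]
```

Because every stage has the same shape, stages can be added, removed, or reordered without touching the others, which is the property the real pipeline relies on.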

The syntax for creating an aggregation pipeline is as follows:

db.collection.aggregate( [ { <stage> }, ... ] )

MongoDB provides 23 stages, which are:

Stage	Description
`$addFields`	Adds new fields to documents.
`$bucket`	Groups incoming documents into buckets based on a specified expression and bucket boundaries.
`$bucketAuto`	Groups incoming documents into a specified number of buckets based on a specified expression; the bucket boundaries are determined automatically.
`$collStats`	Returns statistics about a collection or view.
`$count`	Returns a count of the number of documents at this stage of the aggregation pipeline.
`$facet`	Processes multiple aggregation operations within a single stage on the same set of input documents.
`$geoNear`	Returns an ordered stream of documents based on proximity to a geospatial point.
`$graphLookup`	Performs a recursive search on a collection.
`$group`	Groups documents by a specified identifier expression.
`$indexStats`	Returns index information for the collection.
`$limit`	Passes the first n documents unmodified to the next stage.
`$listSessions`	Lists all sessions in the `system.sessions` collection.
`$lookup`	Performs a left outer join on another collection in the same database.
`$match`	Filters documents, allowing only matching documents to pass to the next pipeline stage.
`$out`	Writes the result documents of the pipeline to the specified collection. It must be the last stage in the pipeline.
`$project`	Adds new fields to documents or removes existing fields.
`$redact`	Can be used to implement field-level redaction.
`$replaceRoot`	Replaces a document with the specified embedded document, promoting it to the top level. The operation replaces all existing fields in the input document, including the `_id` field.
`$sample`	Randomly selects a specified number of documents from the input.
`$skip`	Skips the first n documents and passes the remaining documents unmodified to the next stage.
`$sort`	Reorders the document stream by a specified sort key. Only the order changes; the documents remain unmodified. For each input document, one document is output.
`$sortByCount`	Groups incoming documents, then counts the documents in each distinct group.
`$unwind`	Deconstructs an array field in the documents.

The relationship between document, Stage, and Pipeline is shown below:

[Figure: documents flowing through a pipeline of stages such as $match, $sample, and $project, with SQL counterparts such as WHERE, SUM, and COUNT.]

SQL	MongoDB
WHERE	`$match`
GROUP BY	`$group`
HAVING	`$match`
SELECT	`$project`
ORDER BY	`$sort`
LIMIT	`$limit`
SUM()	`$sum`
COUNT()	`$sum`, `$sortByCount`
JOIN	`$lookup`

Below, we’ll look at how aggregate, stages, and the pipeline relate to each other through examples.

Basic concepts

$match is described as “filtering documents, allowing only matching documents to be passed to the next pipeline phase.” The syntax is as follows:

{ $match: { <query> } }

Before we begin, we need to prepare the following data:

> db.artic.insertMany([
... { "_id" : 1, "author" : "dave", "score" : 80, "views" : 100 },
... { "_id" : 2, "author" : "dave", "score" : 85, "views" : 521 },
... { "_id" : 3, "author" : "anna", "score" : 60, "views" : 706 },
... { "_id" : 4, "author" : "line", "score" : 55, "views" : 300 }
... ])

We then build a single-stage pipeline that keeps only the documents whose author is dave. The following is an example:

> db.artic.aggregate([
... {$match: {author: "dave"}}
... ])
{ "_id" : 1, "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : 2, "author" : "dave", "score" : 85, "views" : 521 }

To build a pipeline with two stages, add another stage to aggregate. Suppose we need to count the documents in the collection artic whose score is greater than 70 and less than 90. This requirement breaks down into two steps:

Filter out documents that meet the requirements
Count documents

Aggregation is ideal for this kind of multi-step operation. In this scenario, we use the two stages $match and $group, combined with the accumulator $sum.

> db.artic.aggregate([
... {$match: {score: {$gt: 70, $lt: 90}}},
... {$group: {_id: null, number: {$sum: 1}}}
... ])
{ "_id" : null, "number" : 2 }

The complete process for this example can be represented as follows:

[Figure: the aggregate call runs a pipeline made up of two stages.]

Common stages

sample

$sample selects a specified number of documents at random from the input. The syntax is as follows:

{ $sample: { size: <positive integer> } }

Suppose we want to select two documents at random from the collection artic, as shown in the following example:

> db.artic.aggregate([
... {$sample: {size: 2}}
... ])
{ "_id" : 1, "author" : "dave", "score" : 80, "views" : 100 }
{ "_id" : 3, "author" : "anna", "score" : 60, "views" : 706 }

The size argument to $sample must be a positive integer. Note that when size exceeds the number of documents in the collection, all documents in the collection are returned, in random order.
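The behaviour just described can be modelled in a few lines of plain JavaScript. This is only a sketch: `sample` is a hypothetical helper, it assumes sampling without repetition as a simplification, and it makes no claim about how the server implements $sample internally.

```javascript
// Illustrative only: models the described behaviour of $sample in memory.
function sample(docs, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const shuffled = docs.slice(); // copy so the input is left untouched
  // Fisher-Yates shuffle: every ordering is equally likely.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // slice() caps at the array length, so an oversized request returns every document.
  return shuffled.slice(0, size);
}

const artic = [
  { _id: 1, author: "dave", score: 80, views: 100 },
  { _id: 2, author: "dave", score: 85, views: 521 },
  { _id: 3, author: "anna", score: 60, views: 706 },
  { _id: 4, author: "line", score: 55, views: 300 },
];

const two = sample(artic, 2);  // two documents chosen at random
const all = sample(artic, 10); // more than the collection holds: all four, shuffled
console.log(two.length, all.length);
```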

project

$project filters the fields in the document, which is similar to the projection operation, but the processing results are passed to the next stage. The syntax is as follows:

{ $project: { <specification(s)> } }

Prepare the following data:

> db.projects.save(
	{_id: 1, title: "Basketball camp youth school activities begin.", numb: "A829Sck23", author: {last: "quinn", first: "James"}, hot: 35}
)

Assume that the next Stage in the Pipeline only needs the title and author fields in the document, as shown in the following example:

> db.projects.aggregate([{$project: {title: 1, author: 1}}])
{ "_id" : 1, "title" : "Basketball camp youth school activities begin.", "author" : { "last" : "quinn", "first" : "James" } }

Inclusion (1) and exclusion (0) can appear in the same $project when the excluded field is _id. The following is an example:

> db.projects.aggregate([{$project: {title: 1, author: 1, _id: 0}}])
{ "title" : "Basketball camp youth school activities begin.", "author" : { "last" : "quinn", "first" : "James" } }

true is equivalent to 1, false is equivalent to 0, and booleans and numbers can be mixed as follows:

> db.projects.aggregate([{$project: {title: 1, author: true, _id: false}}])
{ "title" : "Basketball camp youth school activities begin.", "author" : { "last" : "quinn", "first" : "James" } }

If you want to exclude a specified field, set it to 0 or false in $project as shown in the following example:

> db.projects.aggregate([{$project: {author: false, _id: false}}])
{ "title" : "Basketball camp youth school activities begin.", "numb" : "A829Sck23", "hot" : 35 }

$project can also work on embedded documents. For the author field, sometimes we need only first or only last, as shown in the following example:

> db.projects.aggregate([{$project: {author: {"last": false}, _id: false, numb: 0}}])
{ "title" : "Basketball camp youth school activities begin.", "author" : { "first" : "James" }, "hot" : 35 }

Here we use {author: {"last": false}} to drop last from the embedded document while keeping first.
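The exclusion rules used in the last example can be modelled in plain JavaScript. The sketch below is not the MongoDB API: `excludeFields` and `isPlainObject` are hypothetical helpers that handle only exclusion-style specifications (0 or false), with a nested object in the specification applied to the embedded document of the same name.

```javascript
// Illustrative only: an in-memory model of an exclusion-style $project.
const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

function excludeFields(doc, spec) {
  const out = {};
  for (const [name, value] of Object.entries(doc)) {
    const rule = spec[name];
    if (rule === 0 || rule === false) continue; // this field is excluded
    // A nested rule recurses into the embedded document; anything else passes through.
    out[name] = isPlainObject(rule) && isPlainObject(value)
      ? excludeFields(value, rule)
      : value;
  }
  return out;
}

const project = {
  _id: 1,
  title: "Basketball camp youth school activities begin.",
  numb: "A829Sck23",
  author: { last: "quinn", first: "James" },
  hot: 35,
};

const projected = excludeFields(project, { author: { last: false }, _id: false, numb: 0 });
console.log(projected);
```

Running it on the sample document leaves title, author.first, and hot, the same fields as in the shell output above.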

That covers the basic usage of $project. For more details, see the official $project documentation.

lookup

$lookup performs a left outer join on another collection in the same database. The syntax is as follows:

{
   $lookup:
     {
       from: <collection to join>,
       localField: <field from the input documents>,
       foreignField: <field from the documents of the "from" collection>,
       as: <output array field>
     }
}

The left outer join is similar to the following pseudo-SQL statement:

SELECT *, <output array field>
FROM collection WHERE <output array field> IN (
SELECT * FROM <collection to join> WHERE 
<foreignField>= <collection.localField>);

The fields supported by $lookup and their descriptions are as follows:

Field	Description
`from`	Specifies the name of the collection to join with.
`localField`	Specifies the field from the documents input to the `$lookup` stage.
`foreignField`	Specifies the field from the documents in the `from` collection.
`as`	Specifies the name of the new array field to add to the input documents. The new array field contains the matching documents from the `from` collection. If the specified name already exists in the input document, the existing field is overwritten.

Prepare the following data:

> db.sav.insert([
   { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 },
   { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 },
   { "_id" : 3 }
])

> db.avi.insert([
   { "_id" : 1, "sku" : "almonds", description: "product 1", "instock" : 120 },
   { "_id" : 2, "sku" : "bread", description: "product 2", "instock" : 80 },
   { "_id" : 3, "sku" : "cashews", description: "product 3", "instock" : 60 },
   { "_id" : 4, "sku" : "pecans", description: "product 4", "instock" : 70 },
   { "_id" : 5, "sku": null, description: "Incomplete" },
   { "_id" : 6 }
])

Suppose you want to join the collection sav with the collection avi by matching item in sav against sku in avi, and store the joined documents in a field named savi. The following is an example:

> db.sav.aggregate([
   {
     $lookup:
       {
         from: "avi",
         localField: "item",
         foreignField: "sku",
         as: "savi"
       }
   }
])

After the command is executed, the following information is displayed:

{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2, "savi" : [ { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 } ] }
{ "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1, "savi" : [ { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 } ] }
{ "_id" : 3, "savi" : [ { "_id" : 5, "sku" : null, "description" : "Incomplete" }, { "_id" : 6 } ] }

The above join operation is equivalent to the following pseudo-SQL:

SELECT *, savi
FROM sav
WHERE savi IN (SELECT *
FROM avi
WHERE sku= sav.item);
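The matching rule behind that result, including why the document with _id 3 picks up two entries, can be sketched in plain JavaScript. This is an in-memory model, not the MongoDB API: `lookup` and `fieldOrNull` are hypothetical helpers, and the sketch assumes that a missing field is treated as null for matching purposes.

```javascript
// Illustrative only: an in-memory model of the equality form of $lookup.
// Assumption: a missing field counts as null, so null and missing values match each other.
const fieldOrNull = (doc, field) => (doc[field] === undefined ? null : doc[field]);

function lookup(inputDocs, fromDocs, { localField, foreignField, as }) {
  return inputDocs.map((doc) => ({
    ...doc,
    // Left outer join: every input document is kept, even when nothing matches.
    [as]: fromDocs.filter(
      (other) => fieldOrNull(other, foreignField) === fieldOrNull(doc, localField)
    ),
  }));
}

const sav = [
  { _id: 1, item: "almonds", price: 12, quantity: 2 },
  { _id: 2, item: "pecans", price: 20, quantity: 1 },
  { _id: 3 },
];
const avi = [
  { _id: 1, sku: "almonds", description: "product 1", instock: 120 },
  { _id: 2, sku: "bread", description: "product 2", instock: 80 },
  { _id: 3, sku: "cashews", description: "product 3", instock: 60 },
  { _id: 4, sku: "pecans", description: "product 4", instock: 70 },
  { _id: 5, sku: null, description: "Incomplete" },
  { _id: 6 },
];

const joined = lookup(sav, avi, { localField: "item", foreignField: "sku", as: "savi" });
console.log(JSON.stringify(joined, null, 2));
```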

That covers the basic usage of $lookup. For more details, see the official $lookup documentation.

unwind

$unwind splits a document that contains an array into multiple documents. The syntax is as follows:

{
  $unwind:
    {
      path: <field path>,
      includeArrayIndex: <string>,
      preserveNullAndEmptyArrays: <boolean>
    }
}

The fields supported by $unwind and their descriptions are as follows:

Field	Type	Description
`path`	string	Specifies the field path of an array field. This parameter is mandatory.
`includeArrayIndex`	string	The name of a new field that holds the array index of the element.
`preserveNullAndEmptyArrays`	boolean	By default, if `path` is `null`, missing, or an empty array, no document is output. If set to `true`, the document is output.

Before we begin, we need to prepare the following data:

> db.shoes.save({_id: 1, brand: "Nick", sizes: [37, 38, 39]})

The sizes field in the collection shoes is an array of sizes. Suppose you want to split this document into three documents, each with a single size value, as shown in the following example:

> db.shoes.aggregate([{$unwind : "$sizes"}])
{ "_id" : 1, "brand" : "Nick", "sizes" : 37 }
{ "_id" : 1, "brand" : "Nick", "sizes" : 38 }
{ "_id" : 1, "brand" : "Nick", "sizes" : 39 }

Documents in this form are easier to process. preserveNullAndEmptyArrays defaults to false, which means that a document is ignored when the field at the specified path is an empty array, null, or missing. Assume the following data:

> db.shoes2.insertMany([
{"_id": 1, "item": "ABC", "sizes": ["S", "M", "L"]},
{"_id": 2, "item": "EFG", "sizes": []},
{"_id": 3, "item": "IJK", "sizes": "M"},
{"_id": 4, "item": "LMN" },
{"_id": 5, "item": "XYZ", "sizes": null}
])

We execute the following command:

> db.shoes2.aggregate([{$unwind: "$sizes"}])

You get the following output:

{ "_id" : 1, "item" : "ABC", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC", "sizes" : "L" }
{ "_id" : 3, "item" : "IJK", "sizes" : "M" }

The documents with _id 2, 4, and 5 have an empty array, a missing field, and a null value for sizes respectively. Because preserveNullAndEmptyArrays defaults to false, they are dropped and produce no output.
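The rules for which documents survive can be modelled in plain JavaScript. The sketch below is not the MongoDB API: `unwind` is a hypothetical helper for a top-level field, and its handling of the `true` case (an empty array leaves no field behind, while null is kept as null) is an assumption based on my reading of the documented behaviour.

```javascript
// Illustrative only: an in-memory model of $unwind on a top-level field.
function unwind(docs, field, preserveNullAndEmptyArrays = false) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value) && value.length > 0) {
      // One output document per array element.
      for (const element of value) out.push({ ...doc, [field]: element });
    } else if (value === undefined || value === null || Array.isArray(value)) {
      // Missing, null, or empty array: dropped unless the option is set.
      if (preserveNullAndEmptyArrays) {
        const copy = { ...doc };
        if (Array.isArray(value)) delete copy[field]; // assumed: empty array leaves no field
        out.push(copy);
      }
    } else {
      out.push({ ...doc }); // a non-array value is treated as a single-element array
    }
  }
  return out;
}

const shoes2 = [
  { _id: 1, item: "ABC", sizes: ["S", "M", "L"] },
  { _id: 2, item: "EFG", sizes: [] },
  { _id: 3, item: "IJK", sizes: "M" },
  { _id: 4, item: "LMN" },
  { _id: 5, item: "XYZ", sizes: null },
];

const unwound = unwind(shoes2, "sizes");         // default: 2, 4, and 5 are dropped
const preserved = unwind(shoes2, "sizes", true); // option set: 2, 4, and 5 are kept
console.log(unwound.length, preserved.length);
```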

That covers the basic usage of $unwind. For more details, see the official $unwind documentation.

out

$out takes the result documents returned by the pipeline and writes them to the specified collection. Note that $out must be the last stage in the pipeline. The syntax is as follows:

{ $out: "<output-collection>" }

Prepare the following data:

> db.books.insertMany([
{ "_id" : 8751, "title" : "The Banquet", "author" : "Dante", "copies" : 2 },
{ "_id" : 8752, "title" : "Divine Comedy", "author" : "Dante", "copies" : 1 },
{ "_id" : 8645, "title" : "Eclogues", "author" : "Dante", "copies" : 2 },
{ "_id" : 7000, "title" : "The Odyssey", "author" : "Homer", "copies" : 10 },
{ "_id" : 7020, "title" : "Iliad", "author" : "Homer", "copies" : 10 }
])

Suppose you want to group the documents in the collection books by author and save the result in a collection named books_result, as shown in the following example:

> db.books.aggregate([
... { $group : {_id: "$author", books: {$push: "$title"}}},
... { $out : "books_result" }
... ])

After the command is executed, MongoDB creates the books_result collection and stores the grouped results in it. The documents in the collection books_result are as follows:

{ "_id" : "Homer", "books" : [ "The Odyssey", "Iliad" ] }
{ "_id" : "Dante", "books" : [ "The Banquet", "Divine Comedy", "Eclogues" ] }
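The grouping step that feeds $out can be modelled in plain JavaScript. The sketch below is not the MongoDB API: `groupPush` is a hypothetical helper that mirrors `{$group: {_id: "$author", books: {$push: "$title"}}}`, and the `booksResult` array merely stands in for the output collection.

```javascript
// Illustrative only: an in-memory model of $group with a $push accumulator.
function groupPush(docs, keyField, valueField, outField) {
  const groups = new Map(); // a Map keeps keys in first-seen order
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc[valueField]); // $push appends the value to the group's array
  }
  return Array.from(groups, ([key, values]) => ({ _id: key, [outField]: values }));
}

const books = [
  { _id: 8751, title: "The Banquet", author: "Dante", copies: 2 },
  { _id: 8752, title: "Divine Comedy", author: "Dante", copies: 1 },
  { _id: 8645, title: "Eclogues", author: "Dante", copies: 2 },
  { _id: 7000, title: "The Odyssey", author: "Homer", copies: 10 },
  { _id: 7020, title: "Iliad", author: "Homer", copies: 10 },
];

// Stands in for the collection that $out would write.
const booksResult = groupPush(books, "author", "title", "books");
console.log(booksResult);
```

The sketch lists groups in first-seen order, whereas the shell output above shows Homer first, so the order of the groups should not be relied on.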

That covers the basic usage of $out. For more details, see the official $out documentation.

Map-Reduce

Map-Reduce condenses a large amount of data into useful aggregated results. The syntax is as follows:

db.runCommand(
               {
                 mapReduce: <collection>,
                 map: <function>,
                 reduce: <function>,
                 finalize: <function>,
                 out: <output>,
                 query: <document>,
                 sort: <document>,
                 limit: <number>,
                 scope: <document>,
                 jsMode: <boolean>,
                 verbose: <boolean>,
                 bypassDocumentValidation: <boolean>,
                 collation: <document>,
                 writeConcern: <document>
               }
             )

db.runCommand({mapReduce: <collection>, ...}) can also be written as db.collection.mapReduce(). The fields are described below:

Field	Type	Description
`mapReduce`	collection	The name of the collection. Mandatory.
`map`	function	A JavaScript function. Mandatory.
`reduce`	function	A JavaScript function. Mandatory.
`out`	string or document	Specifies where to output the result. Mandatory.
`query`	document	The query condition.
`sort`	document	Sorts the input documents.
`limit`	number	Specifies the maximum number of documents passed to `map`.
`finalize`	function	Modifies the output of `reduce`.
`scope`	document	Specifies global variables.
`jsMode`	boolean	Whether to convert intermediate data to BSON format between the execution of the `map` and `reduce` functions. Defaults to `false`.
`verbose`	boolean	Whether the result includes `timing` information. Defaults to `false`.
`bypassDocumentValidation`	boolean	Whether `mapReduce` may bypass document validation during the operation. Defaults to `false`.
`collation`	document	Specifies the collation to use for the operation.
`writeConcern`	document	Specifies the write concern. If omitted, the default write concern is used.

Simple mapReduce

A simple example of mapReduce syntax is as follows:

var mapFunction = function() { ... };
var reduceFunction = function(key, values) { ... };
db.runCommand(
... {
...   mapReduce: <input-collection>,
...   map: mapFunction,
...   reduce: reduceFunction,
...   out: { merge: <output-collection> },
...   query: <query>
... })

The map function is responsible for converting each input document into zero or more documents. The map structure is as follows:

function() {
   ...
   emit(key, value);
}

The emit function is used for grouping and takes two arguments:

key: Specifies the field to be used for grouping.
value: Field to aggregate.

You can use the this keyword to refer to the current document in a map. The reduce structure is as follows:

function(key, values) {
   ...
   return result;
}

reduce performs the actual data processing and receives two parameters:

key: the same as the key in map, that is, the grouping field.
values: an array holding all the values that were emitted with the same key.

out specifies where the result is written. out: <collectionName> writes the result to a new collection; to write to an existing collection, use the following syntax:

out: { <action>: <collectionName>
        [, db: <dbName>]
        [, sharded: <boolean> ]
        [, nonAtomic: <boolean> ] }

Note that if the collection specified by out already exists, it will overwrite that collection. Before we begin, we need to prepare the following data:

> db.mprds.insertMany([
... {_id: 1, numb: 3, score: 9, team: "B"},
... {_id: 2, numb: 6, score: 9, team: "A"},
... {_id: 3, numb: 24, score: 9, team: "A"},
... {_id: 4, numb: 6, score: 8, team: "A"}
... ])

Next, we define the map and reduce functions and apply them to the collection mprds. We also specify where the output goes; here it is stored in a collection named mprds_result.

> var func_map = function(){ emit(this.numb, this.score); };
> var func_reduce = function(key, values){ return Array.sum(values); };
> db.mprds.mapReduce(func_map, func_reduce, {query: {team: "A"}, out: "mprds_result"})

The map function emits this.numb as the key and this.score as the value, so documents with the same this.numb fall into the same group. reduce sums the list of values it receives and returns the sum as the value in the result. After the command is executed, the result is stored in the collection mprds_result. Use the following command to view the results:

> db.mprds_result.find()
{ "_id" : 6, "value" : 17 }
{ "_id" : 24, "value" : 9 }

The _id in the result document is this.numb in map, and value is the return value of the reduce function.
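The map and reduce phases can be modelled in plain JavaScript to show where each number in the result comes from. This is an in-memory sketch, not the MongoDB API: the `mapReduce` helper is hypothetical, `emit` is passed to map as an argument rather than being a global as in the shell, and `Array.sum` (a shell extension) is replaced with a standard `reduce` call.

```javascript
// Illustrative only: an in-memory model of the map and reduce phases.
function mapReduce(docs, map, reduce, query = () => true) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // map runs once per matching document, with `this` bound to that document.
  for (const doc of docs.filter(query)) map.call(doc, emit);
  // reduce runs only for keys with more than one value; a lone value passes through.
  return Array.from(emitted, ([key, values]) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

const mprds = [
  { _id: 1, numb: 3, score: 9, team: "B" },
  { _id: 2, numb: 6, score: 9, team: "A" },
  { _id: 3, numb: 24, score: 9, team: "A" },
  { _id: 4, numb: 6, score: 8, team: "A" },
];

const mprdsResult = mapReduce(
  mprds,
  function (emit) { emit(this.numb, this.score); },          // map: key numb, value score
  (key, values) => values.reduce((sum, v) => sum + v, 0),    // reduce: sum the scores
  (doc) => doc.team === "A"                                  // query: team A only
);
console.log(mprdsResult); // [ { _id: 6, value: 17 }, { _id: 24, value: 9 } ]
```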

[Figure: the complete process of the mapReduce operation.]

Refining results with finalize

finalize is used to modify the output of reduce. The syntax is as follows:

function(key, reducedValue) {
   ...
   return modifiedObject;
}

It takes two arguments:

key: the same as the key in map, that is, the grouping field.

reducedValue: an object, the output of reduce.

We have introduced map and reduce and used a simple example to understand the basic components and usage of mapReduce. In practice, we can write more feature-rich reduce functions and even modify the output of reduce using finalize. The following reduce function computes on and reorganizes the values passed in and returns a reduceVal object:

> var func_reduce2 = function(key, values){
	// Declared with var so that reduceVal does not leak into the global scope.
	var reduceVal = {team: key, score: values, total: Array.sum(values), count: values.length};
	return reduceVal;
};

The reduceVal object contains four properties: team, score, total, and count. We also want an avg property, so we calculate the average and add it in finalize:

> var func_finalize = function(key, values){
	values.avg = values.total / values.count;
	return values;
};

map remains unchanged, and these functions are applied to the collection mprds as shown in the following example:

> db.mprds.mapReduce(func_map, func_reduce2, {query: {team: "A"}, out: "mprds_result", finalize: func_finalize})

After the command is executed, the result is saved to the specified collection. In this case, the collection mprds_result contains the following:

{ "_id" : 6, "value" : { "team" : 6, "score" : [ 9, 8 ], "total" : 17, "count" : 2, "avg" : 8.5 } }
{ "_id" : 24, "value" : 9 }

The following diagram depicts the complete process of the mapReduce operation:

[Figure: the mapReduce flow, in which the output of reduce is passed to finalize.]

Note that the values emitted with the same key are collected into one array, which is then processed by reduce and finalize. For a key that has only one emitted value, reduce is not called and the emitted value is output as is, which is why the document with _id 24 still has the value 9.

Simple aggregation

In addition to the Aggregation Pipeline and Map-Reduce, which handle complex aggregation, MongoDB also supports simple aggregation operations such as count, group, and distinct.

count

Count is used to count the number of documents in a collection or view and returns a document containing the count result and status. The syntax is as follows:

{
  count: <collection or view>,
  query: <document>,
  limit: <integer>,
  skip: <integer>,
  hint: <hint>,
  readConcern: <document>
}

The fields supported by count and their descriptions are as follows:

Field	Type	Description
`count`	string	Specifies the name of the collection or view to count.
`query`	document	The query condition.
`limit`	integer	Specifies the maximum number of matching documents to return.
`skip`	integer	Specifies the number of matching documents to skip before returning the result.
`hint`	string or document	Specifies the index to use, either as an index name string or as an index specification document.

Suppose you want to count the number of documents in the collection mprds as follows:

> db.runCommand({count: 'mprds'})
{ "n" : 4, "ok" : 1 }

To count only the documents in mprds whose numb is 6:

> db.runCommand({count: 'mprds', query: {numb: {$eq: 6}}})
{ "n" : 2, "ok" : 1 }

To skip one matching document before the result is returned, add skip as shown in the following example:

> db.runCommand({count: 'mprds', query: {numb: {$eq: 6}}, skip: 1})
{ "n" : 1, "ok" : 1 }
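How query, skip, and limit combine into the final number can be sketched in plain JavaScript. This is an in-memory model, not the MongoDB API: `countDocs` is a hypothetical helper, and treating an omitted limit as "no limit" is an assumption of the sketch.

```javascript
// Illustrative only: an in-memory model of count with query, skip, and limit.
function countDocs(docs, { query = () => true, skip = 0, limit = Infinity } = {}) {
  const matching = docs.filter(query).length;
  // skip removes matches from the front; limit caps whatever is left.
  return Math.min(Math.max(matching - skip, 0), limit);
}

const mprds = [
  { _id: 1, numb: 3, score: 9, team: "B" },
  { _id: 2, numb: 6, score: 9, team: "A" },
  { _id: 3, numb: 24, score: 9, team: "A" },
  { _id: 4, numb: 6, score: 8, team: "A" },
];

const isSix = (doc) => doc.numb === 6;
console.log(countDocs(mprds));                            // 4
console.log(countDocs(mprds, { query: isSix }));          // 2
console.log(countDocs(mprds, { query: isSix, skip: 1 })); // 1
```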

For more details about count, see the official count documentation.

group

group groups the documents in a collection by a specified key and runs a simple aggregate function on each group, similar to GROUP BY in SQL. The syntax is as follows:

{
  group:
   {
     ns: <namespace>,
     key: <key>,
     $reduce: <reduce function>,
     $keyf: <key function>,
     cond: <query>,
     finalize: <finalize function>
   }
}

The fields supported by group and their descriptions are as follows:

Field	Type	Description
`ns`	string	The collection on which to perform the group operation. Mandatory.
`key`	document	The field or fields to group by. Mandatory.
`$reduce`	function	A function that aggregates documents during the grouping operation. It takes two parameters: the current document and the aggregation result document for the group. Mandatory.
`initial`	document	Initializes the aggregation result document. Mandatory.
`$keyf`	function	An alternative to `key`. Specifies a function that creates a “key object” to use as the grouping key. Use `$keyf` instead of `key` to group by computed fields rather than existing document fields.
`cond`	document	The selection criteria that determine which documents in the collection to process. If omitted, `group` processes all documents in the collection.
`finalize`	function	Runs before the result is returned; this function can modify the result document.

Prepare the following data:

> db.sales.insertMany([
{_id: 1, orderDate: ISODate("2012-07-01T04:00:00Z"), shipDate: ISODate("2012-07-02T09:00:00Z"), attr: {name: "New coconut shoes.", price: 2999, size: 42, color: "Champagne gold"}},
{_id: 2, orderDate: ISODate("2012-07-03T05:20:00Z"), shipDate: ISODate("2012-07-04T09:00:00Z"), attr: {name: "Gobond Basketball shoes", price: 1999, size: 43, color: "Lion Brown"}},
{_id: 3, orderDate: ISODate("2012-07-03T05:20:10Z"), shipDate: ISODate("2012-07-04T09:00:00Z"), attr: {name: "New coconut shoes.", price: 2999, size: 42, color: "Champagne gold"}},
{_id: 4, orderDate: ISODate("2012-07-05T15:11:33Z"), shipDate: ISODate("2012-07-06T09:00:00Z"), attr: {name: "Speed shoes.", price: 500, size: 43, color: "West Lake Blue"}},
{_id: 5, orderDate: ISODate("2012-07-05T20:22:09Z"), shipDate: ISODate("2012-07-06T09:00:00Z"), attr: {name: "New coconut shoes.", price: 2999, size: 42, color: "Champagne gold"}},
{_id: 6, orderDate: ISODate("2012-07-05T22:35:20Z"), shipDate: ISODate("2012-07-06T09:00:00Z"), attr: {name: "Breathable net run.", price: 399, size: 38, color: "Rose red"}}
])

Suppose you want to group the documents in the sales collection by attr.name and include only the documents whose shipDate is later than a given time. The following is an example:

> db.runCommand({
    group:{
    	ns: 'sales',
      key: {"attr.name": 1},
      cond: {shipDate: {$gt: ISODate('2012-07-04T00:00:00Z')}},
      $reduce: function(curr, result){},
      initial: {}
    }
})

After the command is executed, a result document is returned. In it, retval holds the values of the specified field attr.name, count is the number of documents that took part in the grouping, keys is the number of groups, and ok is the status of the command. The result document is as follows:

{
	"retval" : [
		{
			"attr.name" : "Gobond Basketball shoes"
		},
		{
			"attr.name" : "New coconut shoes."
		},
		{
			"attr.name" : "Speed shoes."
		},
		{
			"attr.name" : "Breathable net run."
		}
	],
	"count" : NumberLong(5),
	"keys" : NumberLong(4),
	"ok" : 1
}

The key specified in the above example is attr.name. Since two of the five documents that took part in the grouping share the same attr.name, keys in the result is 4, which means that the matching documents in the collection sales were divided into four groups.

Replace attr.name with shipDate and see what happens. The following is an example:

> db.runCommand(
{
    group:{
        ns: 'sales',
        key: {shipDate: 1},
        cond: {shipDate: {$gt: ISODate('2012-07-04T00:00:00Z')}},
        $reduce: function(curr, result){},
        initial: {}
        }
	}
)

After the command is executed, the following information is displayed:

{
	"retval" : [
		{
			"shipDate" : ISODate("2012-07-04T09:00:00Z")
		},
		{
			"shipDate" : ISODate("2012-07-06T09:00:00Z")
		}
	],
	"count" : NumberLong(5),
	"keys" : NumberLong(2),
	"ok" : 1
}

Since the five documents that took part in the grouping share only two distinct shipDate values, keys in the result is 2, which means that the matching documents in the collection sales were divided into two groups.

The examples above do not make real use of $reduce, initial, or finalize; next we demonstrate what they are for. To compute the total sales for each group, put the calculation logic in $reduce. The following is an example:

> db.runCommand(
{
    group:{
        ns: 'sales',
        key: {shipDate: 1},
        cond: {shipDate: {$gt: ISODate('2012-07-04T00:00:00Z')}},
        $reduce: function(curr, result){
        	result.total += curr.attr.price;
        	},
        initial: {total: 0}
        }
	}
)

After the command is executed, the following information is displayed:

{
	"retval" : [
		{
			"shipDate" : ISODate("2012-07-04T09:00:00Z"),
			"total" : 4998
		},
		{
			"shipDate" : ISODate("2012-07-06T09:00:00Z"),
			"total" : 3898
		}
	],
	"count" : NumberLong(5),
	"keys" : NumberLong(2),
	"ok" : 1
}

To verify manually, the documents whose shipDate is 2012-07-04T09:00:00Z are:

{ "_id" : 2, "orderDate" : ISODate("2012-07-03T05:20:00Z"), "shipDate" : ISODate("2012-07-04T09:00:00Z"), "attr" : { "name" : "Gobond Basketball shoes", "price" : 1999, "size" : 43, "color" : "Lion Brown" } }
{ "_id" : 3, "orderDate" : ISODate("2012-07-03T05:20:10Z"), "shipDate" : ISODate("2012-07-04T09:00:00Z"), "attr" : { "name" : "New coconut shoes.", "price" : 2999, "size" : 42, "color" : "Champagne gold" } }

The total sales amount is 1999 + 2999 = 4998, the same as the returned result. The documents whose shipDate is 2012-07-06T09:00:00Z are:

{ "_id" : 4, "orderDate" : ISODate("2012-07-05T15:11:33Z"), "shipDate" : ISODate("2012-07-06T09:00:00Z"), "attr" : { "name" : "Speed shoes.", "price" : 500, "size" : 43, "color" : "West Lake Blue" } }
{ "_id" : 5, "orderDate" : ISODate("2012-07-05T20:22:09Z"), "shipDate" : ISODate("2012-07-06T09:00:00Z"), "attr" : { "name" : "New coconut shoes.", "price" : 2999, "size" : 42, "color" : "Champagne gold" } }
{ "_id" : 6, "orderDate" : ISODate("2012-07-05T22:35:20Z"), "shipDate" : ISODate("2012-07-06T09:00:00Z"), "attr" : { "name" : "Breathable net run.", "price" : 399, "size" : 38, "color" : "Rose red" } }

The total sales amount is 500 + 2999 + 399 = 3898, the same as the returned result.

Sometimes it is necessary to count the number of documents per group and calculate the average sales, as shown in the following example:

> db.runCommand(
{
    group:{
        ns: 'sales',
        key: {shipDate: 1},
        cond: {shipDate: {$gt: ISODate('2012-07-04T00:00:00Z')}},
        $reduce: function(curr, result){
        	result.total += curr.attr.price;
        	result.count ++;
        	},
        initial: {total: 0, count: 0},
        finalize: function(result){ result.avg = Math.round(result.total / result.count); }
        }
	}
)

In the example above, the $reduce function was extended to count the documents in each group as well, and finalize was added to calculate the average sales per group. After the command is executed, the following document is returned:

{
	"retval" : [
		{
			"shipDate" : ISODate("2012-07-04T09:00:00Z"),
			"total" : 4998,
			"count" : 2,
			"avg" : 2499
		},
		{
			"shipDate" : ISODate("2012-07-06T09:00:00Z"),
			"total" : 3898,
			"count" : 3,
			"avg" : 1299
		}
	],
	"count" : NumberLong(5),
	"keys" : NumberLong(2),
	"ok" : 1
}

That covers the basic usage of group. For more details, see the official group documentation.
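How key, cond, $reduce, initial, and finalize work together can be modelled in plain JavaScript. The sketch below is not the MongoDB API: the `group` helper is hypothetical, it supports a single top-level key only, and the sales documents are trimmed to the fields the calculation needs, with shipDate written as an ISO string (an assumption that keeps string comparison equivalent to date comparison).

```javascript
// Illustrative only: an in-memory model of the group command's main fields.
function group(docs, { key, cond = () => true, reduce, initial, finalize }) {
  const groups = new Map();
  for (const doc of docs.filter(cond)) {
    const keyValue = doc[key];
    const id = String(keyValue); // string form, used only to find the group
    if (!groups.has(id)) {
      groups.set(id, { [key]: keyValue, ...initial }); // each group starts from a copy of initial
    }
    reduce(doc, groups.get(id)); // reduce updates the group's result document in place
  }
  const retval = Array.from(groups.values());
  if (finalize) retval.forEach((result) => finalize(result)); // last touch before returning
  return retval;
}

// Trimmed copies of the sales documents (hypothetical simplification).
const sales = [
  { _id: 1, shipDate: "2012-07-02T09:00:00Z", attr: { price: 2999 } },
  { _id: 2, shipDate: "2012-07-04T09:00:00Z", attr: { price: 1999 } },
  { _id: 3, shipDate: "2012-07-04T09:00:00Z", attr: { price: 2999 } },
  { _id: 4, shipDate: "2012-07-06T09:00:00Z", attr: { price: 500 } },
  { _id: 5, shipDate: "2012-07-06T09:00:00Z", attr: { price: 2999 } },
  { _id: 6, shipDate: "2012-07-06T09:00:00Z", attr: { price: 399 } },
];

const retval = group(sales, {
  key: "shipDate",
  cond: (doc) => doc.shipDate > "2012-07-04T00:00:00Z",
  reduce: (curr, result) => { result.total += curr.attr.price; result.count++; },
  initial: { total: 0, count: 0 },
  finalize: (result) => { result.avg = Math.round(result.total / result.count); },
});
console.log(retval);
```

The totals and averages it prints match the retval shown in the last shell output.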

distinct

distinct finds the distinct values of a specified field in a single collection. The syntax is as follows:

{
  distinct: "<collection>",
  key: "<field>",
  query: <query>,
  readConcern: <read concern document>,
  collation: <collation document>
}

The fields supported by distinct and their descriptions are as follows:

Field	Type	Description
`distinct`	string	The name of the collection. Mandatory.
`key`	string	The field for which to return distinct values. Mandatory.
`query`	document	The query condition.
`readConcern`	document	Specifies the read concern.
`collation`	document	Specifies the collation to use for the operation.

Prepare the following data:

> db.dress.insertMany([
... {_id: 1, "dept": "A", attr: {"Style": "Collar", color: "red" }, sizes: ["S", "M"]},
... {_id: 2, "dept": "A", attr: {"Style": "Collar", color: "blue" }, sizes: ["M", "L"]},
... {_id: 3, "dept": "B", attr: {"Style": "Collar", color: "blue" }, sizes: "S"},
... {_id: 4, "dept": "A", attr: {"Style": "V is gotten", color: "black" }, sizes: ["S"]}
... ])

Suppose you want to list the distinct values of the dept field across all documents in the dress collection as follows:

> db.runCommand ( { distinct: "dress", key: "dept" } )
{ "values" : [ "A", "B" ], "ok" : 1 }

Or see what styles are available by using the embedded field attr.Style as the key, as shown below:

> db.runCommand ( { distinct: "dress", key: "attr.Style" } )
{ "values" : [ "Collar", "V is gotten" ], "ok" : 1 }

distinct also handles fields whose value is an array, as shown in the following example:

> db.runCommand ( { distinct: "dress", key: "sizes" } )
{ "values" : [ "M", "S", "L" ], "ok" : 1 }
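The way array values are flattened into individual distinct values can be modelled in plain JavaScript. The sketch below is not the MongoDB API: `distinct` is a hypothetical helper that handles top-level fields only (no dotted paths such as attr.Style), and it returns values in first-seen order, which may differ from the order the server returns.

```javascript
// Illustrative only: an in-memory model of distinct on a top-level field.
function distinct(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue; // documents without the field contribute nothing
    // An array contributes each of its elements as a separate value.
    for (const item of Array.isArray(value) ? value : [value]) seen.add(item);
  }
  return Array.from(seen);
}

const dress = [
  { _id: 1, dept: "A", attr: { Style: "Collar", color: "red" }, sizes: ["S", "M"] },
  { _id: 2, dept: "A", attr: { Style: "Collar", color: "blue" }, sizes: ["M", "L"] },
  { _id: 3, dept: "B", attr: { Style: "Collar", color: "blue" }, sizes: "S" },
  { _id: 4, dept: "A", attr: { Style: "V is gotten", color: "black" }, sizes: ["S"] },
];

const depts = distinct(dress, "dept");
const sizes = distinct(dress, "sizes");
console.log(depts, sizes);
```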

Summary of streaming aggregation

This is the introduction to streaming aggregation in MongoDB. The concepts of aggregation and pipelines may be unfamiliar, but they are not difficult to understand. Work through the examples and get your hands dirty, and you’ll soon master aggregation.

See enough?

In addition to helping you master streaming aggregation, I’ve written a quick start tutorial on MongoDB that will help you:

Learn document CRUD operations and Cursor objects
Master streaming aggregation operations, easily facing any data processing requirements
Understand MongoDB query efficiency and optimization
Learn how to improve MongoDB usability
Learn how to deal with data service failures
Understand access control for MongoDB
Learn to use data models to reduce data redundancy and improve efficiency
Master mongodump data backup and restoration methods

Who the MongoDB series is for

Developers with no prior experience who are interested in MongoDB
Developers who have some background and want a thorough understanding of MongoDB

Why this MongoDB tutorial?

Similar MongoDB tutorials can cost hundreds of dollars
MongoDB’s official documentation is obscure
Other online articles are not comprehensive enough to form systematic knowledge
The sample code in Chat can be copied directly for easy practice
MongoDB has many contents, and self-learners are not sure what knowledge they need to learn

This is a quick introduction to MongoDB written for complete beginners. The content runs from document CRUD to streaming aggregation; from execution plans, indexes, and data models to replica sets; and from sharding and access control to data backup and restoration. At nearly 50,000 words, it covers most of what you need to know about MongoDB for daily development.

For the record, this tutorial costs 9.9 yuan.

You can add the author’s WeChat account, DomFreez, to send the author your complaints and feedback.

More than 600 friends have participated so far
