In the actual application of database, we often need to use aggregation operations to help us deal with data, statistics and collation of data

In this article we will learn how to use aggregate operations in MongoDB

1. Aggregation function and aggregation pipeline

The basic syntax for using aggregate functions and aggregate pipes is as follows:

db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
Copy the code

Common aggregation functions are as follows:

Aggregate functions are used to process data, such as summing, averaging, etc., and return the final calculation

The operator describe
$sum sum
$avg averaging
$min For the minimum
$max For maximum
$first Get the first document
$last Get the last document
$push Inserts a value into the array

Common polymerization channels are as follows:

An aggregation pipe can pass the results of a previous pipe to the next pipe for further processing

The operator describe
$group Use to group documents
$project Used to modify the structure of a document by renaming, adding, or deleting fields
$match Filter documents that do not meet the criteria
$sort Sort the document and output it
$limit Specifies the number of records to read
$skip Specifies the number of records to skip

Ok, let’s do the experiment ourselves, first prepare the test data

> use university
> db.teacher.insert([
    {
        'tid': '19001'.'name': 'Alice'.'age': 32.'department': 'Computer'.'salary': 10000
    },
    {
        'tid': '19002'.'name': 'Bob'.'age': 48.'department': 'Computer'.'salary': 15000
    },
    {
        'tid': '19003'.'name': 'Alice'.'age': 42.'department': 'Software'.'salary': 12000
    },
    {
        'tid': '19004'.'name': 'Christy'.'age': 38.'department': 'Software'.'salary': 14000
    },
    {
        'tid': '19005'.'name': 'Daniel'.'age': 28.'department': 'Architecture'.'salary': 8000}])Copy the code

Collect the total salary of all the teachers

db.teacher.aggregate([
    {
        $group: {
            _id: null.// No grouping
            total_salary: { $sum: '$salary' } // Add the value of the salary field}}, {$project: {
            _id: 0.// Do not output the _id field
            total_salary: 1 // Output the total_salary field}}])// Query result
// { "total_salary" : 59000 }
Copy the code

Count the total number of teachers whose salaries exceed 10,000

db.teacher.aggregate([
    {
        $match: {
            salary: { $gt: 10000 } // Return documents whose salary field is greater than 10000}}, {$group: {
            _id: null.// No grouping
            total_teacher: { $sum: 1 } // Sum the value 1}}, {$project: {
            _id: 0.// Do not output the _id field
            total_teacher: 1 // Output the total_teacher field}}])// Query result
// { "total_teacher" : 3 }
Copy the code

Collect the average salary of teachers in each college and output it in order of average salary from small to large

db.teacher.aggregate([
    {
        $group: {
            _id: '$department'.// Group with the value of the department field
            avg_salary: { $avg: '$salary' } // Average the value of the salary field}}, {$project: {
            _id: 0.// Do not output the _id field
            dept_name: '$_id'.// Add depT_name to the _id field
            avg_salary: 1 // Output the avg_SALARY field}}, {$sort: {
            avg_salary: 1 // The avG_SALARY field is sorted in ascending order}}])// Query result
// { "avg_salary" : 8000, "dept_name" : "Architecture" }
// { "avg_salary" : 12500, "dept_name" : "Computer" }
// { "avg_salary" : 13000, "dept_name" : "Software" }
Copy the code

Output the numbers of the top three paid teachers

db.teacher.aggregate([
    {
        $sort: {
            salary: -1 // The salary field is sorted in descending order}}, {$limit: 3 // Only 3 documents can be read
    },
    {
        $project: {
            _id: 0.// Do not output the _id field
            tid: 1 // Outputs the tid field}}])// Query result
// { "tid" : "19002" }
// { "tid" : "19004" }
// { "tid" : "19003" }
Copy the code

2, Map Reduce

In addition to aggregation functions and aggregation pipes, MongoDB also has a more flexible aggregation operation called Map Reduce

Map Reduce is a computing model that decomposes large pieces of work (Map) and then merges the results (Reduce) into final results

The basic syntax is as follows:

db.COLLECTION_NAME.aggregate(
    function() { emit(key, value) }, // Map function, which generates a sequence of key-value pairs as arguments to reduce function
    function(key, values) { return reduceFunction }, // reduce function to process values
    {
        query: <query>,   // Specify filter criteria so that only documents that meet the criteria will call the map function
        sort: <function>, // in the callmapSort documents before the functionlimit: <number> < span style = "max-width: 100%; clear: both; min-height: 1emmapThe number of documents for the functionfinalize: <function>, // Modify the data before storing the result setout: <collection>, // specify where to store results, if not, use temporary set})Copy the code

Let’s take an example

Statistics on the average salary of faculty over 30 years of age over 10000 colleges, but does not output information about salary

db.teacher.mapReduce(
    // 2, execute map function, the core of map function is to call emit function, provide reduce function parameters
    // The first argument to the emit function specifies the fields to be grouped and the second argument specifies the fields to be counted
    // Select key from department; Combine the value of the salary field into an array as values
    // Pass each grouping (key, values) as an argument to the reduce function
    function() { emit(this.department, this.salary) },
    // Execute reduce function. The core of reduce function is to change (key, values) to (key, value).
    // The function takes arguments (key,values) from the map function and returns a processed value as value
    // Value and key are combined into (key, value) and passed backwards
    // This returns an average value of values evaluated using avg
    function(key, values) { return Array.avg(values) },
    {
        // 1. Execute query first, filter out the documents that do not meet the criteria, and send the documents that meet the criteria to map function
        query: { age: { $gt: 30}},// 4. Execute finalize function to finalize the result before storing it in out set
        // This function takes arguments (key, value) from reduce and returns a processed value as value
        // Hide the average salary information by setting the value field to null
        finalize: function(key, value) {
            return null
        },
        // Add the final result to total_teacher
        out: 'total_teacher'})Copy the code

You can see the output below

{
	"result" : "total_teacher".// The name of the collection to store the results
	"timeMillis" : 276.// Time spent in milliseconds
	"counts" : {
		"input" : 4.// The number of documents sent to the map function after filtering
		"emit" : 4.// The number of documents processed in the map function
		"reduce" : 2.// Number of documents processed in reduce function
		"output" : 2 // The number of documents in the result set
	},
	"ok" : 1
}
Copy the code

And look at the results

> show collections
// teacher
// total_teacher
> db.total_teacher.find()
// { "_id" : "Computer", "value" : null }
// { "_id" : "Software", "value" : null }
Copy the code