MapReduce can express very complex aggregation logic and is very flexible. However, it is slow and should not be used for real-time data analysis. MapReduce can run in parallel across multiple servers: each server is responsible for only part of the workload, and the partial results are then sent to the master server, which merges them, calculates the final result set, and returns it to the client. The basic idea of MapReduce is shown in the following figure:

Take a sum as an example. A large task is broken into several smaller tasks that run on different nodes to support distributed computing; this stage is called Map (shown in the blue box). The outputs of the small tasks are then aggregated a second time to give the final result, 55. This stage is called Reduce (shown in the red box).
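The split-then-merge idea can be sketched in plain JavaScript. This is only an illustration: it assumes the input is the integers 1 to 10 (one input that sums to 55), and the chunk size of 3 and the helper names `splitIntoChunks` and `sum` are hypothetical.

```javascript
// Assumption: the input is 1..10, which sums to 55.
const numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

// Split the input into chunks, as if each chunk were sent to a different node.
function splitIntoChunks(items, chunkSize) {
  const chunks = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize));
  }
  return chunks;
}

// Add up an array of numbers.
const sum = (values) => values.reduce((acc, v) => acc + v, 0);

// Map stage: every chunk is summed independently (this part could run in parallel).
const partialSums = splitIntoChunks(numbers, 3).map(sum);

// Reduce stage: the partial sums are merged into the final result.
const total = sum(partialSums);

console.log(partialSums); // [ 6, 15, 24, 10 ]
console.log(total);       // 55
```

Because addition is associative, the total is the same however the input is split.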

Computing an aggregation with MapReduce involves three main steps: Map, Shuffle, and Reduce. Map and Reduce must be defined explicitly; Shuffle is performed by MongoDB.

  • Map: applies the operation to each document, emitting key-value pairs
  • Shuffle: groups the emitted pairs by key, collecting the values for each key into an array
  • Reduce: reduces each array of values to a single value
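The three steps can be traced by hand in plain JavaScript. The sample documents and the field names `category` and `qty` are hypothetical, and the code only mimics the data flow; it is not how MongoDB implements the steps internally.

```javascript
// Hypothetical sample documents, for illustration only.
const docs = [
  { category: "a", qty: 2 },
  { category: "b", qty: 5 },
  { category: "a", qty: 3 },
];

// Map: turn each document into a [key, value] pair.
const pairs = docs.map((doc) => [doc.category, doc.qty]);
// pairs is [["a", 2], ["b", 5], ["a", 3]]

// Shuffle: group the pairs by key, collecting the values into an array.
const groups = new Map();
for (const [key, value] of pairs) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}
// groups is a -> [2, 3], b -> [5]

// Reduce: collapse each array of values into a single value (here, a sum).
const results = {};
for (const [key, values] of groups) {
  results[key] = values.reduce((acc, v) => acc + v, 0);
}

console.log(results); // { a: 5, b: 5 }
```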

The following test data (employee records) is used in the examples.

{_id:7369,ename:'SMITH' ,job:'CLERK'    ,mgr:7902,hiredate:'17-12-80',sal:800,comm:0,deptno:20},
{_id:7499,ename:'ALLEN' ,job:'SALESMAN' ,mgr:7698,hiredate:'20-02-81',sal:1600,comm:300 ,deptno:30},
{_id:7521,ename:'WARD'  ,job:'SALESMAN' ,mgr:7698,hiredate:'22-02-81',sal:1250,comm:500 ,deptno:30},
{_id:7566,ename:'JONES' ,job:'MANAGER'  ,mgr:7839,hiredate:'02-04-81',sal:2975,comm:0,deptno:20},
{_id:7654,ename:'MARTIN',job:'SALESMAN' ,mgr:7698,hiredate:'28-09-81',sal:1250,comm:1400,deptno:30},
{_id:7698,ename:'BLAKE' ,job:'MANAGER'  ,mgr:7839,hiredate:'01-05-81',sal:2850,comm:0,deptno:30},
{_id:7782,ename:'CLARK' ,job:'MANAGER'  ,mgr:7839,hiredate:'09-06-81',sal:2450,comm:0,deptno:10},
{_id:7788,ename:'SCOTT' ,job:'ANALYST'  ,mgr:7566,hiredate:'19-04-87',sal:3000,comm:0,deptno:20},
{_id:7839,ename:'KING'  ,job:'PRESIDENT',mgr:0,hiredate:'17-11-81',sal:5000,comm:0,deptno:10},
{_id:7844,ename:'TURNER',job:'SALESMAN' ,mgr:7698,hiredate:'08-09-81',sal:1500,comm:0,deptno:30},
{_id:7876,ename:'ADAMS' ,job:'CLERK'    ,mgr:7788,hiredate:'23-05-87',sal:1100,comm:0,deptno:20},
{_id:7900,ename:'JAMES' ,job:'CLERK'    ,mgr:7698,hiredate:'03-12-81',sal:950,comm:0,deptno:30},
{_id:7902,ename:'FORD'  ,job:'ANALYST'  ,mgr:7566,hiredate:'03-12-81',sal:3000,comm:0,deptno:20},
{_id:7934,ename:'MILLER',job:'CLERK'    ,mgr:7782,hiredate:'23-01-82',sal:1300,comm:0,deptno:10}

(Example 1) Find the number of employees in each position

var map1=function(){emit(this.job,1)}
var reduce1=function(job,count){return Array.sum(count)}

(Example 2) Find the total salary of each department in the employee table

var map2=function(){emit(this.deptno,this.sal)}
var reduce2=function(deptno,sal){return Array.sum(sal)}
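To see what these two examples should produce, the sketch below runs equivalent functions locally over the test data in plain JavaScript. It rests on several assumptions: `simulateMapReduce` is a hypothetical stand-in for the server, `emit` is passed to the map function as an argument instead of being a global, and `sum` replaces the shell's `Array.sum`. Only the `job`, `sal`, and `deptno` fields are copied from the test data.

```javascript
// Subset of fields from the employee test data.
const emp = [
  { job: "CLERK", sal: 800, deptno: 20 },
  { job: "SALESMAN", sal: 1600, deptno: 30 },
  { job: "SALESMAN", sal: 1250, deptno: 30 },
  { job: "MANAGER", sal: 2975, deptno: 20 },
  { job: "SALESMAN", sal: 1250, deptno: 30 },
  { job: "MANAGER", sal: 2850, deptno: 30 },
  { job: "MANAGER", sal: 2450, deptno: 10 },
  { job: "ANALYST", sal: 3000, deptno: 20 },
  { job: "PRESIDENT", sal: 5000, deptno: 10 },
  { job: "SALESMAN", sal: 1500, deptno: 30 },
  { job: "CLERK", sal: 1100, deptno: 20 },
  { job: "CLERK", sal: 950, deptno: 30 },
  { job: "ANALYST", sal: 3000, deptno: 20 },
  { job: "CLERK", sal: 1300, deptno: 10 },
];

// Stand-in for the shell's Array.sum helper.
const sum = (values) => values.reduce((acc, v) => acc + v, 0);

// Hypothetical local runner: calls map with `this` bound to each document,
// groups the emitted values by key, then calls reduce once per key.
function simulateMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

// Local equivalents of map1/reduce1 and map2/reduce2.
const map1 = function (emit) { emit(this.job, 1); };
const reduce1 = (job, counts) => sum(counts);
const map2 = function (emit) { emit(this.deptno, this.sal); };
const reduce2 = (deptno, sals) => sum(sals);

const countByJob = simulateMapReduce(emp, map1, reduce1);
const salaryByDept = simulateMapReduce(emp, map2, reduce2);

console.log(countByJob);   // CLERK 4, SALESMAN 4, MANAGER 3, ANALYST 2, PRESIDENT 1
console.log(salaryByDept); // 10: 8750, 20: 10875, 30: 9400
```

In the mongo shell the original functions would normally be passed to the collection's `mapReduce` method, for example `db.emp.mapReduce(map1, reduce1, {out: {inline: 1}})`; the output option shown here is an assumption, since the text does not say how the results are stored.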

Example 3: Troubleshoot the Map Function

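// Override emit so that it prints each key-value pair instead of collecting it.
var emit = function(key, value) {
    print("emit");
    print("key: " + key + " value: " + tojson(value));
}

// Apply map2 to a single document.
var emp7839 = db.emp.findOne({_id: 7839});
map2.apply(emp7839);
// Output:
// emit
// key: 10 value: 5000

// Apply map2 to every document in the collection.
var myCursor = db.emp.find();
while (myCursor.hasNext()) {
    var doc = myCursor.next();
    print("document _id= " + tojson(doc._id));
    map2.apply(doc);
    print();
}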

Example 4: Troubleshoot the Reduce Function

// A reduce function whose values are plain numbers.
var myTestValues = [5, 5, 10];
var reduce1 = function(key, values) { return Array.sum(values) }
reduce1("aa", myTestValues)

// A reduce function whose values are documents.
var myTestObjects = [{sal: 1000, comm: 5}, {sal: 2000, comm: 10}, {sal: 3000, comm: 15}];
var reduce2 = function(key, values) {
    var reducedValue = {sal: 0, comm: 0};
    for (var i = 0; i < values.length; i++) {
        reducedValue.sal += values[i].sal;
        reducedValue.comm += values[i].comm;
    }
    return reducedValue;
}

// Test:
reduce2("aa", myTestObjects)
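A further check worth making when troubleshooting: MongoDB may call the reduce function more than once for the same key, feeding an earlier result back in as one of the values, so the function should return the same answer either way. The sketch below tests this in plain JavaScript with the document-valued reduce function from this example; the variable names `allAtOnce`, `partial`, and `inTwoSteps` are illustrative.

```javascript
// Same logic as the document-valued reduce function in Example 4.
const reduce2 = function (key, values) {
  const reducedValue = { sal: 0, comm: 0 };
  for (const v of values) {
    reducedValue.sal += v.sal;
    reducedValue.comm += v.comm;
  }
  return reducedValue;
};

const myTestObjects = [
  { sal: 1000, comm: 5 },
  { sal: 2000, comm: 10 },
  { sal: 3000, comm: 15 },
];

// Reduce all values in one call.
const allAtOnce = reduce2("aa", myTestObjects);

// Reduce the first two values, then re-reduce that result with the third.
const partial = reduce2("aa", myTestObjects.slice(0, 2));
const inTwoSteps = reduce2("aa", [partial, myTestObjects[2]]);

console.log(allAtOnce);  // { sal: 6000, comm: 30 }
console.log(inTwoSteps); // { sal: 6000, comm: 30 }
console.log(JSON.stringify(allAtOnce) === JSON.stringify(inTwoSteps)); // true
```

This works because the value returned by the reduce function has the same shape as the values it receives.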