Cluster module overview

A Node instance runs JavaScript on a single thread. In server-side programming, it is therefore common to start multiple Node instances to handle client requests and increase system throughput. Such a group of Node instances is called a cluster.

With Node's cluster module, developers can get the benefits of a clustered service with very little change to the original project code.

Clusters are commonly implemented in one of the following two ways. Node's cluster module adopts scheme 2.

Scheme 1: multiple Node instances + multiple ports

Each Node instance in the cluster listens on a different port, and a reverse proxy in front of them distributes incoming requests across those ports.

Advantages: simple to implement, and the instances are relatively independent, which is good for service stability. Disadvantages: more ports are occupied, and communication between the processes is more cumbersome.
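As a rough sketch of scheme 1 (not from the original code), the following toy reverse proxy uses Node's built-in http module to forward requests round-robin to instances assumed to be running on ports 3001-3003; the file name proxy.js and the ports are made up for illustration.

// proxy.js - a minimal sketch of scheme 1 (assumed backend ports 3001-3003)
const http = require('http');

const targets = [3001, 3002, 3003]; // ports the independent Node instances listen on
let next = 0;

http.createServer((clientReq, clientRes) => {
  // pick the next backend in round-robin order
  const port = targets[next++ % targets.length];

  const proxyReq = http.request({
    host: '127.0.0.1',
    port: port,
    path: clientReq.url,
    method: clientReq.method,
    headers: clientReq.headers
  }, (proxyRes) => {
    clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
    proxyRes.pipe(clientRes);
  });

  clientReq.pipe(proxyReq);
}).listen(3000);

In a real deployment this role is usually played by a dedicated reverse proxy such as nginx rather than hand-written code.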

Scheme 2: The master process forwards requests to its children

Within the cluster, one master process and several child processes (workers) are created. The master listens for client connection requests and forwards them to a worker according to some policy.

Advantages: usually only one port is occupied, communication is simpler, and the forwarding policy is more flexible. Disadvantages: the implementation is relatively complex, and the master process must be highly stable.

An introductory example

In the Cluster module, the main process is called master and the child process is called worker.

As an example, we create as many server instances as there are CPUs to handle client requests. Note that they all listen on the same port.

// server.js
const cluster = require('cluster');
const http = require('http');
const cpuNums = require('os').cpus().length;

if (cluster.isMaster) {
  // Master: fork one worker per CPU.
  for (let i = 0; i < cpuNums; i++) {
    cluster.fork();
  }
} else {
  // Worker: create an HTTP server. Note that every worker listens on port 3000.
  http.createServer((req, res) => {
    res.end(`response from worker ${process.pid}`);
  }).listen(3000);

  console.log(`Worker ${process.pid} started`);
}


Create a test script ./req.sh that sends a few requests to the server:

#!/bin/bash
# req.sh
for ((i=1; i<=4; i++)); do
  curl http://127.0.0.1:3000
  echo ""
done

The output is as follows. As you can see, the responses come from different processes.

response from worker 23735
response from worker 23731
response from worker 23729
response from worker 23730

Implementation principles of the Cluster module

To understand the cluster module, we mainly need to answer three questions:

  1. How do the master and the workers communicate?
  2. With multiple server instances, how is the port shared?
  3. How are client requests distributed across the workers?

Question 1: How do master and worker communicate

This is an easy one. The master process creates worker processes through cluster.fork(), and cluster.fork() internally creates the child process via child_process.fork().

In other words:

The master process and the worker processes are parent and child processes, so they can communicate with each other through the IPC channel. (Important!)
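As a quick illustration (not part of the original article), the sketch below shows ordinary messaging between master and worker over the IPC channel that cluster.fork() sets up; the file name and message contents are made up.

// ipc-demo.js - a minimal sketch of master/worker IPC
const cluster = require('cluster');

if (cluster.isMaster) {
  const worker = cluster.fork();

  // master -> worker: sent over the IPC channel created by cluster.fork()
  worker.send({ hello: 'from master' });

  // worker -> master: received on the worker object
  worker.on('message', (msg) => {
    console.log('master received:', msg);
  });
} else {
  // inside the worker, process.send()/process.on('message') use the same channel
  process.on('message', (msg) => {
    console.log(`worker ${process.pid} received:`, msg);
    process.send({ hello: 'from worker' });
  });
}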

Question 2: How to implement port sharing

In the previous example, the servers created in the workers all listen on the same port, 3000. Normally, if multiple processes listen on the same port, the system reports an error.
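To make the usual behaviour concrete (this snippet is not from the article), the sketch below binds two plain HTTP servers to the same port without the cluster module, in a single process for brevity; the second listen() fails with EADDRINUSE. The file name and port are only for illustration.

// no-cluster.js - without cluster, a second bind to the same port fails
const http = require('http');

const first = http.createServer((req, res) => res.end('first'));
const second = http.createServer((req, res) => res.end('second'));

first.listen(3000, () => console.log('first server listening on 3000'));

second.on('error', (err) => {
  console.log('second listen failed:', err.code); // EADDRINUSE
});
second.listen(3000);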

Why is our example ok?

The secret is that the listen() method is handled specially in the net module, depending on whether the current process is the master or a worker:

Master process: listen() binds the port and listens for requests as usual.

Worker process: a server instance is created, but the worker does not bind the port itself. Instead, it sends a message to the master over the IPC channel, asking the master to create a server instance and listen for requests on that port. When a request comes in, the master forwards it to the server instance of the worker process.

To sum up, only the master process actually listens on the port, and it forwards client requests to the worker processes.


Question 3: How are requests distributed to the workers

Whenever a worker creates a server instance to listen for requests, it registers itself with the master over the IPC channel. When a client request arrives, the master is responsible for forwarding it to the appropriate worker.

Which worker is it forwarded to? That is determined by the forwarding (scheduling) policy. It can be set with the NODE_CLUSTER_SCHED_POLICY environment variable or with the cluster.schedulingPolicy property (before the first worker is forked).

The default forwarding policy is round-robin (SCHED_RR) on all platforms except Windows.

When a client request arrives, the master goes through the worker list, finds the first available worker, and forwards the request to it.
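For reference, here is a minimal sketch of how the policy could be selected. cluster.schedulingPolicy, cluster.SCHED_RR, cluster.SCHED_NONE and the NODE_CLUSTER_SCHED_POLICY environment variable are standard cluster settings; the surrounding script is only illustrative.

// scheduling.js - choosing the scheduling policy (illustrative)
const cluster = require('cluster');

// Must be set before the first cluster.fork() call.
// Equivalent environment variable: NODE_CLUSTER_SCHED_POLICY=rr (or "none").
cluster.schedulingPolicy = cluster.SCHED_RR;    // round-robin, the default on most platforms
// cluster.schedulingPolicy = cluster.SCHED_NONE; // let the operating system distribute connections

if (cluster.isMaster) {
  cluster.fork();
} else {
  // worker code (e.g. the HTTP server from server.js) goes here
}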

Tips on internal master/worker communication

During development, we use process.on('message', fn) for inter-process communication.

As mentioned above, the master and worker processes also communicate over the IPC channel while the server instance is being created. Will that interfere with our own code, for example by delivering a bunch of messages we don't actually care about?

The answer, of course, is no. So how is that achieved?

When a message contains a cmd field whose value is prefixed with NODE_, it is treated as an internal reserved message: it is not emitted through the message event, but it can be caught by listening for the internalMessage event.

For example, the worker process notifies the master process to create a server instance. The worker pseudocode is as follows:

// worker
const message = { cmd: 'NODE_CLUSTER', act: 'queryServer' };
process.send(message);

The master pseudocode is as follows:

worker.process.on('internalMessage', fn);
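Putting the two sides together, here is a slightly fuller, still simplified sketch of how the master side keeps ordinary messages and internal NODE_-prefixed messages apart. It relies on the internal, undocumented internalMessage event mentioned above, so treat it as an illustration rather than production code.

// master - simplified sketch; 'internalMessage' is an internal event
const cluster = require('cluster');

if (cluster.isMaster) {
  const worker = cluster.fork();

  // ordinary application messages still arrive on 'message'
  worker.on('message', (msg) => {
    console.log('app message:', msg);
  });

  // internal messages (cmd prefixed with 'NODE_') arrive on 'internalMessage' instead
  worker.process.on('internalMessage', (msg) => {
    if (msg.cmd === 'NODE_CLUSTER' && msg.act === 'queryServer') {
      // the cluster module reacts here by creating/reusing the listening handle
      // and accepting connections on behalf of the worker
    }
  });
}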