• Node.js Child Processes: Everything you need to know
  • Author: Samer Buna (translated with the author's authorization)
  • The Nuggets Translation Project
  • Translator: Xiong Xianren
  • Proofread by: CACppuccino, wilsonandusa

How to use spawn(), exec(), execFile() and fork()

Screenshots from my video teaching course – Node.js Advanced

The single-threaded, non-blocking execution feature of Node.js works well with a single process. However, a single process in a single CPU is ultimately insufficient to handle the increased workload in an application.

No matter how powerful your server is, a single thread can only support a limited amount of load.

Just because Node.js runs in a single thread doesn't mean we can't take advantage of multiple processes and, of course, multiple machines as well.

Using multiple processes is the best way to scale a Node application. Node.js is designed for building distributed applications with many nodes, which is why it's named Node. Scalability is baked into the platform from the very beginning.

This post is a complement to my Node.js video tutorial. We talked about similar things in the course videos.

Please note that you'll need a good understanding of Node.js events and streams before reading this article. If you haven't already, I recommend reading my two other articles on those topics first.

The child_process module

We can use Node’s child_process module to simply create child processes that communicate with each other via the messaging system.

The child_process module gives us the ability to use operating system functions by executing system commands in a child process.

We can control the input stream of a child process and listen to its output stream. We can also control the arguments passed to the underlying OS command, and we can do whatever we want with that command's output. For example, we can pipe the output of one command as the input of another (just like we do in Linux), since all inputs and outputs of these commands can be represented using Node.js streams.

Note: all the examples in this article use Linux commands. On Windows, you'll need to swap in their Windows equivalents.

There are four different ways to create child processes in Node.js: spawn(), fork(), exec(), and execFile().

We’ll learn the differences between these four functions and their usage scenarios.

Spawned child processes

The spawn function launches a command in a new process, and we can use it to pass that command any arguments. For example, the following code spawns a new process that executes the pwd command:

const { spawn } = require('child_process');

const child = spawn('pwd');

We simply destructure the spawn function from the child_process module and execute it with the OS command as the first argument.

The result of executing the spawn function (the child object above) is a ChildProcess instance, which implements the EventEmitter API. This means we can register handlers for events on this child object directly. For example, we can register a handler for the exit event and do something when the child process exits:

child.on('exit', function (code, signal) {
  console.log('child process exited with ' +
              `code ${code} and signal ${signal}`);
});

The handler above gives us the exit code of the child process and the signal, if any, that was used to terminate it. The signal variable is null when the child process exits normally.

We can also register handlers for the disconnect, error, close, and message events on ChildProcess instances:

  • The disconnect event is emitted when the parent process manually calls the child.disconnect function.
  • The error event is emitted if the process could not be spawned or killed.
  • The close event is emitted when the stdio streams of the child process get closed.
  • The message event is the most important one. It is emitted when the child process uses the process.send() function to send messages. This is how parent and child processes can communicate with each other. An example is given below.
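As a quick illustration, here's a minimal sketch of registering the error and close handlers on a spawned child (assuming a Linux-like system where pwd is available):

const { spawn } = require('child_process');

const child = spawn('pwd');

// Emitted if the process could not be spawned or killed
child.on('error', (err) => {
  console.error('Failed to start child process:', err);
});

// Emitted once the child's stdio streams have been closed
child.on('close', (code) => {
  console.log(`child stdio streams closed with code ${code}`);
});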

Every child process also gets the three standard stdio streams, which we can access using child.stdin, child.stdout, and child.stderr.

When those streams get closed, the child process that was using them emits the close event. This close event is different from the exit event, because multiple child processes might share the same stdio streams, so one child process exiting does not mean that the streams got closed.
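To see the difference in practice, here's a minimal sketch (again assuming pwd is available) that registers both handlers on the same child, so you can observe the order in which the two events fire:

const { spawn } = require('child_process');

const child = spawn('pwd');

// exit fires when the child process itself ends
child.on('exit', (code, signal) => {
  console.log(`exit: code ${code}, signal ${signal}`);
});

// close fires when the child's stdio streams have been closed
child.on('close', (code) => {
  console.log(`close: stdio streams closed, code ${code}`);
});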

Since all streams are event emitters, we can listen to different events on those stdio streams attached to every child process. Unlike in a normal process though, in a child process the stdout/stderr streams are readable streams while the stdin stream is a writable one. This is basically the inverse of what we find in the main process. The events we can use on these streams are the standard ones. Most importantly, on the readable streams we can listen to the data event, which will have the output of the command or any error encountered while executing it:

child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`);
});

child.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`);
});

The two handlers above will log both cases to the main process's stdout and stderr. When we execute the spawn function above, the output of the pwd command gets printed and the child process exits with code 0, which means no errors occurred.

We can pass arguments to the command executed by the spawn function using the second argument of spawn, which is an array of all the arguments to be passed to the command. For example, to execute the find command on the current directory with the -type f argument (to list files only), we can do:

const child = spawn('find', ['.', '-type', 'f']);

If an error occurs during the execution of the command, for example, if we give find an invalid destination, the child.stderr data event handler will be triggered and the exit event handler will report an exit code of 1, which signifies that an error has occurred. The error values actually depend on the host OS and the type of error.
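For instance, here's a hedged sketch of that failure mode; /nonexistent is just a placeholder path assumed not to exist on your machine:

const { spawn } = require('child_process');

// '/nonexistent' is a placeholder path assumed not to exist
const child = spawn('find', ['/nonexistent', '-type', 'f']);

child.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`);
});

child.on('exit', (code, signal) => {
  console.log(`child exited with code ${code} and signal ${signal}`);
});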

The stdin of a child process is a writable stream. We can use it to send a command some input. Just like all writable streams, the easiest way to consume it is by using the pipe function: we simply pipe a readable stream into a writable stream. Since the main process's stdin is a readable stream, we can pipe it into a child process's stdin stream. For example:

const { spawn } = require('child_process');

const child = spawn('wc');

process.stdin.pipe(child.stdin);

child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`);
});

In this example, the child process invokes the wc command, which counts lines, words, and characters in Linux. We then pipe the main process's stdin (a readable stream) into the child process's stdin (a writable stream). The result of this combination is that we get a standard input mode where we can type something, and when we hit Ctrl+D, what we typed will be used as the input of the wc command.

Gif screenshot from my video teaching course – Node.js advanced

We can also pipe the standard input/output of multiple processes into one another, just like we can do with Linux commands. For example, we can pipe the stdout of find into the stdin of wc to count all the files in the current directory:

const { spawn } = require('child_process');

const find = spawn('find', ['.', '-type', 'f']);
const wc = spawn('wc', ['-l']);

find.stdout.pipe(wc.stdin);

wc.stdout.on('data', (data) => {
  console.log(`Number of files ${data}`);
});

I added the -l argument to the wc command to make it count only the lines. When executed, the code above outputs a count of all the files in the current directory and all of its subdirectories.

Shell syntax and the exec function

By default, the spawn function does not create a shell to execute the command we pass into it, which makes it slightly more efficient than the exec function, which does create a shell. The exec function has one other major difference: it buffers the command's generated output and passes the whole output value to a callback function (instead of using streams, which is what spawn does).

Here's the previous find | wc example implemented with the exec function:

const { exec } = require('child_process');

exec('find . -type f | wc -l', (err, stdout, stderr) => {
  if (err) {
    console.error(`exec error: ${err}`);
    return;
  }

  console.log(`Number of files ${stdout}`);
});

Since the exec function uses the shell to execute commands, we can use shell syntax to take advantage of the shell pipeline feature directly.

The exec function buffers the output and passes it to the callback function (the second argument to exec) as the stdout argument there. This stdout argument is the command's output, which we print out.

The exec function is a good choice if you need to use shell syntax and the size of the data expected from the command is small. (Remember, exec buffers the whole data in memory before returning it.)

When the expected data size of the command is large, it is much better to select the spawn function because the data will be streamed along with standard IO objects.

We can make a spawned child process inherit the standard IO objects of its parent if we want to, but, more importantly, we can make the spawn function use shell syntax as well. Here's the same find | wc command implemented with the spawn function:

const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true
});

Because of the stdio: 'inherit' option above, when we execute the code, the child process inherits the main process's stdin, stdout, and stderr. This causes the child process's data event handlers to be triggered on the main process.stdout stream, making the script output the result right away.

The shell: true option allows us to use shell syntax in the passed command, just like we did in the previous exec example. But with this code we still get the advantage of the streaming of data that the spawn function gives us. This really is the best of both worlds.

There are a few other options we can use in the last argument to the child_process functions besides shell and stdio. For example, we can use the cwd option to change the working directory of the script. Here's the same count-all-files example implemented with the spawn function, using shell syntax, and with the working directory set to my Downloads folder. The cwd option here makes the script count all the files in ~/Downloads:

const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true,
  cwd: '/Users/samer/Downloads'
});

Another option we can use is env, which specifies the environment variables that will be visible to the new child process. The default for this option is process.env, which gives any command access to the current process environment. If we want to override that behavior, we can simply pass an empty object, or pass new values as the only visible environment variables:

const child = spawn('echo $ANSWER', {
  stdio: 'inherit',
  shell: true,
  env: { ANSWER: 42 },
});

The echo command above does not have access to the parent process's environment variables. It can't, for example, access $HOME, but it can access $ANSWER because it was passed in as a custom environment variable through the env option.

One last important child process option to explain here is the detached option, which makes the child process run independently of its parent process.

Suppose we have a file timer.js that keeps the event loop busy:

setTimeout(() => {
  // keep the event loop busy
}, 20000);

Using the detached option, we can execute this code in the background:

const { spawn } = require('child_process');

const child = spawn('node', ['timer.js'], {
  detached: true,
  stdio: 'ignore'
});

child.unref();

The exact behavior of detached child processes depends on the operating system. On Windows, the detached child process will have its own console window, while on Linux the detached child process will be made the leader of a new process group and session.

If the unref function is called on the detached process, the parent process can exit independently of the child. This can be useful if the child executes a long-running process, but to keep it running in the background, the child's stdio configuration also has to be independent of the parent.

The example above runs a node script (timer.js) in the background by detaching it and also ignoring its parent's stdio file descriptors, so that the parent can terminate while the child keeps running in the background.

Gif from my video teaching course – Node.js Advanced

The execFile function

If you need to execute a file without using a shell, the execFile function is what you want. It behaves exactly like the exec function but does not use a shell, which makes it a bit more efficient. On Windows, some files cannot be executed on their own, like .bat or .cmd files. Those files cannot be executed with execFile; either exec or spawn, with shell set to true, is required to execute them.
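As a quick sketch, here's the earlier find example executed directly through execFile, without a shell:

const { execFile } = require('child_process');

// Same find example as before, executed directly without a shell
execFile('find', ['.', '-type', 'f'], (err, stdout, stderr) => {
  if (err) {
    console.error(`execFile error: ${err}`);
    return;
  }
  console.log(`Files:\n${stdout}`);
});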

The *Sync functions

The functions of the child_process module also have synchronous blocking versions (spawnSync, execSync, and execFileSync) that wait until the child process exits.

Screenshots from my video teaching course – Node.js Advanced

Those synchronous versions are potentially useful when trying to simplify scripting tasks or any startup processing tasks, but they should be avoided otherwise.
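For example, here's a minimal sketch of execSync in a throwaway script, reusing the earlier counting command. It blocks the event loop until the command completes and returns a Buffer by default:

const { execSync } = require('child_process');

// Blocks until the command completes; returns a Buffer by default
const stdout = execSync('find . -type f | wc -l');

console.log(`Number of files ${stdout.toString()}`);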

The fork() function

The fork function is a variation of the spawn function for spawning node processes. The biggest difference between spawn and fork is that a communication channel is established to the child process when using fork, so we can use the send function on the forked process, along with the global process object itself, to exchange messages between the parent and forked processes. We do this through the EventEmitter module interface. Here's an example:

Parent file, parent.js:

const { fork } = require('child_process');

const forked = fork('child.js');

forked.on('message', (msg) => {
  console.log('Message from child', msg);
});

forked.send({ hello: 'world' });

Child file, child.js:

process.on('message', (msg) => {
  console.log('Message from parent:', msg);
});

let counter = 0;

setInterval(() => {
  process.send({ counter: counter++ });
}, 1000);

In the parent file above, we fork child.js (which will execute that file with the node command) and then we listen for the message event. The message event is emitted whenever the child uses process.send, which we're doing every second here.

To pass messages down from the parent to the child, we can execute the send function on the forked object itself, and then, in the child script, we listen for the message event on the global process object.

When executing the parent.js file above, it first sends down the { hello: 'world' } object, which gets printed by the forked child process; then the forked child sends an incremented counter value every second, which gets printed by the parent process.

Screenshots from my video teaching course – Node.js Advanced

Let’s implement a more practical example using the fork function.

Say we have an HTTP server that handles two endpoints. One of these endpoints (/compute below) is computationally expensive and will take a few seconds to complete. We can simulate that with a long for loop:

const http = require('http');

const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  };
  return sum;
};

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const sum = longComputation();
    return res.end(`Sum is ${sum}`);
  } else {
    res.end('Ok')
  }
});

server.listen(3000);

One big problem with this program is that when the /compute endpoint is requested, the server can’t handle other requests because the long loop keeps the event loop busy.

There are several solutions to this problem, depending on the nature of the long operations. The solution that works for all operations, however, is to fork the computation to another process.

We first move the longComputation function into its own file and make that file invoke the function when instructed via a message from the main process:

In a new compute.js file:

const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  };
  return sum;
};

process.on('message', (msg) => {
  const sum = longComputation();
  process.send(sum);
});
Copy the code

We can now fork the compute.js file and use the message interface to communicate between the server and the forked process, instead of performing the long operation in the main process's event loop:

const http = require('http');
const { fork } = require('child_process');

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const compute = fork('compute.js');
    compute.send('start');
    compute.on('message', sum => {
      res.end(`Sum is ${sum}`);
    });
  } else {
    res.end('Ok')
  }
});

server.listen(3000);

In the code above, when a request to /compute comes in, we simply send a message to the forked process to start executing the long operation. The main process's event loop is not blocked.

Once the forked process is done with the long operation, it can send its result back to the parent process using process.send.

In the parent process, we listen to the message event on the forked child process itself. When that event fires, we have a sum value ready to send to the requesting user over HTTP.

The code above is, of course, limited by the number of processes we can fork, but when we execute it, requests to the computationally expensive endpoint no longer block the main server at all, and it can take further requests.

The cluster module, which is the topic of my next article, is based on exactly this idea of child-process forking and load-balancing the requests among the many forks we can create on any system.

That’s all I have to say on this topic. Thanks for reading! See you next time!
