Introduction

We know that the event loop is the basis of event processing in Node.js. The main initialization code and event callbacks run in the event loop. In addition to the event loop, Node.js also has a Worker Pool to handle time-consuming operations, such as blocking I/O operations.

The secret to Node.js's efficiency is asynchronous I/O: a small number of threads can handle a large number of client requests.

At the same time, precisely because so few threads are used, we need to be careful when writing Node.js programs.

Event Loop and Worker Pool

There are two types of threads in Node.js. The first is the Event Loop, which can also be called the main thread. The second is the N worker threads in the Worker Pool.

If either kind of thread takes too long to execute a callback or a task, it is considered blocked.

Thread blocking hurts program performance first of all: while threads are blocked, the system resources they hold remain tied up. Because total resources are limited, fewer resources are left to process other work, which degrades the overall performance of the program.

Second, if threads are frequently blocked, malicious attackers can exploit this to launch DoS attacks, causing normal business to fail.

Node.js uses an event-driven framework, in which the Event Loop handles callbacks registered for various events, as well as non-blocking asynchronous requests such as network I/O.

The Worker Pool, implemented by libuv, mainly exposes an API for submitting tasks and is used to process expensive tasks: CPU-intensive operations and some blocking I/O operations.

Many of Node.js's own modules use the Worker Pool.

For example, I/O-intensive operations:

dns.lookup() and dns.lookupService() in the DNS module.

Except for fs.FSWatcher() and the explicitly synchronous APIs, all other file system APIs use the Worker Pool.

CPU-intensive operations:

Crypto module: crypto.pbkdf2(), crypto.scrypt(), crypto.randomBytes(), crypto.randomFill(), crypto.generateKeyPair().

Zlib module: except for the explicitly synchronous APIs, all other APIs use the Worker Pool.

This is generally how Node.js modules use the Worker Pool, but you can also submit tasks to the Worker Pool from a Node.js C++ add-on.
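
As a quick illustration (a minimal sketch, not from the original article), the asynchronous crypto.pbkdf2() below is dispatched to libuv's Worker Pool, so several calls can run in parallel on the pool's threads (4 by default, adjustable via UV_THREADPOOL_SIZE) while the Event Loop stays free:

const crypto = require('crypto');

const start = Date.now();
for (let i = 0; i < 4; i++) {
  // Each call is queued to the Worker Pool; the Event Loop is not blocked.
  crypto.pbkdf2('password', 'salt', 100000, 64, 'sha512', (err, key) => {
    if (err) throw err;
    console.log(`pbkdf2 #${i} finished after ${Date.now() - start} ms`);
  });
}
console.log('The Event Loop keeps running; this line prints first');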

Queues in the Event Loop and the Worker Pool

In a previous article, we said that the Event Loop uses a queue to store event callbacks. That is actually inaccurate.

What the Event Loop actually maintains is a collection of file descriptors, and it listens for events on them using the operating system's kernel mechanisms: epoll (Linux), kqueue (macOS), event ports (Solaris), or IOCP (Windows).

When the operating system detects that an event is ready, the Event Loop invokes the callback bound to that event.

In contrast, the Worker Pool really does maintain a queue of tasks to be executed. Workers take tasks from the queue and execute them; when a task completes, the worker notifies the Event Loop.

Blocking the event loop

Node.js has a limited number of threads. If a thread is blocked, the execution of the entire application may be affected. Therefore, when designing a program we must think carefully about the Event Loop and the Worker Pool so as to avoid blocking them.

The Event Loop is mainly concerned with accepting user connections and responding to users' requests. If the Event Loop is blocked, users' requests will not be responded to in time.

Since the Event Loop's main job is executing callbacks, our callbacks must finish quickly.

Time complexity of event loop

Time complexity is generally used to judge how fast an algorithm runs. We can use the same concept to analyze the callbacks in the Event Loop.

If the time complexity in all callbacks is constant, then we can guarantee that all callbacks will be executed fairly.

However, if the time complexity of some callbacks is variable, then we need to consider it carefully.

app.get('/constant-time', (req, res) => {
  res.sendStatus(200);
});

Let's look at the constant-time case first. In the example above, we directly set the status of the response, which is a constant-time operation.

app.get('/countToN', (req, res) => {
  let n = req.query.n;

  // n iterations before giving someone else a turn
  for (let i = 0; i < n; i++) {
    console.log(`Iter ${i}`);
  }

  res.sendStatus(200);
});

The example above has O(n) time complexity: depending on the n passed in with the request, the execution time varies.

app.get('/countToN2', (req, res) => {
  let n = req.query.n;

  // n^2 iterations before giving someone else a turn
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      console.log(`Iter ${i}.${j}`);
    }
  }

  res.sendStatus(200);
});

The example above has O(n^2) time complexity.

How should this be handled? First, we should estimate the load the system can bear and limit the parameters a user may pass in. If the data a user passes in is too large and beyond our processing capacity, we can reject it at the input boundary, ensuring that our program keeps running normally.
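
For instance (a minimal sketch reusing the /countToN route from above; the MAX_N limit of 10000 is an assumed value, not from the original text), we can validate and cap n before doing any work:

const MAX_N = 10000; // assumed application-specific upper bound

app.get('/countToN', (req, res) => {
  const n = Number(req.query.n);

  // Reject missing, malformed, or oversized input before doing O(n) work
  if (!Number.isInteger(n) || n < 0 || n > MAX_N) {
    return res.status(400).send('n must be an integer between 0 and ' + MAX_N);
  }

  for (let i = 0; i < n; i++) {
    console.log(`Iter ${i}`);
  }
  res.sendStatus(200);
});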

Node.js core module APIs not recommended in the Event Loop

Among Node.js's core modules, some methods are synchronous, blocking APIs that are expensive to call, such as compression, encryption, synchronous I/O, and child processes.

These APIs are intended for scripts and the REPL environment; we should not use them directly in our server-side programs.

Which APIs are not recommended on the server side? (A short sketch contrasting one of them with its asynchronous counterpart follows the list.)

  • Encryption:

crypto.randomBytes() (synchronous version), crypto.randomFillSync(), crypto.pbkdf2Sync()

  • Compression:

zlib.inflateSync(), zlib.deflateSync()

  • File system:

Do not use the synchronous fs APIs (e.g. fs.readFileSync()).

  • Child process:

child_process.spawnSync(), child_process.execSync(), child_process.execFileSync()
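
To see why the asynchronous counterparts are preferred, here is a minimal sketch contrasting zlib.deflateSync() with zlib.deflate(); the latter hands the work to the Worker Pool instead of blocking the Event Loop:

const zlib = require('zlib');

const buf = Buffer.from('some payload to compress');

// Blocking: the Event Loop can do nothing else while this call runs
const compressedSync = zlib.deflateSync(buf);
console.log('sync deflate done, %d bytes', compressedSync.length);

// Non-blocking: the work runs in the Worker Pool and the Event Loop
// keeps serving other requests until the callback fires
zlib.deflate(buf, (err, compressed) => {
  if (err) throw err;
  console.log('async deflate done, %d bytes', compressed.length);
});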

Partitioning or offloading

In order not to block the Event loop and to give other events a chance to run, we actually have two solutions: Partitioning and offloading.

Partitioning means divide and conquer: a long task is split into several pieces, one piece is executed at a time, and other events get some time to run in between, so the Event Loop is not blocked.

Here’s an example:

let n = 1e6; // how many numbers to average (example value, assumed for runnability)
let sum = 0;
for (let i = 0; i < n; i++)
  sum += i;
let avg = sum / n;
console.log('avg: ' + avg);

Let’s say we want to average n numbers. In the example above our time complexity is O(n).

function asyncAvg(n, avgCB) {
  // Save ongoing sum in JS closure.
  var sum = 0;
  function help(i, cb) {
    sum += i;
    if (i == n) {
      cb(sum);
      return;
    }

    // "Asynchronous recursion".
    // Schedule next operation asynchronously.
    setImmediate(help.bind(null, i+1, cb));
  }

  // Start the helper, with CB to call avgCB.
  help(1, function(sum){
      var avg = sum/n;
      avgCB(avg);
  });
}

asyncAvg(n, function(avg){
  console.log('avg of 1-n: ' + avg);
});

Here we use setImmediate to break the summation task into steps. Although asyncAvg takes many turns to finish, each turn of the Event Loop is guaranteed not to be blocked.

Partitioning, although logically simple, is not suitable for large computing tasks. Moreover, partitioned code still runs on the Event Loop, so it does not benefit from multiple cores.

In this case, we need to offload the task to the Worker Pool.

There are two ways to use a Worker Pool. The first is to use the Worker Pool built into Node.js: we can develop a C++ addon that submits tasks to it, or use a package such as node-webworker-threads, which exposes a JavaScript API to this pool.

The second way is to create our own Worker Pool, for example using Child Process or Cluster.
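
Here is a minimal sketch of the second approach using child_process.fork(); the file name heavy-task.js and the summation workload are assumptions for illustration only:

// parent.js
const { fork } = require('child_process');

function runHeavyTask(input) {
  return new Promise((resolve, reject) => {
    const child = fork('./heavy-task.js'); // a separate Node.js process
    child.once('message', (result) => {    // the child sends back its result
      resolve(result);
      child.kill();
    });
    child.once('error', reject);
    child.send(input);                     // hand the task to the child
  });
}

// heavy-task.js (the child) would look roughly like this:
// process.on('message', ({ n }) => {
//   let sum = 0;
//   for (let i = 0; i < n; i++) sum += i; // CPU-heavy work, off the Event Loop
//   process.send(sum);
// });

runHeavyTask({ n: 1e8 }).then((sum) => console.log('sum:', sum));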

Offloading also has disadvantages. The biggest one is the communication cost: code running in a worker cannot directly touch the JavaScript objects in the Event Loop's namespace, so data has to be serialized and passed back and forth.

The limitations of the V8 engine

Node.js runs on the V8 engine, which is generally good enough and fast enough, but there are two exceptions: regular expressions and JSON operations.

ReDoS: regular expression DoS attacks

What's wrong with regular expressions? They can suffer from pessimistic (also called catastrophic) backtracking.

What is pessimistic backtracking?

Let's look at an example; we'll assume you are already familiar with regular expressions.

Suppose we use /^(x*)y$/ to match the string xxxxxxy.

The first capture group (the value matched inside the parentheses) is xxxxxx.

If we rewrite the regular expression as /^(x*)xy$/ and match the string xxxxxxy again, the captured group is xxxxx.
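
Both results can be checked directly in the Node.js REPL (a tiny sketch of the two cases above):

console.log('xxxxxxy'.match(/^(x*)y$/)[1]);  // "xxxxxx" - (x*) keeps all six x's
console.log('xxxxxxy'.match(/^(x*)xy$/)[1]); // "xxxxx"  - (x*) has to give one x back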

How does this process work?

First, (x*) matches as many x's as possible, stopping at the character y. At this point (x*) has matched all six x's.

Then the engine tries to match the xy that follows (x*) and finds that it cannot. So (x*) gives back one of the six x's it has matched, and the engine tries xy again. This time it matches, and the regular expression finishes.

This process is a backtracking process.

If the regular expression is not written well, pessimistic backtracking is possible.

Consider the same pattern again, but this time we use /^(x*)y$/ to match the string xxxxxx (with no trailing y).

Following the flow above, the engine has to backtrack six times before the match finally fails.

Consider some extreme cases that could lead to a very large amount of backtracking, resulting in a spike in CPU usage.

A DoS attack that exploits this behaviour of regular expressions is called ReDoS.

Here's an example of ReDoS in Node.js:

app.get('/redos-me', (req, res) => {
  let filePath = req.query.filePath;

  // REDOS
  if (filePath.match(/(\/.+)+$/)) {
    console.log('valid path');
  }
  else {
    console.log('invalid path');
  }

  res.sendStatus(200);
});

In the callback above, we intended to match paths like /a/b/c. But suppose the user passes in a malicious filePath made up of many repeated '/' characters, say 100 slashes, followed by a newline character.

This triggers pessimistic backtracking in the regular expression: . matches any single character except the newline \n, so the engine only discovers at the very end that no match is possible, after trying an enormous number of combinations. That is a ReDoS attack.

How do you avoid ReDoS attacks?

On the one hand, there are off-the-shelf modules we can use directly, such as safe-regex, rxxr2, and node-re2.
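
For example (a minimal sketch; it assumes safe-regex's default export, which takes a pattern and reports whether it looks safe), a vulnerable pattern like the one in the route above can be screened before it ever reaches production:

const safe = require('safe-regex');

console.log(safe(/^(x*)y$/));  // true  - no nested quantifiers
console.log(safe(/(\/.+)+$/)); // false - nested quantifiers, prone to ReDoS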

On the other hand, you can look up regular expressions at www.regexlib.com. These patterns have been proven in practice, which reduces the risk of mistakes when writing your own.

JSON DoS attacks

JSON.parse and JSON.stringify are two common JSON operations, but how long they take depends on the size of the input JSON.

Here’s an example:

var obj = { a: 1 };
var niter = 20;

var before, str, pos, res, took;

for (var i = 0; i < niter; i++) {
  obj = { obj1: obj, obj2: obj }; // Doubles in size each iter
}

before = process.hrtime();
str = JSON.stringify(obj);
took = process.hrtime(before);
console.log('JSON.stringify took ' + took);

before = process.hrtime();
pos = str.indexOf('nomatch');
took = process.hrtime(before);
console.log('Pure indexof took ' + took);

before = process.hrtime();
res = JSON.parse(str);
took = process.hrtime(before);
console.log('JSON.parse took ' + took);

In the example above, obj doubles in size on each iteration, so after 20 iterations it is already quite large, and stringifying and parsing it takes a noticeable amount of time. If a user submits a large JSON payload, the Event Loop will be blocked while it is processed.

The solution is to limit the length of user input, or to use asynchronous JSON APIs such as JSONStream and Big-Friendly JSON (bfj).
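
As a minimal sketch of the first option (the 100kb limit and the /data route are assumptions, not from the original article), Express lets us cap the size of incoming JSON bodies so that an oversized payload is rejected before JSON.parse ever runs:

const express = require('express');
const app = express();

// Requests with a JSON body larger than the limit are rejected with 413
app.use(express.json({ limit: '100kb' }));

app.post('/data', (req, res) => {
  // req.body was parsed from a payload of bounded size
  res.sendStatus(200);
});

app.listen(3000);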

Blocking the Worker Pool

The idea behind Node.js is to handle the largest number of client connections with the smallest number of threads. We have also talked about putting complex operations into the Worker Pool to take advantage of the thread pool.

However, the number of threads in the pool is limited. If a thread is busy with a long-running task, the pool effectively has one worker thread fewer.

A malicious attacker can seize on this weakness of the system to mount a DoS attack.

Therefore, the best way to handle long-running tasks in the Worker Pool is also partitioning, so that all tasks get a fair chance to execute.

Of course, if you can clearly distinguish short tasks from long-running tasks, you can construct separate worker pools dedicated to each kind of task.
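
As a sketch of that idea (not from the original article), the built-in worker_threads module can give long-running jobs their own thread, so they cannot starve the default libuv pool used by short tasks:

const { Worker } = require('worker_threads');

function runLongTask(payload) {
  return new Promise((resolve, reject) => {
    // eval: true lets us inline the worker code for this sketch;
    // a real project would point to a separate worker file instead.
    const worker = new Worker(
      `const { parentPort, workerData } = require('worker_threads');
       let sum = 0;
       for (let i = 0; i < workerData.n; i++) sum += i; // simulated long job
       parentPort.postMessage(sum);`,
      { eval: true, workerData: payload }
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Long jobs run on their own thread; short tasks keep the default pool.
runLongTask({ n: 1e8 }).then((sum) => console.log('sum:', sum));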

Conclusion

The Event Loop and the Worker Pool are two different task-processing mechanisms in Node.js, and we need to choose between them according to the actual problem at hand.

Author: Flydean program stuff

Link to this article: www.flydean.com/nodejs-bloc…

Source: Flydean’s blog

Welcome to follow my public account "Flydean program stuff": the most accessible interpretations, the deepest technical content, the most concise tutorials, and plenty of little tricks you never knew about!