“When I was first asked to write this article, I actually refused. I thought: you can’t ask me to write it right away, I need time to come up with something good. If I just wrote some clichés and piled on a lot of special effects so the Node.js performance goes Duang~, readers would scold me and say there’s no such thing as performance optimization in Node.js, it’s all fake.” —— Stark Chan Wong


1. Use the latest version of Node.js

Upgrading Node.js is the easiest way to gain performance, because almost every new version of Node.js performs better than the previous one. Why?

The performance improvement in each version of Node.js comes from two main aspects:

  • V8 version upgrades;
  • Optimizations to Node.js internal code.

For example, the latest version of V8, 7.1, optimizes escape analysis of closures in some cases, giving some of Array’s methods a performance boost:


Node.js internal code also undergoes significant optimization as versions are upgraded. For example, the following graph shows how the performance of require changed across Node.js versions:


Every PR submitted to Node.js is reviewed for potential performance regressions. There is also a dedicated benchmarking team monitoring performance changes; you can see the performance changes for each version of Node.js here:

Node.js Benchmarking

So you can rest assured about the performance of new Node.js versions, and feel free to submit an issue if you observe any degradation on a new version.

How do I choose the version of Node.js?

Here’s a quick overview of the Node.js versioning strategy:

  • Node.js versions are divided into Current and LTS;
  • Current is the latest version of Node.js, still under active development;
  • LTS versions are stable, long-term maintenance releases;
  • Node.js releases a major update every six months (April and October), which may include incompatible changes;
  • Versions released in April (even-numbered, for example v10) are LTS versions, maintained by the community for 18 + 12 months (Active LTS + Maintenance LTS) starting in October of the release year;
  • Versions released in October (odd-numbered, such as the current v11) have only an 8-month maintenance period.

For example, right now (November 2018), Node.js Current is v11, and the LTS versions are v10 and v8. The older v6 is in Maintenance LTS and will no longer be maintained as of next April. v9, released last October, ended maintenance in June.


For production environments, Node.js officially recommends the latest LTS release, currently v10.13.0.


2. Use fast-json-stringify to accelerate JSON serialization

Generating JSON strings is very handy in JavaScript:

const json = JSON.stringify(obj)

But few would guess that there is room for performance optimization here: using a JSON Schema to speed up serialization.

During JSON serialization, we have to detect the type of every field. For a string, we need to wrap it in double quotes; for an array, we need to iterate over it, separate the elements with commas, and wrap the whole thing in [ and ]; and so on.

If the Schema already tells you the type of each field in advance, there is no need to traverse and detect field types; you can serialize each field directly, which greatly reduces the computational overhead. This is the principle behind fast-json-stringify.
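
To make the idea concrete, here is a hypothetical sketch of the kind of specialized function such a compiler can generate for a fixed schema (heavily simplified; the real library also handles string escaping, missing fields, and nested types):

function stringifyUser(obj) {
    // The schema already guarantees name is a string and age is an integer,
    // so there is no runtime type detection, just direct concatenation.
    return '{"name":"' + obj.name + '","age":' + obj.age + '}'
}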

According to the benchmarks in the project’s repository, in some cases it can be up to 10x faster than JSON.stringify!


A simple example:

const fastJson = require('fast-json-stringify')

const stringify = fastJson({
    title: 'Example Schema',
    type: 'object',
    properties: {
        name: { type: 'string' },
        age: { type: 'integer' },
        books: {
            type: 'array',
            items: {
                type: 'string',
                uniqueItems: true
            }
        }
    }
})

console.log(stringify({
    name: 'Starkwang',
    age: 23,
    books: ['C++ Primer', '響け!ユーフォニアム~']
}))
// => {"name":"Starkwang","age":23,"books":["C++ Primer","響け!ユーフォニアム~"]}

In Node.js middleware, a lot of the data travels as JSON, and the JSON structures are very similar (especially if you use TypeScript), which makes this a natural place to optimize with JSON Schema.
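
For instance, a service that always responds with the same shape can compile the stringifier once at startup and reuse it for every request. A minimal sketch (the schema and server code are illustrative, not from the original article):

const http = require('http')
const fastJson = require('fast-json-stringify')

// Compile once at startup, not per request.
const stringifyUser = fastJson({
    type: 'object',
    properties: {
        name: { type: 'string' },
        age: { type: 'integer' }
    }
})

http.createServer((req, res) => {
    res.setHeader('Content-Type', 'application/json')
    res.end(stringifyUser({ name: 'Starkwang', age: 23 }))
}).listen(3000)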


3. Improve Promise performance

Promise is the panacea for callback nesting hell, and the async/await combination has become the ultimate solution for asynchronous JavaScript programming; a large number of projects now use this pattern.

But the elegant syntax hides a performance cost. We can test it with an existing benchmark project on GitHub. Here are the results:

file                                time(ms)    memory(MB)
callbacks-baseline.js                    380         70.83
promises-bluebird.js                     554         97.23
promises-bluebird-generator.js           591         97.05
async-bluebird.js                        593        105.43
promises-es2015-util.promisify.js       1203        219.04
promises-es2015-native.js                  —        227.03
async-es2017-native.js                  1312        231.08
async-es2017-util.promisify.js          1550        228.74

Platform info:
Darwin 18.0.0 x64
Node.js 11.1.0
V8 7.0.276.32-node.7
Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz x 4

The results show that native async/await + Promise performs much worse than callbacks and uses much more memory. For middleware projects with a lot of asynchronous logic, this overhead is not negligible.

Comparing the results, we can see that the performance loss comes mainly from the implementation of the Promise object itself: V8’s native Promise is much slower than third-party Promise libraries such as Bluebird, while the async/await syntax itself does not incur much of a penalty.

So for middleware projects with a lot of asynchronous logic and light computation, you can swap the global Promise for Bluebird’s implementation in the code:

global.Promise = require('bluebird');

4. Write asynchronous code correctly

With async/await, the project’s asynchronous code looks pretty good:

const foo = await doSomethingAsync();
const bar = await doSomethingElseAsync();

But as a result, we sometimes forget about the other capabilities Promise gives us, such as the parallelism of Promise.all():

// bad
async function getUserInfo(id) {
    const profile = await getUserProfile(id);
    const repo = await getUserRepo(id)
    return { profile, repo }
}

// good
async function getUserInfo(id) {
    const [profile, repo] = await Promise.all([
        getUserProfile(id),
        getUserRepo(id)
    ])
    return { profile, repo }
}

With Promise.any(), for example (this method is not in the ES6 Promise standard; the standard Promise.race() can be used instead), we can easily implement faster, more reliable calls:

async function getServiceIP(name) {
    // Get the service IP from both DNS and ZooKeeper; whichever returns first is used.
    return await Promise.any([
        getIPFromDNS(name),
        getIPFromZooKeeper(name)
    ])
}
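
If you want to stay within the standard API, a rough equivalent using Promise.race() might look like the sketch below. Note that the semantics differ slightly: race settles with the first promise to settle, including rejections, while any waits for the first fulfillment.

async function getServiceIP(name) {
    // Standard-API variant: resolves (or rejects) with whichever source settles first.
    return await Promise.race([
        getIPFromDNS(name),
        getIPFromZooKeeper(name)
    ])
}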

5. Optimize the V8 GC

There have been many articles about V8’s garbage collection mechanism, which I won’t repeat here. Two articles are recommended:

V8 GC Log (1) : Node.js application background and GC basics
V8 GC Log (2) : Partition of internal and external memory and GC algorithm

In everyday code development, it’s easy to step into the following pitfalls:

Pit 1: Using large objects as caches slows down Old Space garbage collection

Example:

const cache = {}

async function getUserInfo(id) {
    if (!cache[id]) {
        cache[id] = await getUserInfoFromDatabase(id)
    }
    return cache[id]
}

Here we use the variable cache as a cache to speed up user-info queries. After many queries, the cache object is promoted to the old generation and grows extremely large, and since the old generation uses tri-color marking + DFS for GC, a large object directly increases GC time (and also carries a risk of memory leaks).

The solution is:

  • Use an external cache such as Redis; an in-memory database like Redis is in fact perfect for this scenario;
  • Limit the size of locally cached objects, for example with FIFO, TTL, or similar eviction mechanisms, as in the sketch below.
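
As a rough illustration of the second option, here is a minimal FIFO-capped cache sketch building on the getUserInfo example above (MAX_ENTRIES and the FIFO policy are illustrative assumptions; a production cache would also want TTL handling):

const MAX_ENTRIES = 10000 // illustrative cap, tune via profiling

const cache = new Map()

async function getUserInfo(id) {
    if (!cache.has(id)) {
        if (cache.size >= MAX_ENTRIES) {
            // Map iterates in insertion order, so the first key is the oldest (FIFO).
            cache.delete(cache.keys().next().value)
        }
        cache.set(id, await getUserInfoFromDatabase(id))
    }
    return cache.get(id)
}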

Pit 2: Insufficient new-generation space leads to frequent GC

This pit is more subtle.

Node.js allocates 64MB of memory to the new generation by default, but because the new-generation GC uses the Scavenge algorithm, only half of that, 32MB, is actually usable.

When business code frequently produces large numbers of small objects, this space fills up easily, triggering GC. Although new-generation GC is much faster than old-generation GC, frequent GC still significantly affects performance; in extreme cases, GC can take up about 30% of total computation time.

The solution is to raise the new generation’s memory limit when starting Node.js, reducing the number of GC runs:

node --max-semi-space-size=128 app.js

Of course, some will ask: is bigger always better for new-generation memory?

As the memory grows, the number of GC runs decreases, but the time each run takes increases, so bigger is not always better. The exact value should be determined by profiling your business workload to find the best new-generation allocation.

But generally speaking, 64MB or 128MB is a reasonable allocation.
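
One way to take that measurement, assuming your entry point is app.js, is V8’s --trace-gc flag, which logs every collection so you can compare scavenge frequency and pause times across different semi-space sizes:

node --trace-gc --max-semi-space-size=64 app.js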


6. Use Stream correctly

Stream is one of the most basic concepts in Node.js. Most IO-related modules in Node.js, such as http, net, fs, and repl, are built on top of various Streams.

As most of us know, we shouldn’t read a large file into memory in one go; instead, we stream it out:

const http = require('http');
const fs = require('fs');

// bad
http.createServer(function (req, res) {
    fs.readFile(__dirname + '/data.txt', function (err, data) {
        res.end(data);
    });
});

// good
http.createServer(function (req, res) {
    const stream = fs.createReadStream(__dirname + '/data.txt');
    stream.pipe(res);
});

Using Streams properly in business code can greatly improve performance, but it is easily overlooked in real projects. For example, projects using React server-side rendering can use renderToNodeStream:

const ReactDOMServer = require('react-dom/server')
const http = require('http')
const fs = require('fs')
const app = require('./app')

// bad
const server = http.createServer((req, res) => {
    const body = ReactDOMServer.renderToString(app)
    res.end(body)
});

// good
const server = http.createServer(function (req, res) {
    const stream = ReactDOMServer.renderToNodeStream(app)
    stream.pipe(res)
})

server.listen(8000)

Manage streams with pipeline

In older versions of Node.js, managing a stream’s lifecycle was cumbersome. For example:

source.pipe(a).pipe(b).pipe(c).pipe(dest)

If any of the streams in source, a, b, c, or dest errors or closes, the whole pipeline stops running, and we then have to destroy all the other streams manually, which is very cumbersome at the code level.

So the community created libraries such as pump to destroy streams automatically. Node.js v10.0 added a new feature, stream.pipeline, which can replace pump and help manage streams better.

An official example:

const { pipeline } = require('stream');
const fs = require('fs');
const zlib = require('zlib');

pipeline(
    fs.createReadStream('archive.tar'),
    zlib.createGzip(),
    fs.createWriteStream('archive.tar.gz'),
    (err) => {
        if (err) {
            console.error('Pipeline failed', err);
        } else {
            console.log('Pipeline succeeded');
        }
    }
);
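
On Node.js v10+, pipeline can also be driven with async/await by promisifying it. Here is a small usage sketch reusing the files above (the gzipArchive wrapper is illustrative):

const { pipeline } = require('stream');
const { promisify } = require('util');
const fs = require('fs');
const zlib = require('zlib');

// pipeline takes an error-first callback as its last argument,
// so util.promisify turns it into a Promise-returning function.
const pipelineAsync = promisify(pipeline);

async function gzipArchive() {
    await pipelineAsync(
        fs.createReadStream('archive.tar'),
        zlib.createGzip(),
        fs.createWriteStream('archive.tar.gz')
    );
    console.log('Pipeline succeeded');
}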

Implement your own high-performance Stream

In business code, you may also need to implement a Stream yourself, whether Readable, Writable, or Duplex, as described in the documentation:

  • implementing Readable streams
  • implementing Writable streams

While Streams are amazing, a hand-implemented Stream can hide performance problems, for example:

class MyReadable extends Readable {
    _read(size) {
        let chunk
        while (null !== (chunk = getNextChunk())) {
            this.push(chunk)
        }
    }
}

When we call new MyReadable().pipe(xxx), this pushes out all the chunks from getNextChunk() until reading is finished. But if the next stage of the pipeline consumes slowly, the chunks accumulate in memory, causing a larger memory footprint and slower GC.

The correct approach is to decide whether to keep pushing based on the return value of this.push(): when it returns false, the internal buffer is full and pushing should pause.

class MyReadable extends Readable {
    _read(size) {
        let chunk
        while (null !== (chunk = getNextChunk())) {
            if (!this.push(chunk)) {
                return
            }
        }
    }
}

This problem is explained in detail in an official Node.js article:

Backpressuring in Streams | Node.js


7. Are C++ extensions necessarily faster than JavaScript?

Node.js is ideal for IO-intensive applications, and for computation-intensive workloads, many people’s first thought is to write a C++ addon to optimize performance. But C++ extensions are no panacea, and V8’s performance is not as bad as you might expect.

For example, when I migrated Node.js’s net.isIPv6() from C++ to a JS implementation in September, most test cases saw performance improvements of 10% to 250% (see the PR here).

Cases where JavaScript outruns C++ extensions on V8 mostly involve strings and regular expressions, because V8’s internal regular expression engine, irregexp, is much faster than boost’s engine (boost::regex).

It’s also worth noting that Node.js C++ extensions can spend a lot of time on type conversions, and if you don’t pay attention to the details of the C++ code, performance can degrade significantly.

Here is an article comparing the performance of C++ and JS with the same algorithm:

How to get a performance boost using Node.js native addons

The notable conclusion is that after the C++ code converts the string argument (String::Utf8Value to std::string), it doesn’t achieve even half the performance of the JS implementation. Only with the type encapsulation provided by NAN does it outperform JS.


In other words, whether C++ is more efficient than JavaScript must be judged case by case; in some scenarios, C++ extensions are not necessarily faster than native JavaScript. If you are not confident in your C++ skills, it’s actually advisable to use JavaScript, since V8 performs much better than you think.


8. Use Node-Clinic to quickly locate performance problems

Having said all that, is there anything that works out of the box in five minutes? Of course there is.

Node-clinic is a Node.js performance diagnostic tool provided by NearForm. It can quickly locate performance problems.

npm i -g clinic
npm i -g autocannon

To use it, first start the service process with clinic doctor:

clinic doctor -- node server.js

Then we can run a load test with any tool, for example autocannon by the same author (ab, curl, and so on also work):

autocannon http://localhost:3000

When the load test finishes, press Ctrl+C to stop the process started by clinic, and the report is generated automatically. For example, here is the performance report of one of our middleware services:


As the CPU usage curve shows, the bottleneck of this middleware service is not its own internal computation but slow I/O. Clinic also tells us that it has detected a potential I/O problem.

Here we use clinic bubbleprof to investigate the I/O problem:

clinic bubbleprof -- node server.js

After another round of load testing, we get a new report:


In this report, we can see that http.Server spends 96% of the program’s running time in a pending state. Clicking in, we find a large number of empty frames in the call stack; that is, the CPU is largely idle due to network I/O limits. This is very common in middleware services and indicates that the optimization lies not within the service itself, but in the server’s gateway and the response speed of the services it depends on.

To learn how to read the report produced by clinic bubbleprof, see: clinicjs.org/bubbleprof/…

Clinic can also detect computational bottlenecks within the service itself, so let’s do some “sabotage” to make the service’s bottleneck show up in CPU computation.

We added CPU-consuming “destructive” code that spins idly 100 million times to one of the middlewares:

function sleep() {
    let n = 0
    while (n++ < 10e7) {
        empty()
    }
}
function empty() { }

module.exports = (ctx, next) => {
    sleep()
    // ......
    return next()
}

Then, using Clinic Doctor, repeat the above steps to generate the performance report:


This is a very typical case of synchronous computation blocking the asynchronous queue: there is so much computation on the main thread that JavaScript’s asynchronous callbacks cannot fire in time, and the Event Loop delay becomes extreme.

For such applications, we can continue to use Clinic Flame to determine exactly where intensive computing occurs:

clinic flame -- node app.js

After the load test, we get the flame graph (here we reduced the spin count to 1 million to make the flame graph look less extreme):


In this graph, we can clearly see the big white bar at the top, which represents the CPU time spent spinning in the sleep function. With a flame graph like this, it is very easy to see how the CPU is being consumed, locate where intensive computation happens in the code, and find the performance bottleneck.


Advertising area

We are the TCB team of Tencent Cloud, one of the few development teams in Tencent with Node.js and Golang middleware at its core. Our main cloud products at the moment are:

  • Small program cloud
  • Tencent Cloud · Cloud development

The team is in a period of rapid development and business expansion, and we are hiring for several positions:

1. Backend development engineer, small program cloud development (Shenzhen)

Responsibilities

  • Responsible for the backend development of small program cloud development products;
  • Responsible for developing and maintaining the cloud development system architecture.

Requirements

  • Bachelor’s degree or above in a computer-related major, with more than 2 years of work experience;
  • 2+ years of Unix/Linux C/C++, Golang, or Python development experience;
  • Familiar with Unix/Linux operating system principles and common tools;
  • Comprehensive and solid software knowledge (operating systems, software engineering, design patterns, data structures, database systems, network security);
  • Good analytical and problem-solving skills, able to take on tasks independently and control progress systematically;
  • Eager to learn, responsible, careful and quick-thinking, with good communication and teamwork skills; extensive system development experience is a plus.

2. Product planner, small program cloud development (Shenzhen)

Responsibilities

  • Responsible for planning cloud products related to Tencent small program cloud development;
  • Responsible for designing product plans for the platform’s capabilities, analyzing and researching requirements, and producing complete product documentation.

Requirements

  • Bachelor’s degree or above, with more than 3 years of work experience;
  • Strong execution, product design, and commercial packaging skills; good product sense; strong teamwork and communication skills; active thinking and learning ability; able to work under pressure;
  • Experience designing SaaS or PaaS products is a plus;
  • Experience in small program development or B/C-side product design and operations is a plus.