Recently, I have been reviewing Node.js. I re-read 《深入浅出Node.js》 (Node.js in Depth) by Pu Ling, combined it with some articles from around the web, and took these notes to further consolidate my knowledge.

Introduction to Node

Single thread

Node maintains the single-threaded nature of JavaScript in the browser.

  • Benefits: no need to worry about state synchronization as in multithreaded programming, no deadlocks, and no performance overhead from thread context switching
  • Weaknesses: 1. Cannot utilize multi-core CPUs 2. An uncaught error causes the entire application to exit 3. When the CPU is occupied by heavy computation, asynchronous I/O callbacks cannot be invoked

Node solves the single thread's large-computation problem with the same idea as Web Workers: child_process.

Application scenarios

  • I/O intensive

    Node's advantage with I/O-intensive workloads is that it leverages the event loop's processing power rather than starting a thread to service every request, so it consumes very few resources.

  • CPU-intensive

    Because of single-threading, a long-running computation (such as a large loop) keeps the CPU time slice from being released, so subsequent I/O cannot be initiated. However, a large computing task can be decomposed into several small tasks, so that the CPU is released in time and the initiation of I/O calls is not blocked. This way we enjoy the benefits of parallel asynchronous I/O while making full use of the CPU (see the sketch after the list below).

Node does not provide multiple threads for computation, but there are two ways to get the most out of the CPU:

  • Writing C/C++ extensions
  • Using child processes
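
Here is a minimal sketch of the decomposition idea mentioned above: splitting a large loop into chunks with setImmediate so the event loop can process pending I/O between chunks. The names sumLargeArray and the chunk size are illustrative, not from the book.

// Split a big computation so the event loop breathes between chunks
function sumLargeArray(arr, callback) {
    var total = 0;
    var i = 0;
    var CHUNK = 10000; // illustrative chunk size
    function step() {
        var end = Math.min(i + CHUNK, arr.length);
        for (; i < end; i++) {
            total += arr[i];
        }
        if (i < arr.length) {
            setImmediate(step); // yield to the event loop before the next chunk
        } else {
            callback(null, total);
        }
    }
    step();
}

sumLargeArray(new Array(1e6).fill(1), function (err, total) {
    console.log(total); // 1000000, computed without blocking pending I/O for long
});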

Module mechanism

Module implementation of Node

To introduce a module in Node, three steps are taken:

  • Path analysis. The require() method takes an identifier as its parameter, and module lookup in Node is based on this identifier. Module identifiers fall into the following categories:
    • Core modules, such as http, fs, and path
    • Relative-path file modules starting with . or ..
    • Absolute-path file modules starting with /
    • Non-path file modules, such as a custom connect module
  • File location
    • File extension analysis: the CommonJS module specification allows identifiers without file extensions, in which case Node tries the .js, .json, and .node extensions in turn
    • Directory analysis and packages: after analyzing extensions, require() may not find a matching file but instead get a directory. This often happens when custom modules are introduced and module paths are searched one by one. In this case, Node treats the directory as a package.
  • Compilation and execution, the last stage of introducing a file module. Once the specific file is located, Node creates a new module object, then loads and compiles it according to the path. Files are handled by extension:
    • .js files: read synchronously through the fs module, then compiled and executed.
    • .node files: C/C++ extension files; the compiled binary is loaded via dlopen().
    • .json files: read synchronously through the fs module, then parsed with JSON.parse() and the result returned.
    • Files with any other extension are loaded as .js files.
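
A small illustration of the file-location steps, assuming a hypothetical project that contains ./config.js and a ./lib directory:

var config = require('./config'); // extension analysis: tries ./config.js,
                                  // then ./config.json, then ./config.node
var lib = require('./lib');       // directory as package: reads ./lib/package.json
                                  // for its "main" field, falling back to
                                  // ./lib/index.js (then index.json, index.node)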

In Node, modules fall into two categories:

  • Modules provided by Node, called core modules: they are compiled into the binary executable when the Node source code is compiled. When the Node process starts, some core modules are loaded directly into memory, so introducing them skips the file location and compilation steps, and they are checked first during path analysis, which makes them the fastest to load. Core modules fall into two categories:
    • JavaScript core module
    • C/C++ core module
  • User-written modules, called file modules: loaded dynamically at run time, they require the complete path analysis, file location, and compile-execute process, and load more slowly than core modules.

Module specifications

Differences between CommonJS, AMD, and CMD

  • CommonJS is used on the server side and is synchronous, e.g. Node
  • AMD and CMD are used on the browser side and are asynchronous, e.g. RequireJS and SeaJS
  • CommonJS: each file is a module; no define is needed. Node uses this specification
  • AMD: modules are defined with define; dependencies are declared up front
  • CMD: modules are defined with define; dependencies are required where they are needed
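
A rough side-by-side sketch of the three styles, assuming a hypothetical ./math module. Each snippet targets its own loader (Node, RequireJS, SeaJS); they are not meant to run together in one environment.

// CommonJS (Node): synchronous require, one module per file
var math = require('./math');
exports.increment = function (val) { return math.add(val, 1); };

// AMD (RequireJS): dependencies declared up front
define(['./math'], function (math) {
    return { increment: function (val) { return math.add(val, 1); } };
});

// CMD (SeaJS): dependencies required lazily, where they are needed
define(function (require, exports, module) {
    var math = require('./math');
    exports.increment = function (val) { return math.add(val, 1); };
});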

ES6 import and require differences

  • A CommonJS module outputs a copy of a value; an ES6 module outputs a reference to the value.
  • A CommonJS module is loaded at runtime; an ES6 module is a compile-time output interface.
  • ES6 modules operate differently from CommonJS: they are dynamically referenced and do not cache values, and variables inside a module stay bound to the module they live in.
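
A minimal sketch of the copy-vs-reference difference. The file names are illustrative, and the ES6 half is shown in comments since it needs its own module context (.mjs):

// lib.js (CommonJS): the exported counter is a copy of the value
var counter = 0;
exports.counter = counter;                // copied once, at export time
exports.inc = function () { counter++; };

// main.js
var lib = require('./lib');
lib.inc();
console.log(lib.counter); // 0: the exported copy never changes

// lib.mjs (ES6): the export is a live binding
// export let counter = 0;
// export function inc() { counter++; }
// main.mjs
// import { counter, inc } from './lib.mjs';
// inc();
// console.log(counter); // 1: reads the current value through the reference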

exports and module.exports

  • exports is an object reference that initially points to the same object as module.exports
  • Assigning a new value to exports does not change module.exports; only the object module.exports points to is actually exported
  • exports = module.exports = something is commonly used to keep both pointing at the same object
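
A small sketch of the three cases, assuming this is the body of a hypothetical module a.js:

exports.a = 1;             // OK: mutates the shared object; require('./a') would see { a: 1 }
exports = { b: 2 };        // NOT exported: rebinds only the local exports variable
module.exports = { c: 3 }; // OK: replaces the exported object entirely
// After all three lines run, require('./a') returns { c: 3 }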

Asynchronous I/O

Asynchronous I/O model of Node

The four core concepts are event loop, observer, request object and execution callback.

  • Event loop: When the process starts, Node creates a loop similar to while (true) to determine if there are any events that need to be handled. If so, it retrieves them and executes the callback function.
  • Observer: observers are used to determine whether there are events to handle. The event loop holds one or more observers, and the judging step asks each observer whether it has events pending. The process is like the relationship between a chef and the front desk of a restaurant: after each round of cooking, the chef asks the front desk whether there are more orders; if so, he keeps cooking, otherwise he rests. In this analogy the receptionist is the observer, and the customer orders she receives are the callback functions. Note: the event loop is a typical producer/consumer model. Asynchronous I/O and network requests are the producers, and the event loop takes events from the observers and processes them.
  • Request object: there is in fact an intermediate product between the JavaScript invocation and the kernel's completion of the I/O operation, called the request object.
  • Executing the callback: assembling the request object and feeding it into the I/O thread pool for execution completes the first half of asynchronous I/O; callback notification is the second half. When a thread in the pool becomes available, the uv_fs_thread_proc method is called; it invokes the underlying function according to the type passed in (for example, uv_fs_open calls the fs__open method). After the call completes, the result is set on req->result, then PostQueuedCompletionStatus() is called to notify IOCP that the operation has completed and to return the thread to the thread pool.

What is the difference between browser and Node Event loops?

On the Node side, microtasks are executed between stages of the event loop

  • Timers phase: executes timer callbacks (setTimeout, setInterval)
  • I/O callbacks phase: handles a few I/O callbacks left over from the previous loop
  • Idle, prepare phase: used internally by Node
  • Poll phase: retrieves new I/O events; Node blocks here when appropriate
  • Check phase: executes setImmediate() callbacks
  • Close callbacks phase: executes close event callbacks, e.g. a socket's close

On the browser side, microtasks are executed after each macrotask in the event loop has finished executing.
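
A minimal sketch of the resulting ordering on the Node side, inside an I/O callback:

var fs = require('fs');
fs.readFile(__filename, function () {
    setTimeout(function () { console.log('timeout'); }, 0);
    setImmediate(function () { console.log('immediate'); });
    process.nextTick(function () { console.log('nextTick'); });
});
// Inside an I/O callback the poll phase is active, so the check phase
// (setImmediate) runs before the next timers phase (setTimeout), and
// process.nextTick runs between phases, before both:
// nextTick -> immediate -> timeout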

Asynchronous programming

Event publish/subscribe pattern

Event listener pattern is a pattern widely used in asynchronous programming. It is the eventization of callback functions, also known as publish/subscribe pattern. Node itself provides the Events module, which is a simple implementation of this pattern.

Avalanche problem

The avalanche problem: when the cache becomes invalid under high concurrency, the database is flooded with concurrent queries; unable to handle such volume all at once, it drags down the performance of the whole site.

To solve the avalanche problem, push the request callbacks into an event queue; the core code is as follows:

var proxy = new EventProxy();
var status = "ready";
var select = function (callback) {
    proxy.once("selected", callback);
    if (status === "ready") {
        status = "pending";
        db.select("SQL", function (results) {
            proxy.emit("selected", results);
            status = "ready";
        });
    }
};

Promise/Deferred pattern

In 2009, Kris Zyp abstracted this into a proposal draft published under the CommonJS specification. The CommonJS drafts now include the Promise/A, Promise/B, and Promise/D asynchronous models. Promise/A is the most common and the simplest: only a then() method is required. The Promise pattern is somewhat more elegant than the publish/subscribe pattern, but it still does not meet the needs of many scenarios, such as coordinating a list of tasks across a set of purely asynchronous APIs.
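
For flavor, here is a minimal sketch using the standard Promise (Promise/A+ descends from this line of work; the CommonJS draft itself only mandates then()). readFileAsync and the file names are illustrative:

var fs = require('fs');

function readFileAsync(path) {
    return new Promise(function (resolve, reject) {
        fs.readFile(path, 'utf8', function (err, data) {
            if (err) reject(err); else resolve(data);
        });
    });
}

readFileAsync('a.txt')
    .then(function (a) { return readFileAsync('b.txt'); }) // chain via then()
    .then(function (b) { console.log('both files read'); })
    .catch(console.error); // one place to handle errors from the whole chain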

Process control library

Tail trigger with next

At present, tail triggering is most widely used in Connect middleware. When processing network requests, middleware can perform functions such as filtering, verification, and logging, much like aspect-oriented programming. The simplest middleware looks like this:

function (req, res, next) {
    // middleware logic
}

Each middleware is passed the request object, the response object, and a tail-trigger function next; queued together, they form a processing pipeline.
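
A minimal sketch of such a tail-triggered queue (illustrative, not Connect's actual source):

var http = require('http');

function createApp() {
    var middlewares = [];
    var app = function (req, res) {
        var i = 0;
        function next(err) {
            if (err) return res.end('error');   // short-circuit on error
            var mw = middlewares[i++];
            if (!mw) return res.end();          // queue exhausted
            mw(req, res, next);                 // each middleware triggers the tail
        }
        next();
    };
    app.use = function (fn) { middlewares.push(fn); };
    return app;
}

var app = createApp();
app.use(function (req, res, next) { console.log(req.url); next(); }); // logging
app.use(function (req, res, next) { res.end('done'); });              // responder
http.createServer(app).listen(3000);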

async

Currently the best-known process control module: async provides more than 20 methods for handling various patterns of asynchronous code.

Step

Lighter-weight than async, and even more consistent in its exposed API, because there is only one interface: Step.

Wind

An approach to asynchronous programming completely different from the ideas above.

Comparing these schemes: the event publish/subscribe pattern is relatively primitive; the Promise/Deferred pattern contributes a nice abstraction of the asynchronous task model, focusing on encapsulating the asynchronous invocation; and process control libraries are far more flexible.

In addition to async, Step, EventProxy, wind and other schemes, there is also a kind of scheme to simplify process control through source code compilation. Streamline is a typical example.

Memory control

Garbage collection mechanism

  • When executing JavaScript, Node is subject to V8's memory limit: roughly 1.4 GB on 64-bit systems and 0.7 GB on 32-bit systems

  • All JS objects are allocated on the heap; check with process.memoryUsage()

  • Reason for the memory limit: during garbage collection, the JS thread pauses execution (so the application logic never sees a state different from what the garbage collector sees), and reclaiming a large heap seriously hurts performance

  • V8 memory as a whole consists of the new generation and the old generation

      // Adjust the memory limits
      node --max-old-space-size=1700 test.js // unit: MB
      node --max-new-space-size=1024 test.js // unit: KB

      // These flags take effect when V8 initializes; once set they cannot be changed dynamically

The new generation

  • Consists of two semispaces (reserved_semispace_size_); the new generation totals 32 MB on 64-bit systems and 16 MB on 32-bit systems
  • Garbage collection uses the Scavenge algorithm, implemented with the Cheney algorithm

The advantage is short pause times; the disadvantage is that only half the heap space is usable at a time. New-generation objects have short life cycles, which suits this algorithm.

The old generation

  • About 1400 MB on 64-bit systems and 700 MB on 32-bit systems
  • Uses Mark-Sweep and Mark-Compact for garbage collection

V8 uses Mark-Sweep primarily, and Mark-Compact only when there is not enough space to allocate objects promoted from the new generation.

Incremental marking
  • Reduces the pauses caused by full-heap garbage collection of the old generation
  • Starting with the marking phase, the work is broken into many small steps that run alternately with the application logic
  • The maximum garbage collection pause time is reduced to about 1/6 of the original

Garbage collection is one of the factors that can affect performance. Minimize garbage collection, especially full heap garbage collection

View garbage collection logs
  • Add --trace_gc at startup

Starting with --prof yields V8 performance profiling data, including garbage collection time. The data must be read with a tool that ships with the Node source under deps/v8/tools: linux-tick-processor.

Use memory efficiently

scope

  • In JS, scopes are created by function calls, the with statement, and the global scope
  • Identifier lookup (i.e., looking up a variable name) searches the current scope first, then upward through enclosing scopes, all the way to the global scope
  • Global variables are not released until the process exits, which keeps the referenced objects resident in the old generation. You can remove them with delete, or assign undefined or null (delete removes an object property, so assignment is preferable)
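
A tiny sketch of releasing a global reference:

global.cache = { big: new Array(1e6).join('x') }; // referenced from the global scope
// Either of these lets the object be garbage collected:
// delete global.cache;   // removes the property, but delete is slower
global.cache = null;      // preferable: simply overwrite the reference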

closure

  • A closure is a way for an outer scope to access variables inside an inner scope, made possible by the higher-order function feature
  • Assigning a closure to an object you cannot control can cause a memory leak. When finished, assign the variable another value or set it to null

Check the memory usage

  • process.memoryUsage() shows process memory, where rss is the resident set size (the memory actually held by the Node process), and heapTotal and heapUsed describe heap memory usage
  • os.totalmem() and os.freemem() show system memory

Not all of the memory used by Node is allocated through V8; there is also off-heap memory for handling network and I/O streams.

A memory leak

Causes: caches, queues that are not consumed in time, and scopes that are not released.

The cache

  • When using memory as a cache, limit its size and have a release strategy
  • Processes cannot share memory, so each process holds its own copy of an in-process cache

To speed up module loading, modules are cached after compilation; because of exports, their scopes are never released and they live in the old generation, so be aware of memory leaks here.

Queues

  • Queues sit between producers and consumers
  • Monitor the queue length and reject work beyond a threshold
  • Any asynchronous call should include a timeout mechanism

Memory leak Detection

  • node-heapdump
  • node-memwatch

Large memory application

  • Use the stream module to process large files: fs.createReadStream() and fs.createWriteStream() (see the sketch below)
  • When string manipulation is not required, use Buffer operations, which do not go through V8 and are not subject to V8's memory limits
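
A minimal sketch of the stream approach (the file names are illustrative):

var fs = require('fs');
// Copy a large file without loading it into memory all at once
fs.createReadStream('big.input')
  .pipe(fs.createWriteStream('big.output'));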

Understand the Buffer

  • The JavaScript language itself has no mechanism for reading or manipulating binary data streams.
  • The Buffer class was introduced as part of the Node.js API to make it possible to handle binary data streams in scenarios such as TCP streams or file system operations.
  • Buffer works around V8's small memory limit by allocating large memory directly at the underlying level, outside the heap. To avoid adding CPU load to the running application, it requests memory in blocks.
  • A Buffer object is like an array: each element is a two-digit hexadecimal number, i.e., a value from 0 to 255

Buffer memory allocation

Buffer does not allocate its memory from V8 but from Node's C++ level; the strategy is that C++ requests the memory and JavaScript hands it out. The reason: handling large amounts of byte data cannot work by asking the operating system for a little memory each time some is needed, since that would cause a large number of memory-allocation system calls and put pressure on the operating system.

Node uses a slab allocation mechanism. A slab is simply a fixed-size memory region allocated in advance. It has three states:

  • full: fully allocated
  • partial: partially allocated
  • empty: nothing allocated yet
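
A small sketch of the pooling idea using the modern Buffer API (the book describes the same 8 KB slab with new Buffer()); whether a given buffer lands in the pool is an internal detail, so the comments are indicative:

console.log(Buffer.poolSize);              // 8192: the shared pool (slab) size
var small = Buffer.allocUnsafe(1024);      // small: may be carved from the current pool
var large = Buffer.allocUnsafe(16 * 1024); // large: gets its own dedicated allocation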

Transformation of Buffer

Buffer objects can be converted to strings. The supported encoding types are ASCII, UTF-8, UTF-16LE/UCS-2, Base64, Binary, and Hex

String to Buffer

This is done through the constructor: new Buffer(str, [encoding]); encoding defaults to UTF-8, which is also the storage encoding.

Buffer to string

All you need is toString().

Encoding types Buffer does not support

You can check whether an encoding is supported by calling Buffer.isEncoding(encoding). For unsupported encoding formats, the iconv and iconv-lite modules can solve the problem.
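
A minimal sketch with the third-party iconv-lite module (npm install iconv-lite); the GBK example is illustrative:

var iconv = require('iconv-lite');

console.log(Buffer.isEncoding('gbk')); // false: not natively supported by Buffer
var buf = iconv.encode('中文', 'gbk');  // string -> GBK-encoded bytes
var str = iconv.decode(buf, 'gbk');    // bytes -> string again
console.log(str);                      // 中文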

The joining together of Buffer

The chunk object received from the data event is actually a Buffer. The statement data += chunk; also concatenates Buffers; it is essentially data = data.toString() + chunk.toString();. This has a problem with Chinese text: since reading defaults to UTF-8, a wide character can be split across chunks, so only part of a character's bytes get decoded, producing garbled output. This problem is worth noting.

To solve the garbled characters above, we should set some codec formats:

// readable.setEncoding(encoding)
var rs = fs.createReadStream('test.md', { highWaterMark: 11 });
rs.setEncoding('utf8');

Let’s take a look at this example, which improves the concatenation buffer:

var chunks = [];
var size = 0;
res.on('data', function (chunk) {
    chunks.push(chunk);
    size += chunk.length;
});
res.on('end', function () {
    var buf = Buffer.concat(chunks, size);
    var str = iconv.decode(buf, 'utf8');
    console.log(str);
});

The proper concatenation is to use an array to store all the buffer fragments received, and then call buffer.concat() to synthesize a buffer object.

Buffer and performance

Buffers are widely used in file I/O and network I/O. Whatever object is sent to the network must be converted to a Buffer and transferred in binary. So, to improve I/O efficiency, one place to start is the string-to-Buffer conversion.

Network programming

Create a TCP server program

var net = require('net');
var server = net.createServer(function (socket) {
    // New connection
    socket.on('data', function (data) {
        socket.write("hello");
    });
    socket.on('end', function () {
        console.log('Disconnected');
    });
    socket.write("Hello world, my dear\n");
});
server.listen(8124, function () {
    console.log('server bound');
});

// The callback above listens for the connection event; it can also be written as
var server = net.createServer();
server.on('connection', function (socket) {
    // New connection
});
server.listen(8124);

Pipeline operation

var net = require('net');
var server = net.createServer(function (socket) {
    socket.write('Echo server\r\n');
    socket.pipe(socket);
});
server.listen(1337, '127.0.0.1');

Create a UDP server

var dgram = require("dgram");
var server = dgram.createSocket("udp4");
server.on("message", function (msg, rinfo) {
    console.log("server got: " + msg + " from " +
        rinfo.address + ":" + rinfo.port);
});
server.on("listening", function () {
    var address = server.address();
    console.log("server listening " +
        address.address + ":" + address.port);
});
server.bind(41234);

Building the HTTP service

var http = require('http');
http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello World\n');
}).listen(1337, '127.0.0.1');
console.log('Server running at http://127.0.0.1:1337/');

Build webSocket services

Compared with the traditional B/S model, WebSocket has the following advantages:

  • The client establishes a single TCP connection with the server, reducing the number of connections
  • The server can actively push data to the client
  • A lighter protocol header reduces the amount of data transferred
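
A minimal echo sketch using the third-party ws module (npm install ws); the port and messages are illustrative:

var WebSocket = require('ws');
var wss = new WebSocket.Server({ port: 8080 });
wss.on('connection', function (socket) {
    socket.on('message', function (msg) {
        socket.send('echo: ' + msg); // server push without a new HTTP request
    });
});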

Network services and security

For network security, Node provides three modules: crypto, tls, and https. crypto is used for encryption and decryption, e.g. the SHA-1 and MD5 algorithms. tls is used to establish TLS/SSL-encrypted TCP connections and can be seen as an encrypted upgrade of the net module. https provides an encrypted version of HTTP, exposing the same interfaces and events as the http module.
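
A minimal HTTPS sketch, assuming key.pem and cert.pem already exist (certificate generation is out of scope here):

var https = require('https');
var fs = require('fs');
var options = {
    key: fs.readFileSync('key.pem'),   // private key
    cert: fs.readFileSync('cert.pem')  // certificate
};
https.createServer(options, function (req, res) {
    res.writeHead(200);
    res.end('hello over TLS\n');
}).listen(8443); // non-privileged port for the example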

Building a Web application

How to play the process

Multiprocess architecture

Faced with a single process and single thread not making full use of multiple cores, the time-honored approach is to start multiple processes, ideally one per CPU, to utilize a multi-core machine. Node provides the child_process module and its child_process.fork() function to replicate processes. Let's look at the code:

// worker.js
var http = require('http');
http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Hello World\n');
}).listen(Math.round((1 + Math.random()) * 1000), '127.0.0.1');

This is the classic code for starting a web service with Node. Following the master-workers architecture, we then add a master.js module.

// master.js
var fork = require('child_process').fork;
var cpus = require('os').cpus();
for (var i = 0; i < cpus.length; i++) {
    fork('./worker.js');
}

There are two processes: master is the main process and worker is the worker process.

Create a child process

The child_process module gives Node the ability to create child processes at will. It provides four methods for creating them:

var cp = require('child_process');
cp.spawn('node', ['worker.js']);
cp.exec('node worker.js', function (err, stdout, stderr) {
    // some code
});
cp.execFile('worker.js', function (err, stdout, stderr) {
    // some code
});
cp.fork('./worker.js');

Interprocess communication

After a child process is created by fork() or another API, the parent and child create an IPC channel in order to communicate; messages are passed through it via the message event and the send() method.

// parent.js
var cp = require('child_process');
var n = cp.fork(__dirname + '/sub.js');
n.on('message', function (m) {
    console.log('PARENT got message:', m);
});
n.send({ hello: 'world' });

// sub.js
process.on('message', function (m) {
    console.log('CHILD got message:', m);
});
process.send({ foo: 'bar' });

Principle of interprocess communication

IPC stands for Inter-Process Communication. Its purpose is to let different processes access each other's resources and coordinate their work. The parent process creates the IPC channel and listens on it before actually creating the child process, then tells the child the channel's file descriptor through an environment variable (NODE_CHANNEL_FD). While starting up, the child process connects to the existing IPC channel via that file descriptor, completing the connection between parent and child.

Handle transfer

The send() method can send a handle in addition to sending data via IPC. The second optional argument is the handle:

child.send(message, [sendHandle])

A handle is a reference used to identify a resource; it contains a file descriptor that points to an object. A handle can therefore identify a server-side socket object, a client socket object, a UDP socket, a pipe, and so on.

Handles solve a real problem: we can drop the proxy scheme entirely. After the main process receives a socket request, it sends the socket directly to the worker process instead of establishing a new socket connection with the worker to forward the data. Let's look at the implementation:

// parent.js
var cp = require('child_process');
var child1 = cp.fork('child.js');
var child2 = cp.fork('child.js');
// Open a server object and send the handle
var server = require('net').createServer();
server.on('connection', function (socket) {
    socket.end('handled by parent\n');
});
server.listen(1337, function () {
    child1.send('server', server);
    child2.send('server', server);
});

// child.js
process.on('message', function (m, server) {
    if (m === 'server') {
        server.on('connection', function (socket) {
            socket.end('handled by child, pid is ' + process.pid + '\n');
        });
    }
});

Requests can now be handled by both the parent and the children. That was a transformation at the TCP layer; let's try again at the HTTP layer.

// parent.js
var cp = require('child_process');
var child1 = cp.fork('child.js');
var child2 = cp.fork('child.js');
// Open a server object and send the handle
var server = require('net').createServer();
server.listen(1337, function () {
    child1.send('server', server);
    child2.send('server', server);
    // Close the listener in the parent
    server.close();
});

// Modified child process
// child.js
var http = require('http');
var server = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('handled by child, pid is ' + process.pid + '\n');
});
process.on('message', function (m, tcp) {
    if (m === 'server') {
        tcp.on('connection', function (socket) {
            server.emit('connection', socket);
        });
    }
});

In this way, requests are handled entirely by the child processes. Notice that the serving structure changes during the process: once the main process has sent the handles and closed its own listener, all connections are served by the children.

Handle send and restore

The file descriptor is actually an integer. The message object, when written to the IPC channel, is serialized via JSON.stringify(), so what is finally sent through the channel is a string. The fact that send() can deliver messages and handles does not mean it can send arbitrary objects.

A child process connected to the IPC channel reads the messages sent by the parent, parses the strings back into objects with JSON.parse(), and then emits the message event to the application layer. During this process, message objects are also filtered: if message.cmd is prefixed with NODE_, the message triggers an internal event, internalMessage.

If message.cmd is NODE_HANDLE, the value of message.type together with the received file descriptor is used to restore a corresponding object.

Port listening

After Node sends a handle, multiple processes can listen on the same port without raising an EADDRINUSE exception. With independently started processes, each TCP server socket has its own file descriptor, which is why listening on the same port throws an exception there. The SO_REUSEADDR option means that different processes may listen on the same network card and port, and that the server socket can be reused by different processes:

setsockopt(tcp->io_watcher.fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on))

Independently started processes do not know each other's file descriptors, so listening on the same port fails; but for services restored from a handle delivered by send(), the file descriptor is the same, so listening on the same port raises no exception.

Note that when multiple applications listen on the same port, the file descriptor can be used by only one process at a time. In other words, when a network request reaches the server, only one lucky process grabs the connection and gets to serve the request; service among these processes is preemptive.

The path to cluster stability

Process events

| Event | Description |
| --- | --- |
| error | Triggered when the child process cannot be replicated, cannot be killed, or cannot be sent messages |
| exit | Triggered when the child process exits. The first argument is the exit code if it exited normally, otherwise null; if the process was killed by kill(), a second argument indicates the signal that killed it |
| close | Triggered when the child process's standard input/output streams are terminated; same arguments as exit |
| disconnect | Triggered when disconnect() is called in the parent or child process, which closes the listening IPC channel |

Automatic restart

If an uncaught exception occurs, the worker process stops receiving new connections and exits once all existing connections are disconnected. The main process, on hearing the worker exit, immediately starts a new worker, ensuring that the cluster always has processes serving users.

Suicide signals

When a worker process is about to exit, it sends a suicide signal to the main process and then stops receiving new connections, exiting only when all existing connections have been disconnected. The main process creates a new worker as soon as it receives the suicide signal.

Limiting restarts

In extreme cases, processes may restart over and over, most likely because of a bug in the code. To eliminate such meaningless restarts, restarting should be refused once certain rules are met: for example, limit how many times a worker may be restarted within a unit of time; if the limit is exceeded, trigger a giveup event announcing the important fact that the process has given up restarting. A sketch of such a limiter follows.
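
A minimal sketch of the limiter just described; the names LIMIT, DURATION, and tooFrequently are assumptions, not the book's exact code:

var restarts = [];                    // timestamps of recent restarts
var LIMIT = 10;                       // max restarts allowed per window
var DURATION = 60000;                 // window length in ms

function tooFrequently() {
    var now = Date.now();
    restarts.push(now);
    restarts = restarts.filter(function (t) { return now - t <= DURATION; });
    return restarts.length > LIMIT;
}

// In the worker 'exit' handler:
// if (tooFrequently()) {
//     process.emit('giveup', restarts.length, DURATION); // stop restarting, alert operators
// } else {
//     createWorker();
// }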

Load balancing

By default, Node uses the operating system's preemptive strategy: idle processes compete for incoming requests, and whoever grabs one serves it. However, preemption depends on how busy each process happens to be, so a process can be busy with I/O while its CPU sits idle, leading to uneven load. Node v0.11 therefore added a round-robin policy: the main process accepts connections and distributes them to worker processes in turn, choosing the i = (i + 1) mod n-th of the n workers each time.

State sharing

Data cannot be shared between processes, but data such as configuration files and sessions must stay consistent, so a third-party storage scheme (a database, files, or a cache) is generally used to share state and data. One way to synchronize state is to have child processes poll on a timer, but this trades resources for functionality: it wastes a lot of resources and suffers from concurrency issues and data delays.

The other way is active notification, which reduces polling so that polling happens only at the message-queue level, with everything else driven by event scheduling and triggering. We call the process that sends notifications and asks whether state has changed the notification process; it should be designed to only poll and notify, without handling any business logic.

The cluster module

After Node 0.8, the cluster module was added to the core, because building a single-machine cluster with child_process alone requires handling too many details. cluster makes it easier to exploit multiple CPUs and also provides a better API for process robustness.

var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;
if (cluster.isMaster) {
    // Fork workers
    for (var i = 0; i < numCPUs; i++) {
        cluster.fork();
    }
    cluster.on('exit', function (worker, code, signal) {
        console.log('worker ' + worker.process.pid + ' died');
    });
} else {
    // Workers can share any TCP connection
    // In this case it is an HTTP server
    http.createServer(function (req, res) {
        res.writeHead(200);
        res.end("hello world\n");
    }).listen(8000);
}

Working Principle of Cluster

The cluster module is a functional wrapper around the child_process and net modules. When cluster starts, it starts a TCP server internally (only one TCP server can be started). When a child process is forked via cluster.fork(), the file descriptor of that TCP server socket is sent to the worker. A process forked by cluster.fork() has NODE_UNIQUE_ID in its environment variables; when such a worker calls listen() on a network port, it fetches that file descriptor and reuses the port via SO_REUSEADDR, enabling multiple child processes to share the port. For normally started processes there is no such file-descriptor sharing. In the cluster module, one main process can manage only one group of worker processes.

By contrast, working through child_process directly lets a program control multiple groups of worker processes at the same time, because we can create multiple groups of TCP servers so that the child processes share multiple server sockets.

Cluster events

You can also see event encapsulation for the child_process module

| Event | Description |
| --- | --- |
| fork | Triggered after a worker process is forked |
| online | After being forked, the worker sends an online message to the main process, which triggers this event on receiving it |
| listening | After the worker calls listen() (i.e., the server socket is shared), it sends a listening message to the main process, which triggers this event on receipt |
| disconnect | Triggered after the IPC channel between the main process and a worker is disconnected |
| exit | Triggered when a worker process exits |
| setup | Triggered after cluster.setupMaster() is executed |

Although we have learned all this, in a production environment it is recommended to manage processes with a mature tool such as pm2. Beyond process management, monitoring the number of processes and the logs is also needed to ensure the stability of the entire system, so that developers can be alerted in time if, for example, the main process unexpectedly exits.

test

To make sure the code you submit is testable, it should meet the following criteria:

  • Single responsibility
  • Interface abstraction
  • Layer separation

Unit testing

assertions

Node provides the assert module for assertions. An assertion is a tool for checking at runtime whether a program meets expectations. Example:

var assert = require('assert');
assert.equal(Math.max(1, 100), 100);

Once assert.equal does not meet the expectation, an AssertionError is thrown and the entire program stops executing.

The assertion specification provides several testing methods:

| Method | Description |
| --- | --- |
| ok | Checks whether the result is truthy |
| equal | Checks whether the actual value equals the expected value |
| notEqual | Checks whether the actual value does not equal the expected value |
| deepEqual | Checks whether the actual value deeply equals the expected value, i.e., whether the elements of objects or arrays are equal |
| notDeepEqual | Checks whether the actual and expected values are not deeply equal |
| strictEqual | Checks whether the actual value strictly equals the expected value (===) |
| notStrictEqual | Checks whether the actual value is not strictly equal to the expected value (!==) |
| throws | Checks whether the block throws an exception |
| doesNotThrow | Checks whether the block does not throw an exception |
| ifError | Checks whether the actual value is falsy (null, undefined, 0, "", false); throws an exception if the value is truthy |

The test framework

A test framework does not participate in testing itself. It is mainly used for managing test cases and generating test reports, and to a certain extent it speeds up writing test cases and improves their maintainability and readability. The framework used here is mocha, written by TJ, installed globally via npm install mocha -g.

Test style

The main testing styles are TDD (test-driven development) and BDD (behavior-driven development).

TDD style testing:

suite('Array', function () {
    setup(function () {
        // ...
    });
    suite('#indexOf()', function () {
        test('should return -1 when not present', function () {
            assert.equal(-1, [1, 2, 3].indexOf(4));
        });
    });
});

TDD organizes test cases mainly with suite and test: suite allows multi-level description, and each test case is written with test. It provides setup and teardown hooks, triggered on entering and exiting a suite respectively.

BDD style testing:

describe('Array', function () {
    before(function () {
        // ...
    });
    describe('#indexOf()', function () {
        it('should return -1 when not present', function () {
            [1, 2, 3].indexOf(4).should.equal(-1);
        });
    });
});

BDD organizes test cases mainly with describe and it: describe builds a multi-level structure, and it expresses each test case. The BDD style also provides the hook methods before, after, beforeEach, and afterEach to help prepare, install, uninstall, and recycle test cases within a describe. before and after trigger on entering and exiting a describe respectively, while beforeEach and afterEach trigger before and after each test case (it) inside the describe.

The test report

Mocha's design allows the native assert module as the concrete assertion implementation, or extension libraries such as should.js, expect, chai, etc. Whatever form the assertions take, for whoever runs the test cases, the test report is what developers and quality managers care about most. Mocha can generate test reports; run mocha --reporters to view the available ones.

The test case

Complete functionality requires complete, multifaceted test cases, each containing at least one assertion. Let’s look at the code:

describe('#indexOf()', function () {
    it('should return -1 when not present', function () {
        [1, 2, 3].indexOf(4).should.equal(-1);
    });
    it('should return index when present', function () {
        [1, 2, 3].indexOf(1).should.equal(0);
        [1, 2, 3].indexOf(2).should.equal(1);
        [1, 2, 3].indexOf(3).should.equal(2);
    });
});

Asynchronous testing

To solve the problem of asynchronous testing with Mocha, let’s look at the code:

it('fs.readFile should be ok', function (done) {
    fs.readFile('file_path', 'utf-8', function (err, data) {
        should.not.exist(err);
        done();
    });
});
timeout

Mocha's default timeout is 2000 milliseconds. You can set the timeout for all cases with mocha -t. For finer granularity, call this.timeout(ms) inside a test case to configure a single case specially.

describe('a suite of tests', function () {
    this.timeout(500);
    it('should take less than 500ms', function (done) {
        setTimeout(done, 300);
    });
    it('should take less than 500ms as well', function (done) {
        setTimeout(done, 200);
    });
});

Test coverage

As test cases are continuously added, they cover more branches and situations of the code. We describe this with the test coverage metric, which can be overall coverage or specific to individual lines. Consider this code:

exports.parseAsync = function (input, callback) {
    setTimeout(function () {
        var result;
        try {
            result = JSON.parse(input);
        } catch (e) {
            return callback(e);
        }
        callback(null, result);
    }, 10);
};

We add a test for it:

describe('parseAsync', function () {
    it('parseAsync should ok', function (done) {
        lib.parseAsync('{"name": "JacksonTian"}', function (err, data) {
            should.not.exist(err);
            data.name.should.be.equal('JacksonTian');
            done();
        });
    });
});

mock

All kinds of exceptions can occur in real use, and not all of them can be anticipated in tests. A database connection failure, for example, may be caused by a network problem or even by an administrator changing the password. Because such exceptions are not easy to simulate, a special term exists for faking them: mock. By faking the components being called, we test the robustness of the upper-layer code.

exports.getContent = function (filename) {
    try {
        return fs.readFileSync(filename, 'utf-8');
    } catch (e) {
        return '';
    }
};

To test the exception path, we fake the fs.readFileSync() method so it throws an error. In addition, to ensure this test case does not affect the other cases, we must restore the original after execution; for that we use the before and after hooks mentioned earlier:

describe("getContent", function () {
    var _readFileSync;
    before(function () {
        _readFileSync = fs.readFileSync;
        fs.readFileSync = function (filename, encoding) {
            throw new Error("mock readFileSync error");
        };
    });
    // it();
    after(function () {
        fs.readFileSync = _readFileSync;
    });
});

Testing of private methods

It is also important to test the private methods in a module that are not exported via exports. We can use the rewire module for this, i.e., load the module with rewire instead of require.

var limit = function (num) {
    return num < 0 ? 0 : num;
};

// Test case
it('limit should return success', function () {
    var lib = rewire('../lib/index.js');
    var limit = lib.__get__('limit');
    limit(10).should.be.equal(10);
});
Copy the code

Like require, the rewire module wraps the original file with the same parameters:

(function (exports, require, module, __filename, __dirname) { ... })

In addition, it injects extra code:

(function (exports, require, module, __filename, __dirname) {
    var method = function () {};
    exports.__set__ = function (name, value) {
        eval(name + " = " + value.toString());
    };
    exports.__get__ = function (name) {
        return eval(name);
    };
});

Every rewired module gets __set__() and __get__() methods. This is a clever use of closures: when eval() executes, it can access local variables inside the module, so those local variables can be exported to the test case for invocation.

Test engineering and test automation

We reduce manual costs through continuous integration.

engineering

Under Linux, it is recommended to use a Makefile to build the project:

TESTS = test/*.js
REPORTER = spec
TIMEOUT = 10000
MOCHA_OPTS =

test:
	@NODE_ENV=test ./node_modules/mocha/bin/mocha \
		--reporter $(REPORTER) \
		--timeout $(TIMEOUT) \
		$(MOCHA_OPTS) \
		$(TESTS)

test-cov:
	@$(MAKE) test MOCHA_OPTS='--require blanket' REPORTER=html-cov > coverage.html

test-all: test test-cov

.PHONY: test

Developers can then run the full unit tests and coverage simply via make test and make test-cov. Note that Makefile indentation must be a tab character, not spaces, and the blanket module must be installed for coverage to work.

Continuous integration

A popular approach in the community: continuous integration with Travis-CI.

The performance test

Performance testing includes load tests, stress tests, benchmark tests, network-level performance tests of web applications, and the conversion between test metrics and business metrics.

The benchmark

A benchmark measures how many times a method executes within a given amount of time, using the count as a reference and comparing counts to determine the performance gap.

Pressure test

For stress tests on network interfaces, the metrics to examine are throughput, response time, and concurrency; they reflect the server's concurrent processing capability. Stress tests can be performed with tools such as ab, siege, and http_load.

Benchmark driven development

Felix Geisendörfer, an early Node contributor who wrote several MySQL drivers known for their performance, described the development model he used in his slide deck Faster Than C: benchmark-driven development (BDD).

Transformation of test data to business data

Usually, before actually developing a feature, we need to estimate the expected volume, so that once development is complete we know whether it can carry the real online business. If there are only a few users and a few dozen PV per day, the site needs almost no optimization. If PV reaches hundreds of thousands, millions, or even more, performance testing must verify whether actual business needs can be met; if not, various optimizations are needed to raise the service capability.

The transition

Getting into Node early has many benefits. Since Node is relatively young compared with many web technologies, developers are exposed to many low-level details, such as the HTTP protocol, the process model, and the service model; these underlying principles are not substantially different from other existing technologies. And since Node's ecosystem is not yet mature, developing a real product requires a lot of non-coding work to guarantee project progress and normal operation, including engineering, architecture, disaster recovery and backup, deployment, and operations.

Project engineering

Project engineering is the ability to organize a project, including directory structure, build tools, coding specifications, and code reviews.

Deployment process

After the code is developed, reviewed, and merged, it enters the deployment process.

The deployment environment

From development to official release, a project passes through several environments: first the development environment, then the test environment (also called the stage environment), next the pre-release environment, and finally the production environment (also called the product environment). Deployment flows through these environments in that order.

Deployment operations

For deployment, you need a long-running service process, so use nohup and & to start a process that survives the terminal: nohup node app.js &. You also need to consider stopping and restarting the project, so it is worth writing a bash script to simplify operations. The script works through conventions shared with the web application, which solves the problem that the process ID is hard to find.

performance

There are many ways to improve the performance of a web application, such as static/dynamic separation, multi-process architectures, and distribution, but they all rely on splitting things apart. The splitting principles: 1. Do one focused thing 2. Let the tools that are good at something do that thing 3. Simplify the model 4. Isolate risks

Dynamic and static separation

Node can serve static files through middleware, but again, let the tools that are good at something do that thing. Route static files such as images, scripts, stylesheets, and multimedia to a professional static file server, and let Node handle only dynamic requests. This can be done with nginx or a CDN.

Enable the cache

There are essentially two ways to improve performance: speed up the service, and avoid unnecessary computation. The most common scenario for avoiding unnecessary computation is caching. Common practice is to use Redis as a cache: content queried from the database that is static or rarely changes is stored in Redis, and when the same request arrives again, the cache is checked first; if the data is there, the cached copy is served.
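
A cache-aside sketch of this flow, assuming the callback-style redis client (npm install redis, v3-era API) and a hypothetical db.query() helper:

var redis = require('redis');
var client = redis.createClient();

function getArticle(id, callback) {
    client.get('article:' + id, function (err, cached) {
        if (cached) return callback(null, JSON.parse(cached)); // cache hit
        // cache miss: fall through to the database (db.query is assumed)
        db.query('SELECT ... WHERE id = ?', [id], function (err, rows) {
            if (err) return callback(err);
            // store with an expiry so stale data ages out (60 s is illustrative)
            client.setex('article:' + id, 60, JSON.stringify(rows[0]));
            callback(null, rows[0]);
        });
    });
}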

Multiprocess architecture

A multi-process architecture makes full use of the CPU. Since Node does not need an extra container to serve HTTP (it is based on the http module), developers must handle multi-process management themselves, use the official cluster module, or use modules such as pm, forever, or pm2 for process management.

Reading and writing separation

Read/write separation mainly means separating database reads from writes; reads are much faster than writes (writes must lock tables to protect data consistency). Read/write separation requires a master/slave database design. Our company had no dedicated operations staff, so we used Alibaba Cloud's RDS to achieve it.

The log

To establish a sound investigation and tracking mechanism, the system needs logs. A complete log is the best way to reconstruct the scene of a problem, like the first-hand clues a detective uses to solve a case.

Access log

Access logs record the access of each client to the application.

Abnormal log

It is used to record unexpected exception errors.

Log and database

Logs are written to files online, then analyzed and synchronized into the database.

Split the log

It can be split by date or by log type (_stdout and _stderr).

Monitoring alarm

A new application needs monitoring in two respects: business logic monitoring and hardware monitoring. Let's see how to do each.

monitoring

1. Log monitoring

Look at the specific business: for example, analyze log timestamps to derive the QPS of a business. Logs also yield PV (page views per day) and UV (unique visitors per day, deduplicated per client). PV and UV reveal user habits and help predict access peaks.

2. Response time

A healthy system response time fluctuates less and is consistently balanced.

3. Process monitoring

Check the number of application (worker) processes running in the operating system and raise an alarm if it falls below an expected value.

4. Monitor disks

Monitor disk usage to prevent system problems due to insufficient disk space. Once the disk usage exceeds the warning value, the server manager should clean up the log or clean up the disk.

5. Monitor memory

Check for memory leaks. If memory only rises and never falls, there is likely a memory leak; healthy memory usage rises and falls (a sampling sketch follows).

If a process leaks memory and the cause cannot be fixed, there is a workaround for services with a multi-process cluster architecture: let each worker process serve a fixed number of requests; after that it stops accepting new connections, the main process starts a new worker to serve customers, and the old process exits once all its connections close.
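
A minimal sketch of trend sampling with process.memoryUsage(); the interval is illustrative:

// Sample memory periodically; a steadily climbing heapUsed suggests a leak
setInterval(function () {
    var m = process.memoryUsage();
    console.log('rss:', m.rss, 'heapTotal:', m.heapTotal, 'heapUsed:', m.heapUsed);
}, 60000);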

6. Monitor the CPU usage

CPU usage divides into user mode, kernel mode, and I/O wait (iowait). High user-mode CPU usage means applications on the server need a large amount of CPU; high kernel-mode usage means the server spends much time on process scheduling or system calls; iowait usage means the CPU is waiting for disk I/O operations.

If user mode stays below 70%, kernel mode below 35%, and the overall value below 70%, the CPU is in a healthy state.

7. The CPU load monitoring

CPU load, also called the load average, describes how busy the operating system currently is. It can be understood simply as the average number of tasks using or waiting for the CPU per unit of time. It has three metrics: the 1-minute, 5-minute, and 15-minute load averages. A high CPU load indicates too many processes; in Node this may show up when the child_process module repeatedly starts new processes.

8. IO load

I/O load mainly means disk I/O. For Node, this kind of I/O load mostly comes from database I/O.

9. Network monitoring

Mainly monitors network traffic, both incoming and outgoing. This value shows whether the company's marketing and advertising are effective, i.e., whether they increase visit traffic.

10. Monitor application status

This monitoring can be done by adding a timestamp:

app.use('/status', function (req, res) {
    res.writeHead(200);
    res.end(new Date().toString());
});

At the same time, business-related content should also be logged as much as possible.

11. The DNS monitoring

DNS can be monitored with third-party services, such as DNSPod. We use Aliyun's DNS.

Realization of alarm

With monitoring in place, an alarm system must follow. Common alarm channels: email, IM, SMS, and phone call.

Node service stability

A single server cannot keep up with growing demand, so Node needs to be deployed on multiple machines in multi-process form; if one machine fails, the rest keep serving users. Large enterprises also perform cross-region disaster recovery, building data centers near users, which offsets some of the location-related network latency. For better stability, the typical path of horizontal scaling is multi-process, multi-machine, multi-data-center; such distributed designs are common in today's Internet companies.

Multiple machines

Multiple data centers

Disaster backup

Heterogeneous coexistence

Node is amazing, but any amazing Node functionality is backed by underlying operating system capabilities. Therefore, heterogeneous coexistence with Node is very simple and common.

References

  • “Nodejs” — after reading summary
  • Baixiyue -NodeJS related notes