Preface

When we first saw Node.js, we associated it with these keywords: event-driven, non-blocking I/O, efficient, lightweight. That is exactly how it describes itself on its website:

Node.js® is a JavaScript runtime built on Chrome’s V8 JavaScript engine. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient.


So the following scenario plays out.

When we are first introduced to it, we might wonder:

  • Why can JavaScript, a language that runs in the browser, interact with the operating system at such a low level here?

When we use it for file I/O and network I/O, we find that both require passing in callbacks; they are asynchronous:

  • So how is this asynchronous, non-blocking I/O implemented?

When we get used to handling I/O with callbacks and find that callback hell appears as soon as sequential processing is required, we start wishing for a synchronous approach:

  • Does Node.js, which is predominantly asynchronous, also provide synchronous methods?

As front-end developers, when we use it we notice that its asynchronous processing is event-based, much like the event model we already know from the browser:

  • So how does it implement this event-driven approach?

As we write more code and handle a lot of I/O requests, we start to wonder:

  • Doesn't Node.js's asynchronous, non-blocking I/O have any bottlenecks? And then:

  • As capable as Node.js is, isn't there anything it is not well suited for? Wait…

If you found yourself nodding at these questions, don't worry; let's read through this article with them in mind.

Node.js structure

Let's start with Node.js itself and take a look at its structure.

  • The Node.js standard library, written in JavaScript, is the API we call directly in our code. You can find it in the lib directory of the source tree.
  • Node bindings: this layer is the key to communication between JavaScript and the underlying C/C++, and lives in the C++ sources such as node.cc.
  • The bottom layer is what actually makes Node.js run; it is implemented in C/C++ and consists of the following components:
  • V8: Google's JavaScript engine, and the reason Node.js runs JavaScript at all. It provides an environment in which JavaScript can run outside the browser, and its efficiency is one of the reasons Node.js performs so well.
  • libuv: provides Node.js with cross-platform support, a thread pool, an event loop, and asynchronous I/O; it is what makes Node.js so capable.
  • c-ares: provides asynchronous DNS resolution.
  • http_parser, OpenSSL, zlib, etc.: provide HTTP parsing, SSL, data compression, and other capabilities.

Libuv

libuv is a key component of Node.js: it gives the upper layers of Node.js a unified API, letting them ignore platform differences and hiding the underlying implementation.

libuv's own documentation includes a good diagram of everything it covers. For now, we just need to know that it is asynchronous and event-driven; with that in mind, the questions raised above will be answered in the sections that follow.

Interacting with the operating system

For a simple example, if we want to open a file and perform some operations, we can write the following code:

var fs = require('fs');
fs.open('./test.txt', 'w', function(err, fd) {
  // ... do something
});

In Node.js, the call travels roughly along this path: lib/fs.js → src/node_file.cc → uv_fs_*.

In general, the methods we call from JavaScript are eventually handed to the C/C++ layer via process.binding, and it is that layer which performs the actual operations. This is how Node.js interacts with the operating system.
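To make that concrete, here is a very rough sketch of the hand-off, modelled on the older internal process.binding('fs') API (an internal, now-deprecated interface; this is a simplification, not the actual lib/fs.js source):

// Sketch only: the real lib/fs.js is more involved, e.g. it converts flags
// to numeric constants and wraps the callback in a request object before
// calling into the binding.
var binding = process.binding('fs');

function open(path, flags, mode, callback) {
  // The binding is C++ code (src/node_file.cc); it issues the uv_fs_open
  // call, and the callback runs later, once the event loop sees the
  // completed request.
  binding.open(path, flags, mode, callback);
}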

From this process we can see that although Node.js is said to "use JavaScript", we only use JavaScript syntax to write programs during development. The code is executed by V8, and the actual system calls are made from C/C++, so there is no need to worry too much about JavaScript's execution efficiency. Node.js is not a language; it is a platform.

Asynchronous, non-blocking I/O

As we saw above, it is libuv that actually makes the system calls. As mentioned earlier, libuv is asynchronous and event-driven: when we hand it the request for an I/O operation, libuv arranges for the I/O call to be executed (starting a worker thread if needed) and, when it is done, passes the result back to JavaScript for further processing.

Here, I/O covers both file I/O and network I/O, and the underlying execution differs slightly: file I/O, DNS lookups, and similar operations are handled by the thread pool, while network I/O (TCP, UDP, TTY, and so on) is handled by epoll, IOCP, or kqueue depending on the platform. Either way it looks the same from JavaScript, as the sketch below shows.
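A minimal sketch of that split; both calls take callbacks and look identical from JavaScript, even though libuv services them differently underneath (the file name ./test.txt and port 3000 are just placeholders):

var fs = require('fs');
var net = require('net');

// File I/O: executed on the libuv thread pool.
fs.readFile('./test.txt', function (err, data) {
  if (err) throw err;
  console.log('file read, ' + data.length + ' bytes');
});

// Network I/O: multiplexed via epoll / IOCP / kqueue, no pool thread needed.
net.createServer(function (socket) {
  socket.end('hello\n');
}).listen(3000);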

In summary, the general flow of an asynchronous I/O call is as follows (a minimal code illustration follows the list):

  • Initiating the I/O call:

  • The user calls a Node core module from JavaScript, passing arguments and a callback function into the core module;

  • The Node core module wraps the arguments and the callback into a request object;

  • The request object is pushed into the I/O thread pool for execution;

  • The asynchronous JavaScript call returns at this point, and the JavaScript thread continues with its subsequent operations.

  • Executing the callback:

  • When the I/O operation completes, the result is stored in the result property of the request object, and libuv is notified that the operation has finished;

  • Each iteration of the event loop checks for completed I/O operations; if there are any, the request object is added to the I/O observer queue and then processed as an event;

  • When the I/O observer event is handled, the callback that was wrapped into the request object is taken out and executed with the result as its argument, completing the JavaScript callback.
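A minimal sketch of this flow, assuming a local file ./test.txt exists:

var fs = require('fs');

fs.readFile('./test.txt', function (err, data) {
  // Runs later, once the event loop picks up the completed request object
  // and invokes the wrapped callback with the result.
  if (err) throw err;
  console.log('read finished: ' + data.length + ' bytes');
});

// The call above returned immediately; the JavaScript thread keeps going.
console.log('readFile requested, moving on');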

This also clears up a common misunderstanding about Node.js being "single-threaded". Its single thread is the thread its own JavaScript runs on; Node.js does not give JavaScript the ability to create new threads during execution, and the actual operations are carried out through libuv and its event loop. That is how JavaScript, a single-threaded language, can perform asynchronous operations in Node.js; the two do not conflict.

Event-driven

Event-driven programming is nothing new to front-end developers. Events are a familiar concept in GUI development: mouse events, keyboard events, and so on. Among the many ways to implement asynchrony, events are one of the easier ones both to understand and to implement.

When we think of events, we think of callbacks. Once we have registered a pile of event handlers, how does libuv get around to executing them? This brings us back to uv_run, the function that drives libuv's event loop. Before walking through its execution flow, here are the handle types that libuv defines, each representing something the loop can watch:

typedef struct uv_loop_s uv_loop_t;
typedef struct uv_err_s uv_err_t;
typedef struct uv_handle_s uv_handle_t;
typedef struct uv_stream_s uv_stream_t;
typedef struct uv_tcp_s uv_tcp_t;
typedef struct uv_udp_s uv_udp_t;
typedef struct uv_pipe_s uv_pipe_t;
typedef struct uv_tty_s uv_tty_t;
typedef struct uv_poll_s uv_poll_t;
typedef struct uv_timer_s uv_timer_t;
typedef struct uv_prepare_s uv_prepare_t;
typedef struct uv_check_s uv_check_t;
typedef struct uv_idle_s uv_idle_t;
typedef struct uv_async_s uv_async_t;
typedef struct uv_process_s uv_process_t;
typedef struct uv_fs_event_s uv_fs_event_t;
typedef struct uv_fs_poll_s uv_fs_poll_t;
typedef struct uv_signal_s uv_signal_t;


Each of these handle types corresponds to an asynchronous operation; event listeners and their callbacks are registered through the matching uv_TYPE_start function.

As uv_run executes, it checks its queues for pending events and fires them when there are any. It executes callbacks one at a time, so callbacks never compete with each other; JavaScript is single-threaded and could not handle that anyway.

This makes the event-driven handling of I/O operations clearer. Besides the usual I/O operations, there is also the case of timers. A timer differs from the other two in that it does not involve a separate thread at all; it is handled directly within the event loop.

In addition to maintaining the observer queues, the event loop maintains a time field, which is set when the loop is initialized and updated on every iteration. All time-related operations are compared against this value to decide whether they should run.

The timer-related part of the loop works as follows:

  • Update the loop's time field, i.e. "now" for the current iteration;
  • Check whether the loop still has active handles or requests to process; if it has none, the loop is no longer "alive" and there is nothing more to do;
  • Check the registered timers: if a timer's due time has already passed the current time, the timer has expired and its callback is executed;
  • Perform I/O polling (block the thread and wait for I/O events to occur). If no I/O has completed by the time the next timer is due, stop waiting and run that timer's callback; if an I/O event does occur, run the corresponding callback. Since another timer may have expired while those callbacks were running, the timers are checked again and any expired callbacks executed. Node.js keeps calling uv_run until the loop is no longer alive. A quick experiment with this timer behaviour follows the list.
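As a quick check of the timer behaviour just described: the callback below only runs once the loop's "now" has passed the timer's due time, so the observed delay is at least the requested 100 ms and usually slightly more.

var start = Date.now();

setTimeout(function () {
  // Runs in the timer phase of a later loop iteration, once loop time
  // has passed start + 100 ms.
  console.log('timer fired after ' + (Date.now() - start) + ' ms'); // e.g. ~102 ms
}, 100);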

Synchronous methods

While Node.js is primarily asynchronous, there are cases where the asynchronous style degenerates into ugly callback hell, like this:

db.query('select nickname from users where id="12"', function() {
	db.query('select * from xxx where id="12"', function() {
		db.query('select * from xxx where id="12"', function() {
			db.query('select * from xxx where id="12"', function() {
				// ...
			});
		});
	});
});

A synchronous method would be much more convenient here, and the Node.js developers realized this too: most asynchronous functions in the core modules now have a corresponding synchronous version. You simply append Sync to the name and leave out the callback.

var file = fs.readFileSync('/test.txt', { encoding: 'utf-8' });

This style is easy to work with; executing shell commands, reading files, and so on all become more convenient. Note, however, that instead of passing an error message as the first argument the way callbacks do, a synchronous method throws the error directly, so you need to wrap the call in try…catch, as follows:

var fs = require('fs');
var data;
try {
  data = fs.readFileSync('/test.txt');
} catch (e) {
  if (e.code == 'ENOENT') {
    // the file does not exist
  }
  // ...
}

How these methods are implemented will be discussed next time.

Some possible bottlenecks

What follows is only my own understanding; corrections are welcome.

First, file I/O. File system operations, DNS lookups, and user code submitted through uv_queue_work all run on a thread pool maintained by libuv, which notifies the event loop when the work completes. Leaving the disk itself aside, this thread pool necessarily has a limited size: the official default is 4. It can of course be changed, by setting UV_THREADPOOL_SIZE at startup, but the maximum is 128 because of the memory footprint involved.

This thread pool is shared by all event loops. The first time a function makes use of the pool (for example via uv_queue_work), libuv preallocates and initializes the number of threads allowed by UV_THREADPOOL_SIZE. At 128 threads the memory footprint is about 1 MB, so if the value is set too high and the pool is used heavily, performance suffers from the extra memory use; choose the value with care. A small way to observe the pool size is sketched below.
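As an illustrative sketch (timings depend on the machine, and pool.js is just a placeholder file name): crypto.pbkdf2 runs on the libuv thread pool, so firing eight jobs at once with the default pool size of 4 makes them finish in roughly two batches of four.

// Run as-is to see two batches with the default pool size of 4, or try
// `UV_THREADPOOL_SIZE=8 node pool.js` to see all eight finish together.
var crypto = require('crypto');
var start = Date.now();

for (var i = 0; i < 8; i++) {
  crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', function (err) {
    if (err) throw err;
    console.log('pbkdf2 done after ' + (Date.now() - start) + ' ms');
  });
}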

As for network I/O: on Linux it goes through epoll, an event-notification mechanism. Its advantage is that it works through event callbacks, which greatly reduces the overhead of dealing with file descriptors (under Linux, everything is a file).

Each call to epoll_wait returns the number of descriptors that are ready, and that many ready events are then read from the array passed to epoll_wait; this uses memory mapping to reduce the overhead of copying file descriptors.

epoll_create takes a size argument, which can be set to the number of descriptors you expect to listen on; the default differs between systems.

Even with that size limit, this alone is not enough to guarantee stable operation and to avoid leaks when calling the epoll functions. Another value, maxevents, is the maximum number of events epoll_wait will report, and it is specified on each epoll_wait call. It is generally no larger than the array size given at creation time (epoll_create); you can set it larger than size, but that should not help. As you can imagine, if more events are ready than maxevents, the extra events have to wait until the earlier ones have been processed, which can reduce efficiency.

You might worry that events are lost in that case. They are not: the kernel keeps them in a ready queue (via ep_collect_ready_items) and reports them on the next epoll_wait call.

What is Node.js not good for

From the above, it may seem that Node.js can do a great many things, and with very high performance. It is best suited to I/O-intensive applications such as chat rooms and blogs.

However, there is one kind of application that Node.js struggles with: CPU-intensive applications. As mentioned earlier, libuv handles asynchronous events through the event loop, and that mechanism lives on the main thread of Node.js. Thanks to it, all I/O operations and calls into the underlying APIs become asynchronous; but the user's JavaScript code also runs on the main thread, and if it runs for too long it blocks the event loop. Because events are processed in queue order, if one transaction or event never completes, the other callbacks, listeners, timeouts, and nextTick() calls get no chance to run, since the blocked event loop cannot reach them. At best efficiency drops; at worst the application grinds to a halt. The sketch below shows the effect.
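A small illustration of that effect, with a busy loop standing in for any CPU-bound work:

// The timer is due in 100 ms, but the busy loop holds the JavaScript thread
// for about 3 seconds, so its callback cannot run until the thread is free.
setTimeout(function () {
  console.log('timer fired');
}, 100);

var end = Date.now() + 3000;
while (Date.now() < end) {
  // burning CPU on the main thread; no callbacks can run meanwhile
}
console.log('busy loop finished'); // printed first, then 'timer fired'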

Common operations such as template rendering, compression and decompression, and encryption and decryption are therefore weak spots for Node.js; take this into account when deciding where to use it.

Conclusion

  • Node.js handles interaction with the operating system through libuv, and that is what makes it asynchronous, non-blocking, and event-driven.
  • Node.js's "single thread" is only the JavaScript execution thread; the real I/O operations and underlying API calls are carried out on multiple threads.
  • CPU-intensive tasks are the Achilles heel of Node.js.

Reprinted from: fed.taobao.org/blog/2015/1…