Preface

When it comes to building high-throughput web applications, Nginx and Node.js are a perfect match. Both are based on an event-driven model and can easily overcome the C10K bottleneck that limits traditional web servers such as Apache. The default configuration already allows for high concurrency, but there is still some work to be done if you want thousands of requests per second or more on cheap hardware.

This article assumes that readers use Nginx’s HttpProxyModule as a reverse proxy in front of upstream Node.js servers. We’ll cover tuning sysctl on Ubuntu 10.04 and above, and tuning Node.js applications together with Nginx. Of course, if you’re using Debian, you can achieve the same goal; the tuning steps just differ slightly.

Network tuning

Nginx typically connects clients to upstream applications over TCP sockets. Without first understanding and optimizing this underlying transport, fine-tuning Nginx and Node.js themselves may be futile.

The system imposes many thresholds and limits on TCP, set via kernel parameters. The default values are intended for general-purpose use and do not suit the high-traffic, short-lived connection pattern typical of web servers.

Some of the parameters available for tuning TCP are listed here. For them to take effect, you can put them in /etc/sysctl.conf, or in a new configuration file such as /etc/sysctl.d/99-tuning.conf, and then run sysctl -p to have the kernel load them. We use a sysctl cookbook to handle this manual work for us.
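For example, assuming the file name above, the settings can be loaded without a reboot like this (run as root or with sudo):

sysctl -p /etc/sysctl.d/99-tuning.conf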

It is worth noting that the values given here are considered safe to use, but you are encouraged to research the meaning of each parameter so that you can choose a value better suited to your load, hardware, and usage.

Let’s highlight a few of the most important ones.

net.ipv4.ip_local_port_range

To serve a downstream client on behalf of an upstream application, Nginx must open two TCP connections: one to the client and one to the application. When the server receives many connections, the system quickly runs out of available ports. You can widen the range of available ports by modifying net.ipv4.ip_local_port_range. If the message “possible SYN flooding on port 80. Sending cookies.” appears in /var/log/syslog, it means the system cannot find an available port. Increasing the net.ipv4.ip_local_port_range parameter can reduce this error.
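As an illustration only (the exact range should be chosen for your own system), the corresponding line in the sysctl file might be:

net.ipv4.ip_local_port_range = 10240 65535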

net.ipv4.tcp_tw_reuse

When a server has to cycle through a large number of TCP connections, it accumulates many connections in the TIME_WAIT state. TIME_WAIT means the connection itself is closed, but its resources have not yet been released. Setting net.ipv4.tcp_tw_reuse to 1 allows the kernel to reuse such connections when it is safe to do so, which is much cheaper than establishing new ones.
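The corresponding sysctl line is simply:

net.ipv4.tcp_tw_reuse = 1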

net.ipv4.tcp_fin_timeout

This is the minimum time a connection in the TIME_WAIT state must wait before being reclaimed. Lowering it speeds up reclamation.
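For example (the kernel default is 60 seconds; 15 is only an illustrative value):

net.ipv4.tcp_fin_timeout = 15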

To check connection states with netstat:

netstat -tan | awk '{print $6}' | sort | uniq -c

Or use ss:

ss -s

Nginx

As the load on the web server increased, we began to run into some strange limitations in Nginx. Connections were being dropped, and the kernel kept reporting SYN flood warnings. Yet load average and CPU usage were low, and the server could clearly have handled more connections, which was frustrating.

After investigating, we found a very large number of connections in the TIME_WAIT state. Here is what ss reported on one of the servers: 47,135 TIME_WAIT connections! Moreover, as ss showed, they were all closed connections. This indicates that the server had consumed most of the available ports, and implies that it was allocating a new port for every connection. Tuning the network helped a little, but there were still not enough ports.

After further research, I found the documentation for the keepalive directive for upstream connections, which reads:

Sets the maximum number of idle keepalive connections to an upstream server; these connections are kept in the cache of the worker process.

Interesting. In theory, this setting minimizes connection waste by passing requests over cached connections. The documentation also mentions that proxy_http_version should be set to “1.1” and the “Connection” header cleared. On further research, this turned out to be a good idea, since HTTP/1.1 uses TCP connections far more efficiently than HTTP/1.0, which Nginx uses by default when proxying.

After making the changes recommended in the documentation, we updated our upstream configuration.
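As a rough sketch of what such an upstream block can look like (the upstream name, server addresses, and keepalive cache size below are placeholders, not the original values):

upstream backend_nodejs {
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
    # Number of idle connections to keep open per worker process
    keepalive 64;
}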

I also changed the proxy settings in the server section as suggested, added a proxy_next_upstream directive to skip failed servers, adjusted keepalive_timeout for clients, and turned off access logging.
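Continuing the sketch, and assuming the upstream name from the previous block, the relevant parts of a server section might look like this (values are placeholders):

server {
    listen 80;

    # Keepalive timeout for client connections
    keepalive_timeout 10;

    location / {
        # Reuse upstream connections over HTTP/1.1 and clear the Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        # Skip upstream servers that error out or time out
        proxy_next_upstream error timeout http_500 http_502 http_503 http_504;
        proxy_pass http://backend_nodejs;
    }

    # Disable access logging to cut per-request disk I/O
    access_log off;
}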

With the new configuration, I saw a 90% reduction in socket usage; requests are now transmitted over far fewer connections.

Node.js

Thanks to its event-driven design and asynchronous I/O, Node.js handles large numbers of connections and requests right out of the box. There are other tuning options, but this article focuses on the process side of Node.js.

Node is single-threaded and does not automatically use multiple cores; that is, an application does not automatically take advantage of the server’s full capacity.

Cluster the Node process

We can modify the application to fork multiple processes that accept connections on the same port, spreading the load across multiple cores. Node has a cluster module that provides all the tools necessary to achieve this, but wiring it into the application requires some manual work. If you use Express, eBay has a module called cluster2.
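As a rough sketch of the manual wiring involved (the port and response text are placeholders; error handling and graceful restarts are omitted), a clustered HTTP server using the built-in cluster module might look like this:

// server.js -- fork one worker per core and share a single listening port
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // One worker per core here; see the next section for why N - 1 is often better
  const workerCount = os.cpus().length;
  for (let i = 0; i < workerCount; i++) {
    cluster.fork();
  }
  // Replace workers that die so capacity is not silently lost
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' exited, forking a replacement');
    cluster.fork();
  });
} else {
  // Each worker runs its own HTTP server; the master distributes incoming connections
  http.createServer(function (req, res) {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(3000);
}

Each worker is a separate process with its own memory, so any shared state has to live in an external store rather than in in-process variables.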

Preventing context switching

When running multiple processes, you should ensure that each CPU core is kept busy by only one process at a time. In general, if the CPU has N cores, we should spawn N - 1 application processes. This ensures that each process gets a reasonable time slice, while leaving one core free for the kernel scheduler and other tasks. We also want to make sure that little besides Node.js runs on the server, to prevent CPU contention.
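Expressed in code, the sizing rule from this paragraph might look like the following sketch (clamped so a single-core machine still gets one worker):

const os = require('os');
// N cores -> N - 1 worker processes, leaving one core for the kernel scheduler
const workerCount = Math.max(os.cpus().length - 1, 1);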

We once made the mistake of deploying two Node.js applications on the same server and then starting N - 1 processes for each of them. As a result, the two applications competed with each other for CPUs, causing the system load to skyrocket. Even though our servers are 8-core machines, the performance overhead from context switching was very noticeable. A context switch occurs when the CPU suspends the current task in order to run another one; during a switch, the kernel must save all the state of the current process and then load and execute another process. To solve this problem, we reduced the number of processes each application started so that they shared the CPUs fairly, and the system load dropped as a result:

Notice how the system load (the blue line) drops below the number of CPU cores (the red line). We saw the same thing on our other servers. Since the total amount of work stayed the same, the performance improvement in the figure above can only be attributed to fewer context switches.