Moment For Technology

Optimizing network performance from a business perspective | August More Text Challenge

Posted on Dec. 3, 2022, 9:11 a.m. by Andrew Wheeler
Category: Back-end Tag: Back-end

This is the fifth day of my participation in the August More Text Challenge. For details, see: August More Text Challenge.

Anyone with a little computer knowledge knows that a computer's operating system manages its resources, including network devices. Network devices are, so to speak, the neurons of the Internet: computers rely on them to communicate with each other. If network communication performance is poor, the computer's processing power cannot be fully used, which in turn hurts the software systems we design. Optimizing network performance is therefore very important.

So how can we improve network performance without spending more on network equipment? The answer is to optimize system parameters.

Why? Because network protocols are handled by the operating system, which exposes many protocol-related parameters. If these parameters are misconfigured, network performance will inevitably degrade.

Take a seckill (flash-sale) system as an example. The picture of each seckill item is usually stored in an image storage system. Although these images are usually cached on the front end to speed up loading, the browser still has to fetch them from the image storage system the first time it accesses them. If a single image is several MB or even dozens of MB, a large number of such requests will inevitably hurt the performance of the image storage system. For the image storage system, we can therefore tune the network connection buffer parameters to improve performance.

Another example is the seckill interface, which has to process a large number of seckill requests because user participation is high. To speed up request handling, we can tune the parameters related to network connections so that requests are processed quickly.

What are the system parameters?

Linux has more than 1,000 system parameters, each corresponding to a different function. How do we choose the right ones to optimize?

First, we can list all system parameters by running the following command on a Linux system:

sudo /sbin/sysctl -a

The results are shown in the figure below:

There are a great many system parameters; only part of the output is shown here.

Generally speaking, different parameter types correspond to different resources and performance aspects, and therefore affect different service scenarios. For example, network parameters mainly affect network request handling, while file system parameters mainly affect file read/write scenarios.

To get the parameter type, run the following command:

sudo /sbin/sysctl -a|awk -F "." '{print $1}'|sort -k1|uniq

The results are as follows:


The net type is the one we need to focus on, because it contains almost all of the parameters related to network performance. Run the following command:

sudo /sbin/sysctl -a|grep "^net."|awk -F "[.| ]" '{print $2}'|sort -k1|uniq

This lists all subtypes under the net type:

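To see what the two pipelines above do without touching a live system, they can be run on a small hypothetical sample of `sysctl -a` output. The sample lines and values are made up for illustration; on a real host you would pipe `sudo /sbin/sysctl -a` into the same commands instead.

```shell
#!/bin/sh
# Hypothetical sample of `sysctl -a` output (values are illustrative only).
sample='fs.file-max = 65535
kernel.pid_max = 32768
net.core.rmem_max = 212992
net.core.somaxconn = 4096
net.ipv4.tcp_mem = 188964 251954 377928
net.ipv4.tcp_rmem = 4096 131072 6291456
net.unix.max_dgram_qlen = 512
vm.swappiness = 60'

# Top-level parameter types: the text before the first dot.
types=$(printf '%s\n' "$sample" | awk -F "." '{print $1}' | sort -k1 | uniq)

# Subtypes under net: the second field when splitting on dots or spaces.
net_subtypes=$(printf '%s\n' "$sample" | grep "^net." | awk -F "[.| ]" '{print $2}' | sort -k1 | uniq)

echo "types:"; echo "$types"
echo "net subtypes:"; echo "$net_subtypes"
```

With this sample, the first pipeline yields fs, kernel, net, and vm, and the second yields core, ipv4, and unix.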

On Linux, these parameters can be set in the /etc/sysctl.conf file; if a parameter is not listed there, you can add it yourself. After editing the file, run sudo sysctl -p to load the new configuration so that the changes take effect.
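When such edits are scripted, it helps to rewrite an existing line rather than append a duplicate. The sketch below uses a hypothetical helper, `set_sysctl_conf` (not part of any standard tool), and demonstrates it on a scratch file instead of the real /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper (not a standard tool): make "KEY = VALUE" the setting in a
# sysctl-style config file, rewriting an existing KEY line or appending a new one.
set_sysctl_conf() {
    conf_file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # ignore leading whitespace
            n = index(line, "=")
            name = (n > 0) ? substr(line, 1, n - 1) : ""
            sub(/[ \t]+$/, "", name)              # ignore spaces before "="
            if (name == k) { print k " = " v; found = 1; next }
            print                                 # keep every other line as-is
        }
        END { if (!found) print k " = " v }
    ' "$conf_file" > "$tmp" && cat "$tmp" > "$conf_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a scratch file rather than the real /etc/sysctl.conf.
conf=$(mktemp)
printf '# sample file\nnet.core.rmem_max = 212992\n' > "$conf"
set_sysctl_conf "$conf" net.core.rmem_max 5242880   # existing key: rewritten
set_sysctl_conf "$conf" net.core.wmem_max 5242880   # missing key: appended
cat "$conf"
# On a real host: apply the same edit to /etc/sysctl.conf as root, then run sudo sysctl -p
```

Comment lines and unrelated keys pass through untouched, so the helper can be run repeatedly without piling up duplicate entries.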

Among these subtypes, we need to focus on core and ipv4. These two contain the various network-protocol parameters, such as buffer memory sizes and fast resource reclamation. Both business scenarios mentioned earlier, the seckill interface service and the image storage system, are optimized through parameters in core and ipv4.

Next, let's look at how to tune the key parameters in core and ipv4: socket buffers, TCP protocol behavior, the maximum number of connections, and so on.

How do I tune the socket buffer parameters

In an operating system, the resources used for network communication are allocated and managed per network socket. Before program-controlled telephone exchanges existed, a phone call was set up by an operator who connected the two parties' lines by plugging them into sockets on a switchboard. Since the advent of computer networks, we have used sockets to represent the endpoints of two-way network communication. The size of a socket's buffer directly affects how well a program can send and receive data over the network.

Take the image storage system mentioned earlier: it mainly handles image file data. If the socket buffer is too small, the program may need several read or write calls to get through the data, which can significantly hurt performance. If the buffer is large enough, the program can quickly write the processed data into the buffer and move on to other work, which improves performance.

So how is the socket buffer size set? Through system parameters. You can list the relevant parameters by running the following command in a terminal:

sudo /sbin/sysctl -a|grep "^net."|grep "[r|w|_]mem[_| ]"

Here are the results:

In the output above, the left side of each equals sign is the parameter name and the right side is its value. The parameter names contain several recurring keywords: mem, rmem, wmem, max, default, and min.

max, default, and min mean the maximum, default, and minimum values, respectively. mem, rmem, and wmem mean total memory, receive-buffer memory, and send-buffer memory, respectively.

Among these memory parameters, rmem and wmem are measured in bytes, while tcp_mem and udp_mem are measured in pages. A page is the smallest unit of memory managed by the operating system; on Linux a page is usually 4 KB (getconf PAGESIZE reports the actual size).

Also notice that tcp_mem, tcp_rmem, tcp_wmem, and udp_mem each have three values. For tcp_rmem and tcp_wmem, the three values describe how much memory can be allocated to a single socket: from left to right, the minimum, default, and maximum. For example, tcp_rmem has a minimum of 4096, a default of 131072, and a maximum of 6291456 bytes.
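Because the three values share one line, a script usually splits them into named fields. A minimal sketch, using the tcp_rmem numbers quoted above as a sample string; on a live system the same line can be read from /proc/sys/net/ipv4/tcp_rmem.

```shell
#!/bin/sh
# Sample in the "min default max" format of tcp_rmem (numbers quoted in the text).
# On a live system: tcp_rmem=$(cat /proc/sys/net/ipv4/tcp_rmem)
tcp_rmem='4096 131072 6291456'

# Split the three whitespace-separated fields into named variables.
read -r tcp_min tcp_default tcp_max <<EOF
$tcp_rmem
EOF

printf 'min=%s default=%s max=%s bytes\n' "$tcp_min" "$tcp_default" "$tcp_max"
```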

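Note how these relate to the corresponding net.core parameters. According to the Linux kernel's networking sysctl documentation, the default value of tcp_rmem overrides net.core.rmem_default for TCP sockets, but its maximum does not override net.core.rmem_max: rmem_max caps the buffer size an application can request explicitly with setsockopt(SO_RCVBUF), while the tcp_rmem maximum caps the kernel's automatic buffer tuning. tcp_wmem relates to wmem_default and wmem_max in the same way.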

For tcp_mem and udp_mem, the three values control memory pressure. From left to right they are the low threshold, the pressure threshold, and the maximum, all in pages; for example, tcp_mem is 188964, 251954, and 377928. When total TCP memory usage is below 188964 pages, there is no memory pressure and nothing needs to be reclaimed. When usage exceeds 251954 pages, TCP enters memory-pressure mode and moderates its memory consumption until usage falls back below 188964 pages. When usage reaches 377928 pages, the system refuses further socket memory allocations and logs a message such as "TCP: out of memory -- consider tuning tcp_mem".
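The low/pressure/high rules form a small state machine with hysteresis: pressure mode starts above the pressure value but only ends below the low value. The function below is a simplified model written for illustration (hypothetical name, not kernel code), using the tcp_mem thresholds from the example.

```shell
#!/bin/sh
# Simplified model (not kernel code) of how tcp_mem's three thresholds behave.
# Arguments: current mode (normal|pressure), pages in use, low, pressure, high.
tcp_mem_next_state() {
    mode=$1 used=$2 low=$3 pressure=$4 high=$5
    if [ "$used" -ge "$high" ]; then
        echo over-limit    # at the hard limit: new socket memory is refused
    elif [ "$used" -gt "$pressure" ]; then
        echo pressure      # above "pressure": TCP moderates its memory use
    elif [ "$used" -lt "$low" ]; then
        echo normal        # below "low": no pressure, any pressure mode ends
    elif [ "$mode" = normal ]; then
        echo normal        # between low and pressure: stay in normal mode
    else
        echo pressure      # between low and pressure: pressure mode persists
    fi
}

# Thresholds (in pages) from the tcp_mem example above.
for example in "normal 100000" "normal 300000" "pressure 200000" "pressure 150000" "pressure 377928"; do
    set -- $example
    echo "mode=$1 used=$2 -> $(tcp_mem_next_state "$1" "$2" 188964 251954 377928)"
done
```

The middle case (200000 pages while already under pressure) shows why the low threshold matters: usage has dropped below the pressure value, yet the model stays in pressure mode until usage falls under 188964.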

So how should we tune these parameters? Let's look at a business scenario.

The seckill system's admin back end has a feature for uploading product images from the front end to the back-end file storage system. Product images range from a few hundred KB to more than a dozen MB. Uploading one image usually takes several seconds, sometimes more than ten; a batch upload can take dozens of seconds in total. How can we tune network parameters to improve server performance in this file-upload scenario?

The file upload system is mainly responsible for processing file data. It does not need to handle frequent connection setup and teardown; it just needs to send and receive large amounts of data as quickly as possible. Because each packet is relatively large, we need to allocate a large amount of memory to each socket on the file upload system.

How? The file upload system exposes an HTTP interface to the front end, and HTTP carries its data over TCP connections. So, to improve the system's performance when processing file data, we can modify the following parameters:
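
net.core.rmem_default
net.core.rmem_max
net.core.wmem_default
net.core.wmem_max
net.ipv4.tcp_mem
net.ipv4.tcp_rmem
net.ipv4.tcp_wmem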

For example, suppose the system may allocate at most 2 GB of memory to TCP, with a low threshold of 256 MB and a pressure threshold of 1.5 GB. With 4 KB pages, the low, pressure, and maximum values of tcp_mem are 65536, 393216, and 524288 pages, respectively.
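The page counts follow from dividing each byte budget by the page size. A quick check in shell arithmetic, assuming 4 KB pages as the text does (and a shell with 64-bit arithmetic, since 2 GB exceeds the 32-bit signed range):

```shell
#!/bin/sh
# Convert the byte budgets from the example into tcp_mem page counts.
# Assumes 4 KB pages, as in the text; on a real host check with: getconf PAGESIZE
page_size=4096
MB=$((1024 * 1024))
GB=$((1024 * MB))

low=$((256 * MB / page_size))          # 256 MB: no memory pressure below this
pressure=$((3 * GB / 2 / page_size))   # 1.5 GB: pressure mode starts above this
high=$((2 * GB / page_size))           # 2 GB: hard limit for all TCP sockets

echo "net.ipv4.tcp_mem = $low $pressure $high"
```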

If the average packet size is 512 KB, and each socket's read and write buffers should each hold at least 2 packets, 4 by default, and at most 10, then the minimum, default, and maximum values of tcp_rmem and tcp_wmem are 1048576, 2097152, and 5242880 bytes, respectively. Correspondingly, rmem_default and wmem_default are 2097152, and rmem_max and wmem_max are 5242880.
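The same sizing rule, expressed as arithmetic so it can be rerun with a different packet size. The 512 KB figure is the example's assumption, and the output lines are printed in sysctl.conf form.

```shell
#!/bin/sh
# Size the per-socket buffers from an average packet size, as in the example:
# hold 2 packets at minimum, 4 by default, and 10 at most.
packet=$((512 * 1024))          # assumed average packet size: 512 KB

buf_min=$((2 * packet))
buf_default=$((4 * packet))
buf_max=$((10 * packet))

echo "net.ipv4.tcp_rmem = $buf_min $buf_default $buf_max"
echo "net.ipv4.tcp_wmem = $buf_min $buf_default $buf_max"
echo "net.core.rmem_default = $buf_default"
echo "net.core.wmem_default = $buf_default"
echo "net.core.rmem_max = $buf_max"
echo "net.core.wmem_max = $buf_max"
```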

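In addition, because these buffers exceed 65535 bytes, net.ipv4.tcp_window_scaling must be set to 1 so that TCP can advertise receive windows larger than the 65535-byte limit of the unscaled 16-bit window field. It is enabled by default on modern Linux kernels.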

Finally, our parameter configuration is as follows:

net.core.rmem_default = 2097152
net.core.rmem_max = 5242880
net.core.wmem_default = 2097152
net.core.wmem_max = 5242880
net.ipv4.tcp_mem = 65536 393216 524288
net.ipv4.tcp_rmem = 1048576 2097152 5242880
net.ipv4.tcp_wmem = 1048576 2097152 5242880
net.ipv4.tcp_window_scaling = 1

Keep in mind that different service scenarios have different connection patterns and need different buffer configurations. For scenarios with a large number of short connections, such as the seckill interface, you should reduce rmem and wmem instead. For example, setting the minimum, default, and maximum values to 4096, 4096, and 8192 allows more connections to be established.
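A rough way to see why smaller buffers admit more connections is to divide the TCP memory budget by each socket's default read plus write buffer. This is only an order-of-magnitude sketch: it assumes the 2 GB budget from the example, ignores per-socket overhead, and real usage depends on how much data is actually queued, since buffers are not filled up front.

```shell
#!/bin/sh
# Rough upper bound (ignores per-socket overhead and kernel bookkeeping): how many
# sockets could each hold their default read + write buffers within the TCP budget.
budget=$((2 * 1024 * 1024 * 1024))    # the 2 GB tcp_mem maximum from the example

max_sockets() {  # $1 = default read buffer, $2 = default write buffer (bytes)
    echo $((budget / ($1 + $2)))
}

upload=$(max_sockets 2097152 2097152)   # file-upload defaults: 2 MB + 2 MB
seckill=$(max_sockets 4096 4096)        # short-connection defaults: 4 KB + 4 KB
echo "upload-style buffers:  $upload sockets"
echo "seckill-style buffers: $seckill sockets"
```

The gap (512 versus 262144 sockets under the same budget) is why the two scenarios call for opposite buffer settings.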

That covers the network buffer parameters. Next, I'll introduce the TCP-related parameters.

How do I optimize TCP parameters and the maximum number of connections

Anyone familiar with TCP will know mechanisms such as the three-way handshake, the four-way wave (connection teardown), slow start, the sliding window, timeout retransmission, and Nagle's algorithm (sometimes called the "sticky packet" algorithm), which together ensure reliable transmission. Sometimes, however, these mechanisms become network bottlenecks.

For example, when network bandwidth is plentiful, slow start can actually limit transmission speed. Nagle's algorithm, for its part, holds back small segments and combines them into one TCP segment, sending only once earlier data has been acknowledged or a full segment's worth has accumulated. In some cases this improves network throughput, but for data with strict real-time requirements it can keep the receiver from getting the data in time.

So how do we tune the parameters behind these mechanisms? Let's look at a business scenario: the seckill interface service.

The seckill purchase interface receives a huge number of purchase requests. For eligible users, it deducts inventory, creates the order, and returns a success response to the front end; for ineligible users, or when inventory deduction fails, it returns a failure response. Each purchase request carries about 500 bytes of HTTP data, and the seckill interface service has to set up, tear down, and reclaim connections quickly. How can we tune its network parameters to improve connection-handling performance?

First, the seckill interface serves users over the public Internet. With the standard 1500-byte MTU, one frame carries at most 1460 bytes of TCP application data (1500 minus a 20-byte IP header and a 20-byte TCP header without options). A seckill request carries about 500 bytes, roughly one third of that, so these packets are small and may be affected by Nagle's algorithm.
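The per-frame payload follows from subtracting header sizes from the MTU. A small sketch, assuming a 1500-byte Ethernet MTU and headers without options; it also shows where the 1472-byte figure sometimes quoted comes from (it is the UDP payload, since the UDP header is only 8 bytes).

```shell
#!/bin/sh
# Payload that fits in one Ethernet frame, assuming a 1500-byte MTU and
# headers without options (IPv4: 20 bytes, TCP: 20 bytes, UDP: 8 bytes).
mtu=1500
tcp_payload=$((mtu - 20 - 20))   # 1460: the usual TCP MSS
udp_payload=$((mtu - 20 - 8))    # 1472: the figure for UDP datagrams
request=500                      # approximate size of one seckill request (from the text)

echo "TCP payload per frame: $tcp_payload bytes"
echo "UDP payload per frame: $udp_payload bytes"
echo "request fills about $((request * 100 / tcp_payload))% of a TCP segment"
```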

However, users are sensitive to request latency, so Nagle's algorithm should be turned off to make sure packets are sent immediately. How? Enable the TCP_NODELAY option on the TCP socket; this is a per-socket option that the application sets with setsockopt(), not a sysctl parameter. In addition, set net.ipv4.tcp_syncookies to 1 to defend against SYN flood attacks, in which an attacker sends a large number of SYN packets.

Second, the seckill interface has a very large number of users, so it has to handle a large number of short connections. This means the service must create and reclaim sockets very quickly so that it always has enough resources for new connections. How? We can quickly reclaim or reuse allocated resources by closing idle connections, reusing sockets, and so on. The specific settings are as follows:

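# Reuse sockets in TIME_WAIT for new outgoing connections
net.ipv4.tcp_tw_reuse = 1
# Fast recycling of TIME_WAIT sockets: this option was removed in Linux 4.12
# and breaks clients behind NAT on older kernels, so leave it disabled
# net.ipv4.tcp_tw_recycle = 1
# Close orphaned connections stuck in FIN-WAIT-2 after 30 seconds
net.ipv4.tcp_fin_timeout = 30
# Start keepalive probes on idle connections (sockets with SO_KEEPALIVE) after
# 1800 seconds instead of the default 7200, so dead connections are reclaimed sooner
net.ipv4.tcp_keepalive_time = 1800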
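To check whether such settings actually shrink the pile of lingering sockets, it helps to count sockets per TCP state before and after the change. A minimal sketch that summarizes `ss -tan` output; the helper name `count_tcp_states` and the sample lines are hypothetical.

```shell
#!/bin/sh
# Count TCP sockets per state from `ss -tan` output, e.g. to watch TIME-WAIT
# and FIN-WAIT-2 sockets before and after tuning. The first line is the header.
count_tcp_states() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Hypothetical sample; on a real host run: ss -tan | count_tcp_states
sample='State      Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB      0      0      10.0.0.5:8080      10.0.0.9:51234
TIME-WAIT  0      0      10.0.0.5:8080      10.0.0.9:51230
TIME-WAIT  0      0      10.0.0.5:8080      10.0.0.7:40112
FIN-WAIT-2 0      0      10.0.0.5:8080      10.0.0.8:33321'

states=$(printf '%s\n' "$sample" | count_tcp_states)
echo "$states"
```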

Third, because of the huge volume of seckill requests, occasional network jitter may cause some packets to be lost, which triggers timeout retransmission. To avoid retransmitting every packet after such jitter, enable selective acknowledgment (SACK): the receiver reports exactly which segments arrived, so only the missing ones are retransmitted and no bandwidth is wasted on data that was already delivered. To do this, set net.ipv4.tcp_sack to 1 (it is enabled by default on modern Linux kernels).

Finally, with millions of users taking part in seckill activities every day, we need to let a single machine hold as many network connections as possible to support high concurrency. In the operating system, each network connection occupies one file descriptor, so the maximum-number-of-open-files parameter must be set high enough that descriptors do not run out and cause performance problems. For example, fs.file-max = 65535 allows at most 65535 open files system-wide.
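Whether fs.file-max is close to being exhausted can be read from /proc/sys/fs/file-nr, which holds three numbers: allocated file handles, allocated but unused handles, and the maximum. The sketch below parses a hypothetical sample and warns below an arbitrary 10% headroom. Note that fs.file-max is system-wide; each process is also bounded by its own open-files limit (ulimit -n).

```shell
#!/bin/sh
# /proc/sys/fs/file-nr holds: allocated handles, allocated-but-unused handles, max.
# Hypothetical sample; on a real host use: file_nr=$(cat /proc/sys/fs/file-nr)
file_nr='60000 0 65535'

read -r allocated unused max <<EOF
$file_nr
EOF

in_use=$((allocated - unused))
headroom=$((max - in_use))
echo "in use: $in_use of $max, headroom: $headroom"

# Warn when fewer than 10% of the descriptors remain (an arbitrary threshold).
if [ $((headroom * 10)) -lt "$max" ]; then
    echo "warning: close to fs.file-max; consider raising it"
fi
```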

TCP has many more parameters, which I won't cover here. If you are interested, refer to the TCP protocol specification and the Linux TCP parameter documentation.
