Original: Coding Diary (WeChat official account: Codelogs). Feel free to share; please keep this attribution when reposting.

Introduction

It all started with a pressure test project: a performance competition between our company's system and those of other companies in the same industry. The performance numbers would directly decide whether we won the project, so the pressure was enormous.

At that time, Party A provided 17 servers for deploying the system and used LoadRunner to drive the pressure test. We, as Party B, had full use of the servers, while Party A assigned someone to run the tests and record the performance data; the requirement was that the QPS under load must not fall below 4000.

Project start

After we took on the project, our leader, acting as its technical director, quickly laid out the architecture: the Web layer used Nginx for load balancing, the application layer used a WebLogic server cluster (think of it as something similar to Tomcat), and the database was an Oracle RAC cluster, as follows:

Looking at it, the architecture seemed quite simple, and I thought to myself: is this really all there is to a high-concurrency system?

After the system code was deployed in accordance with the above architecture, we started the first round of pressure testing.

Well, as soon as we started the load generator, it reported a large number of Connection timed out and Read timed out errors.

Strangely, though, when we logged in to the Nginx and WebLogic machines to check on them, CPU and memory usage were not high at all, and the application logs showed that hardly any requests seemed to be arriving!



The whole group was puzzled. Why weren't the test requests reaching the application? Where had they gone? After some discussion nobody could answer, so everyone buried themselves in searching the Internet for clues.

After half a day of digging, however, nobody had found the cause, the leader included. Along the way we adjusted some Nginx and JVM settings based on what we read online, with no noticeable effect.

Request for aid

Seeing that the problem would not be solved quickly, the leader hired a few experts from a freelancing site. It cost some money, but it was worth it to get the problem solved fast.

The first expert connected remotely, poked around for about ten minutes, and gave up. Apparently this money is not easy to earn.



Then the second expert connected remotely, spent an hour, recompiled Nginx, and in the end gave up as well, leaving the problem unsolved.



Then the third expert connected remotely. Once in, he asked us to start the pressure test, and then it was his show time.



He kept watching the output of a command like this:

$ netstat -nat | awk '/tcp/{print $6}'|sort|uniq -c
     16 CLOSE_WAIT
     80 ESTABLISHED
      6 FIN_WAIT2
     11 LAST_ACK
      8 LISTEN
     22 SYN_RECV
    400 TIME_WAIT

After watching it for a while, he had us stop the test and adjusted some kernel parameters, as follows:

$ vi /etc/sysctl.conf
net.ipv4.tcp_max_syn_backlog = 8192
net.core.netdev_max_backlog = 8192
net.core.somaxconn = 8192
net.ipv4.ip_local_port_range = 1024 65000

net.ipv4.tcp_syn_retries = 3
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_retries1 = 3
net.ipv4.tcp_retries2 = 5
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1

After the adjustment we started the pressure test again, and the effect was obvious: the backend system was now throwing plenty of errors, but at least the requests were getting in!

At that point, the core technical challenge of this pressure test was solved. We still tweaked some JVM parameters and code afterwards, but those were things within our own control.

But I still had no idea what the expert had actually done.

Half-connection queue and full-connection queue

After some more searching online, I came across a new piece of knowledge: during TCP connection establishment there are two queues, the half-connection queue and the full-connection queue, which work as follows:

  1. To provide a service, the server calls bind() and listen() to create a Socket in the LISTEN state.
  2. When a client initiates a connection, it sends a SYN packet, the first step of the three-way handshake. After receiving it, the server creates a Socket in the SYN_RECV state, places it in the half-connection queue, and replies with SYN+ACK (the second step of the three-way handshake).
  3. After receiving the SYN+ACK, the client changes its Socket to the ESTABLISHED state and sends an ACK back to the server.
  4. After receiving the ACK, the server moves the Socket from the half-connection queue to the full-connection queue and changes its state to ESTABLISHED.
  5. The server's accept() call then takes the new Socket off the full-connection queue, and that Socket can be used to exchange data with the client.

As you can see, while the server establishes a connection, the socket passes through the half-connection queue and then the full-connection queue. So what happens if the half-connection queue or the full-connection queue is full?

  1. If the half-connection queue is full, the client's SYN is discarded. If the client retransmits the SYN several times and still gets no SYN+ACK, it reports a Connection timed out exception. That explains many of the errors on our load generator.
  2. If the full-connection queue is full, the final ACK and subsequent packets from the client are discarded, yet the client itself is already in the ESTABLISHED state. When it then sends actual request packets, they keep being dropped, and the client eventually reports Software caused connection abort: socket write error or Read timed out. (A small sketch that reproduces this appears right after this list.)
  3. In addition, a full full-connection queue also causes the SYN of the first handshake to be dropped, which likewise leads to Connection timed out exceptions.
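
To make this more concrete, here is a minimal sketch of my own (not from the pressure test itself; port 9090, the backlog of 2, and the loop count are arbitrary) that reproduces the full-connection-queue symptom on one machine: the server listens with a tiny backlog and never calls accept(), so once the queue fills, further clients hang in connect() and time out. The exact errors you see depend on kernel settings such as net.ipv4.tcp_abort_on_overflow.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: fill the full-connection queue by never calling accept()
public class BacklogOverflowDemo {
  public static void main(String[] args) throws IOException {
    int port = 9090;                                 // arbitrary test port
    ServerSocket server = new ServerSocket(port, 2); // deliberately tiny backlog

    // Open more client connections than the full-connection queue can hold
    for (int i = 0; i < 20; i++) {
      try {
        Socket c = new Socket();
        c.connect(new InetSocketAddress("127.0.0.1", port), 2000); // 2s connect timeout
        System.out.println("connection " + i + " established");
      } catch (IOException e) {
        // Once the queue is full, new SYNs are dropped and connect() times out here
        System.out.println("connection " + i + " failed: " + e);
      }
    }
    // While the loop runs, `ss -nltp` shows Recv-Q pinned at its maximum for this socket
    server.close();
  }
}

On Linux the effective queue here is min(2, net.core.somaxconn), so only the first few connections go through and the rest time out, much like what our load generator reported.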

As you can see, the sizes of the half-connection queue and the full-connection queue matter a lot, and the kernel's default of 128 is far too small for this kind of load. They can be enlarged as follows:

$ vi /etc/sysctl.conf
# This is the size of the half-connection queue
net.ipv4.tcp_max_syn_backlog = 8192
# This is the size of the full-connection queue
net.core.somaxconn = 8192

# Make the configuration changes take effect
$ sysctl -p

# Check the current configuration
$ sysctl -a

In addition, when doing Socket network programming the application can also specify a backlog parameter of its own, as with ServerSocket in Java:

int port = 8080;
int backlog = 8192;
ServerSocket ss = new ServerSocket(port, backlog);
while (true) {
  // A new connection is received
  Socket s = ss.accept();

  new Thread(() -> {
    // Socket read/write operations...
  }, "socket-thread-" + s).start();
}

The backlog parameter specifies the size of the full-connection queue, but the actual size is determined by the kernel and the application together: the smaller of net.core.somaxconn and the application's backlog is used.

Because of this, common network server programs (Tomcat, Redis, MySQL, etc.) all expose a backlog setting. For the Tomcat embedded in Spring Boot, it is configured like this:

server:
  port: 8080
  tomcat:
    accept-count: 8192   # full-connection queue size of the built-in Tomcat
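
Equivalently, the same setting can be applied in code. The following is a sketch under the assumption of a Spring Boot application with the embedded Tomcat starter (the class and bean names are my own); acceptCount is Tomcat's name for the listen backlog, and the effective queue is still capped by net.core.somaxconn:

import org.apache.catalina.connector.Connector;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TomcatBacklogConfig {

  // Raise the embedded Tomcat's acceptCount (its full-connection queue / listen backlog)
  @Bean
  public WebServerFactoryCustomizer<TomcatServletWebServerFactory> acceptCountCustomizer() {
    return factory -> factory.addConnectorCustomizers(
        (Connector connector) -> connector.setProperty("acceptCount", "8192"));
  }
}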

Observe the TCP connection queue

The netstat command is used to count the number of sockets in each state as follows:

$ netstat -nat | awk '/tcp/{print $6}'|sort|uniq -c
     16 CLOSE_WAIT
     80 ESTABLISHED
      6 FIN_WAIT2
     11 LAST_ACK
      8 LISTEN
     22 SYN_RECV
    400 TIME_WAIT

As you can see here, the number of sockets in the ESTABLISHED state was small. The expert presumably reasoned that there should not be so few established connections during a pressure test, suspected that the TCP connection queue sizes were the problem, and concluded that this was what kept the pressure-test QPS from coming up.

Of course, this kind of judgment relies heavily on experience, because the numbers themselves are not intuitive; it probably took the expert's many years of performance tuning to sense that this connection count was abnormal.

After a lot more searching, I found several ways to observe connection queue usage directly.

  1. Observe the connection queue length

# The ss command, for Sockets in the LISTEN state:
# Recv-Q is the current size of the full-connection queue
# Send-Q is the maximum size of the full-connection queue
$ ss -nltp
State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port   Process
LISTEN  0       10      0.0.0.0:8080         0.0.0.0:*           users:(("ncat",pid=25760,fd=3))

# The netstat command, for Sockets in the LISTEN state:
# Recv-Q is the current size of the half-connection queue
# Send-Q is usually displayed as 0
$ netstat -nltp
Proto  Recv-Q  Send-Q  Local Address   Foreign Address   State    PID/Program name
tcp    0       0       0.0.0.0:8080    0.0.0.0:*         LISTEN   25760/ncat

So the ss and netstat commands let you observe the TCP connection queues directly. In addition, you can also see how many packets have been dropped because a queue overflowed, as follows:

  2. Observe how many times the connection queues overflowed and dropped packets

$ netstat -s|grep -iE LISTEN
  52 times the listen queue of a socket overflowed    # full-connection queue full, number of overflows
  52 SYNs to LISTEN sockets ignored                   # number of SYN packets dropped

Here SYNs were dropped 52 times, which may indicate that the connection queues are set too small.

Other uses of ss

Each socket has a recv buffer (read buffer) and a send buffer (write buffer). For sockets in the ESTABLISHED state, the ss command can show how much of these two buffers is in use:

  • Recv-Q: the amount of data sitting in the recv buffer that the user process has not yet read() in time.
  • Send-Q: the amount of data that has been write()ten into the send buffer but for which the remote host has not yet returned an ACK.
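
As an aside, these per-socket buffer sizes can also be inspected or overridden from application code. Below is a minimal sketch of my own using the standard SO_RCVBUF/SO_SNDBUF socket options (the host example.com and port 80 are just placeholders); as noted next, the kernel's automatic tuning is usually good enough.

import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketBufferDemo {
  public static void main(String[] args) throws Exception {
    try (Socket s = new Socket()) {
      // Requested sizes are only hints; the kernel may round or cap them
      s.setReceiveBufferSize(256 * 1024); // set before connect() so it can influence window scaling
      s.setSendBufferSize(256 * 1024);
      s.connect(new InetSocketAddress("example.com", 80), 3000); // placeholder endpoint, 3s timeout
      System.out.println("SO_RCVBUF=" + s.getReceiveBufferSize()
          + " SO_SNDBUF=" + s.getSendBufferSize());
    }
  }
}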

In general, the kernel resizes these two buffers automatically, and they do not need to be configured by hand. In addition, netstat can also show packet loss caused by these two buffers, as follows:

$ netstat -s|grep -E 'pruned|collapsed'
  19963 packets pruned from receive queue because of socket buffer overrun
  665 packets collapsed in receive queue due to low socket buffer
  • collapsed: when a socket's recv buffer is running short of space, the kernel merges (collapses) adjacent packets in the receive queue so that they share a single set of headers and bookkeeping, saving memory.
  • pruned: if there is still not enough space to receive packets even after collapsing, the kernel drops (prunes) packets from the receive queue.

Conclusion

This pressure test left a deep impression on me. I realized that there are many low-level mechanisms at work outside the Java world; they usually cause no problems, but when they do go wrong they are very hard to pin down. That pushed me to spend a lot of time catching up on Linux and networking knowledge.

In addition, knowledge about the TCP connection queues is shared more often in operations circles, since ops engineers deploy systems and software all the time and naturally pick it up over the years. This also reminds me to pay more attention to operations- and DBA-level knowledge in the future.

Previous articles

  • Awk is really a magic tool
  • Linux text command tips (part 1)
  • Linux text command tips (part 2)
  • Character encoding solution