A few things about the TCP full connection queue

First, the problem

Today a friend came to me for help with a strange problem, and it really was strange. Client-side call RT (response time) was high, with intermittent Connection reset errors, while the server's CPU, thread stacks, and so on looked fine and the server-side RT was short.

The cause turned out to be that connections were being dropped because the TCP full connection queue was too small. The project used the Tomcat embedded in Spring Boot, whose default accept-count is 100, and this value determines the size of the full connection queue. During request peaks the full connection queue filled up and new connections were discarded. We adjusted the server.tomcat.accept-count parameter to solve the problem.

Second, half-connection queue and full connection queue

To understand why this happened: the exception information pointed to a TCP connection problem, which led us to the half-connection queue and the full connection queue. Let's look at what these two TCP queues are and why they produce this strange behavior.

1. TCP three-way handshake process and queue

For TCP three-way handshake, the Linux kernel maintains two queues:

A half-connection queue, known as the SYN queue
A full connection queue, known as the accept queue

The TCP three-way handshake itself is familiar:

1. The client sends a SYN packet and enters the SYN_SENT state.

2. After receiving the packet, the server puts the connection information into the half-connection queue (SYN queue) and returns a SYN+ACK packet to the client.

3. When the server receives the ACK packet from the client, if the accept queue is not full, it takes the connection information out of the half-connection queue and puts it into the full connection queue, where it waits for the application to accept() it. If the queue is full, the server acts according to the policy configured by tcp_abort_on_overflow.
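
The hand-off in step 3 can be observed with a small loopback sketch in Python. This is an illustration rather than part of the original investigation, and the function name is hypothetical. The client's connect() returns once the kernel has completed the handshake, before the server application has called accept(); in between, the connection waits in the accept queue.

```python
import socket


def connect_before_accept(backlog: int = 5) -> bool:
    """Return True if a client that connected before accept() is the one accepted."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        server.listen(backlog)          # backlog bounds the accept queue
        server.settimeout(2.0)
        port = server.getsockname()[1]

        client.settimeout(2.0)
        # The kernel completes the handshake here. The application has not
        # called accept() yet, so the connection waits in the accept queue.
        client.connect(("127.0.0.1", port))
        client_addr = client.getsockname()

        # accept() removes the already established connection from the queue.
        conn, peer = server.accept()
        conn.close()
        return peer == client_addr
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(connect_before_accept())
```

If the application never calls accept(), or calls it too slowly, connections pile up in that queue until the backlog limit is reached.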

The half-connection queue (SYN queue) and the full connection queue (accept queue) are the focus here.

2. View the full connection queue

When troubleshooting, we need to look at the state of the full connection queue. On the server we can check it with the ss command. The output of ss has different meanings for sockets in the LISTEN state and sockets in non-LISTEN states.

“Data in LISTEN state:”

# -l  Displays listening sockets
# -n  Does not resolve service names
# -t  Displays only TCP sockets

# Recv-Q  Number of connections that have completed the three-way handshake and are waiting for the server to accept()
# Send-Q  Size of the full connection queue
[root@server ~]# ss -lnt | grep 6080
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       100     :::6080             :::*

“Data in non-LISTEN state:”

# Recv-Q  Number of bytes received but not yet read by the application process
# Send-Q  Number of bytes sent but not yet acknowledged

[root@server ~]# ss -nt | grep 6080
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
ESTAB   0       433     :::6080             :::*
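
To watch the LISTEN-state numbers automatically, the output of ss -lnt can be parsed. The following is a minimal sketch, assuming the column layout shown above; the helper names and the second sample row (port 8080) are hypothetical and only illustrate a full queue.

```python
from typing import List, NamedTuple


class ListenQueue(NamedTuple):
    local: str     # Local Address:Port
    waiting: int   # Recv-Q: connections waiting for accept()
    limit: int     # Send-Q: full connection queue size


def parse_ss_listen(output: str) -> List[ListenQueue]:
    """Extract Recv-Q and Send-Q for every LISTEN row of `ss -lnt` output."""
    queues = []
    for line in output.splitlines():
        fields = line.split()
        # The header row starts with "State" and is skipped by this check.
        if len(fields) >= 5 and fields[0] == "LISTEN":
            queues.append(ListenQueue(fields[3], int(fields[1]), int(fields[2])))
    return queues


# Sample text: the first row is from the article, the second is made up.
SAMPLE = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       100     :::6080             :::*
LISTEN  101     100     :::8080             :::*
"""

if __name__ == "__main__":
    for q in parse_ss_listen(SAMPLE):
        status = "FULL" if q.waiting >= q.limit else "ok"
        print(q.local, q.waiting, q.limit, status)
```

A Recv-Q value that stays close to Send-Q on a listening socket suggests the application is not calling accept() fast enough for the incoming load.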

3. The full connection queue overflows

When a large number of requests arrive and the TCP full connection queue is too small, the queue overflows. Once it overflows, subsequent connection requests are discarded and the number of requests the service handles stops increasing.

As mentioned earlier, in the last step of the TCP three-way handshake, a full accept queue is handled according to the tcp_abort_on_overflow policy. On Linux this is configured through /proc/sys/net/ipv4/tcp_abort_on_overflow.

When tcp_abort_on_overflow = 0 and the server's accept queue is full, the server simply discards the ACK sent by the client. At this point the server is still in the SYN_RCVD state while the client is already in the ESTABLISHED state. In this state, a timer on the server retransmits the SYN+ACK to the client (no more than the number of times specified by /proc/sys/net/ipv4/tcp_synack_retries; the Linux default is 5). Once that limit is exceeded, the server stops retransmitting and takes no further action. If the client sends data at this point, the server returns an RST. (This is the cause of the exception we saw.)
When tcp_abort_on_overflow = 1 and the accept queue is full, the server responds to the client's ACK directly with an RST, telling the client that the handshake and the connection are invalid. The client then reports a connection reset by peer error.
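
The two policies can be compared with a toy model in Python. This is a sketch of the behavior described above, not kernel code: the class and method names are hypothetical, retransmission timers are left out, and the queue is treated as full at exactly backlog entries, which is a simplification.

```python
from collections import deque


class ListenSocketModel:
    """Toy model of a listening socket's two queues (not kernel code)."""

    def __init__(self, backlog: int, abort_on_overflow: int = 0):
        self.backlog = backlog
        self.abort_on_overflow = abort_on_overflow
        self.syn_queue = set()        # half-open connections (SYN_RCVD)
        self.accept_queue = deque()   # established, waiting for accept()

    def on_syn(self, client: str) -> str:
        """Step 2 of the handshake: remember the client and answer SYN+ACK."""
        self.syn_queue.add(client)
        return "SYN+ACK"

    def on_ack(self, client: str) -> str:
        """Step 3 of the handshake; assumes on_syn() was called for this client."""
        if len(self.accept_queue) >= self.backlog:
            if self.abort_on_overflow:
                self.syn_queue.discard(client)
                return "RST"          # policy 1: reset the connection
            return "DROP"             # policy 0: ignore the ACK, stay in SYN_RCVD
        self.syn_queue.discard(client)
        self.accept_queue.append(client)
        return "ESTABLISHED"

    def accept(self) -> str:
        """The application takes one established connection off the queue."""
        return self.accept_queue.popleft()


if __name__ == "__main__":
    for policy in (0, 1):
        sock = ListenSocketModel(backlog=2, abort_on_overflow=policy)
        results = []
        for client in ("c1", "c2", "c3"):
            sock.on_syn(client)
            results.append(sock.on_ack(client))
        print(policy, results)
```

With policy 0 the third client is silently dropped and remains half-open on the server side, which matches the situation where the client believes the connection is established and only sees a reset later, when it sends data.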

1). “What indicators show that the full connection queue has overflowed, and how can we query them?”

We can query this with a command, based on the characteristics of the TCP handshake:

[root@server ~]# netstat -s | egrep "listen|LISTEN"
7102 times the listen queue of a socket overflowed   # Number of full connection queue overflows
7102 SYNs to LISTEN sockets ignored                  # Number of half-connection queue overflows

“7102 times” is the number of times the full connection queue has overflowed. Run the command every few seconds; if the value keeps increasing, the full connection queue is overflowing.
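
Checking whether the counters keep increasing can be scripted. On Linux, netstat -s takes these values from /proc/net/netstat, where they appear in the TcpExt section as ListenOverflows and ListenDrops. The following Python sketch parses that format and computes the difference between two samples; the function names and the sample values are hypothetical.

```python
from typing import Dict


def parse_tcpext(text: str) -> Dict[str, int]:
    """Parse the TcpExt name line and value line from /proc/net/netstat content."""
    lines = [line for line in text.splitlines() if line.startswith("TcpExt:")]
    if len(lines) < 2:
        return {}
    names = lines[0].split()[1:]    # first TcpExt line holds the counter names
    values = lines[1].split()[1:]   # second TcpExt line holds the values
    return {name: int(value) for name, value in zip(names, values)}


def overflow_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return how much the listen-queue counters grew between two samples."""
    keys = ("ListenOverflows", "ListenDrops")
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


# Shortened, made-up samples; the real file has many more columns.
BEFORE = "TcpExt: SyncookiesSent ListenOverflows ListenDrops\nTcpExt: 0 7102 7102\n"
AFTER = "TcpExt: SyncookiesSent ListenOverflows ListenDrops\nTcpExt: 0 7110 7115\n"

if __name__ == "__main__":
    delta = overflow_delta(parse_tcpext(BEFORE), parse_tcpext(AFTER))
    print(delta)
```

On a real host, the two samples would come from reading /proc/net/netstat twice, a few seconds apart. A positive ListenOverflows delta means the accept queue overflowed during that interval.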

2). “How do we configure the full connection queue and the half-connection queue?”

The size of the full connection queue is the smaller of backlog and somaxconn, i.e. min(backlog, somaxconn).

somaxconn is a Linux kernel parameter, 128 by default, and can be configured through /proc/sys/net/core/somaxconn.
backlog is the backlog argument of listen(int sockfd, int backlog). Tomcat defaults to 100, Nginx defaults to 511.
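
A worked example of min(backlog, somaxconn) in Python, using the defaults quoted above. The function name and the value 1000 are hypothetical and used only for illustration.

```python
def effective_accept_queue(backlog: int, somaxconn: int) -> int:
    """Full connection queue size: the smaller of the two settings."""
    return min(backlog, somaxconn)


if __name__ == "__main__":
    # Tomcat default backlog with the default somaxconn
    print(effective_accept_queue(100, 128))    # 100
    # Nginx default backlog with the default somaxconn
    print(effective_accept_queue(511, 128))    # 128
    # Raising only the application backlog does not help past somaxconn
    print(effective_accept_queue(1000, 128))   # 128
```

The last case shows that when the application backlog is raised above somaxconn, the kernel parameter has to be raised as well for the change to take effect.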

The length of the half-connection queue can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.

3). “How do we view the state of the half-connection queue?”

Half-connections, i.e. TCP connections in the SYN_RECV state on the server, sit in the half-connection queue, so you can count them with the following command:

# View the half-connection queue
[root@server ~]# netstat -natp | grep SYN_RECV | wc -l
233   # Indicates that there are 233 TCP connections in the half-connected state
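
The same count can be broken down per listening port, which helps when several services share a host. The following is a minimal Python sketch that assumes the usual netstat -natp column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name and the sample addresses are hypothetical.

```python
from collections import Counter


def count_syn_recv(netstat_output: str) -> Counter:
    """Count SYN_RECV connections per local port in `netstat -nat` style output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "SYN_RECV":
            port = fields[3].rsplit(":", 1)[-1]
            counts[port] += 1
    return counts


# Made-up sample rows for illustration.
SAMPLE = """\
tcp  0  0  10.0.0.5:6080  10.0.0.9:51234  SYN_RECV     -
tcp  0  0  10.0.0.5:6080  10.0.0.9:51235  SYN_RECV     -
tcp  0  0  10.0.0.5:6080  10.0.0.7:40000  ESTABLISHED  1234/java
tcp  0  0  10.0.0.5:22    10.0.0.8:50000  SYN_RECV     -
"""

if __name__ == "__main__":
    for port, n in count_syn_recv(SAMPLE).items():
        print(port, n)
```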

Third, summary

With this background, the cause of the incident can be pinned down. With the default Spring Boot Tomcat configuration, the full connection queue size is 100 when the application starts. When traffic surged, the full connection queue filled up and the third packet of the handshake (the client's ACK) was discarded.

So to sum up:

1. Linux maintains both a full connection queue and a half-connection queue for the TCP three-way handshake
2. When the full connection queue is full, the discard policy follows the tcp_abort_on_overflow setting
3. The full connection queue size is the smaller of the value in the Linux system configuration and the value in the application configuration
4. The backlog in Linux is what we call the full connection queue size
5. Check whether the full connection queue is correctly configured when deploying an application

Backlog configuration

Tomcat: the default in AbstractEndpoint is 100. With a standalone Tomcat configured through server.xml, acceptCount is what ends up as the backlog value. With the Tomcat embedded in Spring Boot, remember to set server.tomcat.accept-count; otherwise the default is used.
Nginx: set backlog on the listen directive, e.g. {listen 8080 default_server backlog=512}
Redis: configure the tcp-backlog parameter, e.g. tcp-backlog 511
