Problem description

Our internal container platform uses nginx as the load balancer (LB) in its access layer. Users needed the gRPC protocol, so the LB layer had to support gRPC reverse proxying. Nginx has supported gRPC reverse proxying since the 1.13 series, so we upgraded the company's nginx package from 1.12 to 1.14.0 and added the gRPC reverse proxy configuration. With that configuration in place, a stress test exhausted the ports on the access layer and the service became abnormal. This post traces the fault.

Troubleshooting approach

Understanding the gRPC protocol

  • gRPC is a high-performance, general-purpose open source RPC framework. It was developed by Google, mainly for mobile applications, and is designed on top of the HTTP/2 standard. It uses Protocol Buffers (ProtoBuf) as its serialization protocol and supports many programming languages. gRPC provides an easy way to define services precisely and to generate robust client libraries automatically for iOS, Android, and the back-end services that support them. Clients take full advantage of HTTP/2 streaming and connection features, which helps save bandwidth, reduce the number of TCP connections, and lower CPU and battery usage.

gRPC is based on HTTP/2, so the connection between client and server should be kept open for a long time. In theory, the ports should not run out.

Packet capture

  • We captured packets between the client and the access layer and checked the request state: the client keeps a long connection to the access layer. Capturing packets between the access layer and the back-end server showed that those requests did not keep a long connection. The connection was closed as soon as a request had been processed.

Check nginx's long connection configuration

  • For details about nginx long connections, see the following document: Nginx Long Connections
  • The grpc_socket_keepalive directive, which relates directly to gRPC, was added in version 1.15.6:

    Configures the “TCP keepalive” behavior for outgoing connections to a gRPC server. By default, the operating system’s settings are in effect for the socket. If the directive is set to the value “on”, the SO_KEEPALIVE socket option is turned on for the socket.

After we adjusted the configuration according to the nginx long connection documentation, nginx still used short connections to the back-end server.
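For readers unfamiliar with what such an adjustment looks like, the following is a minimal sketch that renders an nginx fragment combining an upstream keepalive connection cache with grpc_pass. It is not the configuration actually used in this investigation: the function name, the upstream name grpc_backend, the backend addresses, the listen port, and the cache size of 32 are all assumptions chosen for illustration. As described above, a configuration along these lines did not by itself remove the short connections.

```python
def render_grpc_proxy_conf(backends, keepalive=32, listen_port=80,
                           upstream_name="grpc_backend"):
    """Render a minimal nginx fragment for a gRPC reverse proxy.

    All parameter values are illustrative assumptions, not the
    author's real settings.
    """
    if not backends:
        raise ValueError("at least one backend address is required")

    lines = [f"upstream {upstream_name} {{"]
    for addr in backends:
        lines.append(f"    server {addr};")
    # 'keepalive' sets how many idle upstream connections each
    # worker process keeps open for reuse.
    lines.append(f"    keepalive {keepalive};")
    lines.append("}")
    lines.append("")
    lines.append("server {")
    # gRPC clients speak HTTP/2, so the listener enables it.
    lines.append(f"    listen {listen_port} http2;")
    lines.append("    location / {")
    lines.append(f"        grpc_pass grpc://{upstream_name};")
    lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical backend addresses.
    print(render_grpc_proxy_conf(["10.0.0.11:50051", "10.0.0.12:50051"]))
```

Generating the fragment from a function keeps the backend list and the cache size in one place, which can be convenient when the same template is rolled out to several access-layer hosts.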

We then upgraded nginx to 1.15.6. After the upgrade, a small number of long connections handle multiple requests, but most connections are still short.

  • Enable nginx debug mode

The debug log shows that nginx does try to reuse connections. The actual packet capture, however, shows that nginx reuses very few of them: most connections are closed once the request has been processed.

Adjust the port reclamation policy

  • Back to the problem itself: what has to be solved is that the access layer runs out of ports. None of the adjustments to the nginx long connection configuration solved this, so we turned to the reclamation of TCP connection ports instead. Most of the TCP connections were in the TIME_WAIT state.

  • In the TIME_WAIT state the TCP connection has already been closed, but its port cannot yet be used by a new connection. In our case the program establishes a large number of short connections, and the operating system supports at most 65535 ports, so too many TIME_WAIT sockets easily exhaust them. Two system parameters are available for TIME_WAIT optimization: tcp_tw_reuse and tcp_tw_recycle.

  • tcp_tw_reuse: reference link

  • tcp_tw_recycle (Enable Fast Recycling TIME-WAIT Sockets): reference link

  • tcp_tw_reuse did not solve the problem, so we enabled the more aggressive tcp_tw_recycle. That solved it. Because we do not use NAT at the access layer, this setting did not affect the service.
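The reasoning in this section can be checked with a little arithmetic and a connection-state count. The sketch below makes two assumptions that are not taken from the investigation itself: that connection states are read from the text output of ss -tan, and that the host uses the usual Linux defaults of an ephemeral port range of 32768 to 60999 and a TIME_WAIT duration of 60 seconds. The function names and the sample output are hypothetical.

```python
from collections import Counter


def count_tcp_states(ss_output):
    """Count sockets per TCP state from the text output of `ss -tan`.

    The state is the first column; the header line starts with 'State'.
    """
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "State":
            continue
        counts[fields[0]] += 1
    return counts


def max_short_connection_rate(port_low, port_high, time_wait_seconds):
    """Upper bound on new short connections per second from one source IP
    to one destination (IP, port), if every closed connection holds its
    local port for the full TIME_WAIT duration."""
    ports = port_high - port_low + 1
    return ports / time_wait_seconds


if __name__ == "__main__":
    # Hypothetical sample in the shape of `ss -tan` output.
    sample = """State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB      0      0      10.0.0.1:80         10.0.1.5:51000
TIME-WAIT  0      0      10.0.0.1:40001      10.0.2.9:50051
TIME-WAIT  0      0      10.0.0.1:40002      10.0.2.9:50051
"""
    print(count_tcp_states(sample))
    # Assumed Linux defaults: ports 32768-60999, TIME_WAIT of 60 s.
    print(max_short_connection_rate(32768, 60999, 60))
```

Under those assumed defaults the bound is roughly 470 new connections per second toward a single back-end address, which a stress test over short connections can exceed easily. That is consistent with ports filling up while most sockets sit in TIME_WAIT.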

Conclusion

  • The tests show that nginx's reverse proxy support for gRPC is not ideal: most requests from nginx to the back-end server cannot keep a long connection.
  • When nginx is used as the reverse proxy in an access layer, tuning TCP parameters can avoid problems such as port exhaustion.
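Before tuning the TCP parameters discussed in this post, it can help to see which of them exist on a given host and what they are set to. The following is a minimal sketch, assuming a Linux host that exposes these settings as files under /proc/sys/net/ipv4. The function name and the choice of parameters are illustrative. Note that tcp_tw_recycle was removed in Linux 4.12, so on newer kernels its file is absent and the sketch reports None for it.

```python
from pathlib import Path


def read_tcp_sysctls(names=("tcp_tw_reuse", "tcp_tw_recycle",
                            "ip_local_port_range"),
                     base="/proc/sys/net/ipv4"):
    """Return the current value of each named sysctl as a string,
    or None when the parameter does not exist or cannot be read."""
    values = {}
    for name in names:
        path = Path(base) / name
        try:
            values[name] = path.read_text().strip()
        except OSError:
            # Missing file (for example tcp_tw_recycle on newer
            # kernels) or insufficient permissions.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_sysctls().items():
        print(f"{name} = {value}")
```

The base directory is a parameter so the function can be pointed at a copy of the files collected from another machine.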