Jingmai built its gateway in 2014 and evolved it from an HTTP gateway into a TCP gateway. In 2016, the TCP long-connection gateway was rebuilt on Netty 4.x + Protobuf 3.x for high availability, high performance, and high stability, handling the upstream and downstream communication between the PC client and the App. This article introduces the background, architecture, and Netty practices of the Jingmai TCP gateway.

Background

In the early days of Jingmai, HTTP and TCP long connections were used mainly for pushing message notifications, not as an API gateway. As our study of NIO deepened, our understanding of the Netty framework grew, and the requirements on communication stability kept rising, we began using NIO to build a gateway that handles API request invocation. It went live in 2016 and now fully supports the business.

Thanks to many improvements, including the TCP long-connection container, Protobuf serialization, and the generic service invocation framework, its performance is more than ten times that of the HTTP gateway, and its stability is far higher.

Architecture

The long-connection container of the Jingmai TCP gateway is built on Netty and serves as the gateway access layer that handles service API request invocation.

1. Network structure

The client accesses the TCP gateway through a domain name and port. Different carriers' domain names resolve to different VIPs; the VIPs are published on LVS, and LVS forwards requests to the HAProxy instances behind it.

Requests pass through LVS, but responses are returned to the client directly by HAProxy; this is LVS's DR (Direct Routing) mode.

2. TCP gateway long-connection container architecture

The core component of the TCP gateway is Netty, whose NIO model is the Reactor pattern (the Reactor acts as a multiplexer, the Selector, with dispatching responsibilities). Each connection corresponds to a Channel (multiplexing means many connections share one thread or a small number of threads; in Netty that thread is the EventLoop). Each Channel has exactly one ChannelPipeline, multiple Handlers are added to the Pipeline in series, and each Handler is associated with a unique ChannelHandlerContext.

The TCP gateway's long-connection container Handlers sit in the Pipeline. Since TCP belongs to the transport layer of the OSI model, building a Session management mechanism, in effect a session layer, to serve the application layer greatly reduces system complexity.

Therefore, each Channel corresponds to one Connection and each Connection to one Session. Sessions are managed by the Session Manager, and Session and Connection are one-to-one. A Connection holds the ChannelHandlerContext (through which the Channel can be reached), and the Session keeps the Channel active with a heartbeat mechanism.

Each Session request (ChannelRead) invokes the Service layer through a Proxy mechanism. When the data request completes, the result is written to the ChannelHandlerContext and sent out over the Channel. Active downlink data push works the same way: the Session Manager finds the active Sessions and writes to the ChannelHandlerContext held by each Session, implementing broadcast or point-to-point push.
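As a rough illustration of this push path (a sketch only; the PushRegistry name, its methods, and the sessionId-keyed map are assumptions based on the description above, not the gateway's actual code), a session registry can iterate its active Channels for broadcast and look up a single one for point-to-point push:

```java
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: one ChannelHandlerContext per active session.
public final class PushRegistry {

    private final Map<String, ChannelHandlerContext> active = new ConcurrentHashMap<>();

    public void register(String sessionId, ChannelHandlerContext ctx) {
        active.put(sessionId, ctx);
    }

    // Broadcast push: write the message to every active Channel.
    public void broadcast(Object message) {
        for (ChannelHandlerContext ctx : active.values()) {
            Channel channel = ctx.channel();
            if (channel.isActive()) {
                ctx.writeAndFlush(message); // asynchronous write and flush
            }
        }
    }

    // Point-to-point push: write only to the session that matches.
    public void push(String sessionId, Object message) {
        ChannelHandlerContext ctx = active.get(sessionId);
        if (ctx != null && ctx.channel().isActive()) {
            ctx.writeAndFlush(message);
        }
    }
}
```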

Application practice of Netty

The Jingmai TCP gateway uses Netty Channels for data communication and Protobuf for serialization and deserialization. Each request is encapsulated into a binary byte stream. The Channel maintains a long connection for its whole life cycle instead of being recreated on every call, so links are reused.

1. TCP gateway Netty Server I/O model

  1. Create a ServerBootstrap and set the BossGroup and WorkerGroup thread pools.
  2. Bind the specified port and start listening for and accepting client connections. (If the system listens on only one server port, set the number of BossGroup threads to 1.)
  3. Register a childHandler in the ChannelPipeline to handle the request frames of client connections. (A minimal bootstrap sketch follows this list.)
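A minimal bootstrap sketch of these three steps, assuming Netty 4.x; the port, backlog, and the placeholder handler are illustrative choices, and the real gateway adds its frame decoder, Protobuf codecs, and business handlers in initChannel:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public final class TcpGatewayServer {

    public static void main(String[] args) throws InterruptedException {
        // Step 1: one boss thread is enough when only one port is listened on.
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap bootstrap = new ServerBootstrap()
                    .group(bossGroup, workerGroup)
                    .channel(NioServerSocketChannel.class)
                    .option(ChannelOption.SO_BACKLOG, 1024)
                    // Step 3: the childHandler builds the pipeline of every accepted connection.
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            // Placeholder; the real gateway adds frame decoding,
                            // Protobuf codecs and its business handlers here.
                            ch.pipeline().addLast(new ChannelInboundHandlerAdapter());
                        }
                    });

            // Step 2: bind the port, start accepting connections, and block until shutdown.
            bootstrap.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}
```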

2. Thread model of the TCP gateway

The TCP gateway uses Netty's thread pools, divided into three groups: BossGroup, WorkerGroup, and ExecutorGroup. The BossGroup accepts TCP connections from clients; the WorkerGroup handles I/O and executes system and scheduled tasks; the ExecutorGroup handles gateway business work such as encryption and decryption, rate limiting, and routing, and forwards requests to the back-end services.
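To sketch how the third group can be wired in (assuming Netty's DefaultEventExecutorGroup; the thread count and handler placeholders are illustrative, not the gateway's actual classes), business handlers are bound to a separate executor so that encryption/decryption, rate limiting, and routing never block the WorkerGroup's I/O threads:

```java
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class GatewayChannelInitializer extends ChannelInitializer<SocketChannel> {

    // ExecutorGroup: a separate pool for business work (crypto, rate limiting, routing).
    private final EventExecutorGroup executorGroup = new DefaultEventExecutorGroup(32);

    @Override
    protected void initChannel(SocketChannel ch) {
        // Codec-style handlers stay on the WorkerGroup I/O thread.
        ch.pipeline().addLast("codec", new ChannelInboundHandlerAdapter());

        // The business handler is bound to the ExecutorGroup, so slow work
        // never blocks the I/O event loop of the WorkerGroup.
        ch.pipeline().addLast(executorGroup, "business", new ChannelInboundHandlerAdapter());
    }
}
```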

NioEventLoop is a Netty Reactor thread that plays the following roles:

  1. Boss Group: acts as the server's Acceptor thread, accepting client connections and handing them off to threads in the WorkerGroup.
  2. Worker Group: acts as the I/O threads, reading and writing packets on the SocketChannel.
  3. Task Queue/Delay Task Queue: executes scheduled tasks, such as detecting idle links and sending heartbeat messages (see the idle-detection sketch after this list).
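A hedged sketch of the idle-detection role above, using Netty's IdleStateHandler; the 60-second read-idle threshold and the close-on-idle policy are assumptions for illustration:

```java
import io.netty.channel.ChannelDuplexHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

public class IdleAwareInitializer extends ChannelInitializer<SocketChannel> {

    @Override
    protected void initChannel(SocketChannel ch) {
        // Fire an IdleStateEvent if nothing is read from the client for 60 seconds.
        ch.pipeline().addLast(new IdleStateHandler(60, 0, 0));
        ch.pipeline().addLast(new ChannelDuplexHandler() {
            @Override
            public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
                if (evt instanceof IdleStateEvent
                        && ((IdleStateEvent) evt).state() == IdleState.READER_IDLE) {
                    // No heartbeat arrived in time: treat the link as dead and close it.
                    ctx.close();
                } else {
                    super.userEventTriggered(ctx, evt);
                }
            }
        });
    }
}
```

The gateway described in this article implements its own Session-level heartbeat and expiry (covered in the source-code analysis below); IdleStateHandler is simply the standard Netty building block for the same idea.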

3. Sequence diagram of TCP gateway execution

Step 1 to Step 9 show the creation sequence of the Netty server, and Step 10 to Step 13 show the creation sequence of the TCP gateway container.

  • Step 1: Create the ServerBootstrap instance. ServerBootstrap is the bootstrap helper class of the Netty server.
  • Step 2: Set and bind the Reactor thread pool. EventLoopGroup is the Netty Reactor thread pool. EventLoop is responsible for all channels registered to this thread.
  • Step 3: Set and bind the Server Channel. The Netty server needs to create a NioServerSocketChannel object.
  • Step 4: Create the ChannelPipeline when a TCP connection is established. The ChannelPipeline is essentially a responsibility chain that manages and executes ChannelHandlers.
  • Step 5: Add and configure the ChannelHandlers. ChannelHandlers are added to the ChannelPipeline in series.
  • Step 6: Bind the listening port and start the server, registering the NioServerSocketChannel with the Selector.
  • Step 7: Selector polling, with the EventLoop responsible for scheduling and performing the polling.
  • Step 8: Handle network event notifications, poll the ready Channels, and have the EventLoop execute the ChannelPipeline.
  • Step 9: Execute Netty's system ChannelHandlers and the business ChannelHandlers; the ChannelPipeline schedules and executes its ChannelHandlers in turn.
  • Step 10: Invoke the back-end Service through the Proxy. After the ChannelRead event, the request is passed through to the back-end Service.
  • Step 11: Create a Session. The Session and Connection are interdependent.
  • Step 12: Create a Connection that holds the ChannelHandlerContext.
  • Step 13: Add a SessionListener to listen for events such as SessionCreate and SessionDestroy. (A sketch of Steps 11 to 13 follows this list.)
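Steps 11 to 13 can be pictured with the minimal types below; the names and signatures (Connection, Session, SessionListener, sessionCreated, sessionDestroyed) follow this article's description and are not necessarily the gateway's actual classes:

```java
import io.netty.channel.ChannelHandlerContext;

// Step 12: a Connection wraps the Netty context so the Channel can be reached later.
class Connection {
    private final ChannelHandlerContext ctx;

    Connection(ChannelHandlerContext ctx) {
        this.ctx = ctx;
    }

    ChannelHandlerContext getContext() {
        return ctx;
    }
}

// Step 11: a Session and its Connection are created together, one-to-one.
class Session {
    private final String sessionId;
    private final Connection connection;
    private final long createTime = System.currentTimeMillis();
    private volatile long lastAccessTime = createTime;

    Session(String sessionId, Connection connection) {
        this.sessionId = sessionId;
        this.connection = connection;
    }

    String getSessionId() {
        return sessionId;
    }

    Connection getConnection() {
        return connection;
    }

    long getCreateTime() {
        return createTime;
    }

    // Refreshed whenever a heartbeat or request arrives.
    void touch() {
        lastAccessTime = System.currentTimeMillis();
    }

    long getLastAccessTime() {
        return lastAccessTime;
    }
}

// Step 13: listeners observe the session lifecycle.
interface SessionListener {
    void sessionCreated(Session session);

    void sessionDestroyed(Session session);
}
```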

4. TCP gateway source code analysis

1. Session management

A Session is the session link established between the client and the server. The Session information contains the SessionId, the Connection creation time, the last access time, the Connection, and the SessionListeners. Netty's ChannelHandlerContext is held by the Connection. Session information is kept in memory by the SessionManager.

Session creation source code

From the source-code analysis, when a Session is created, any existing Session is destroyed first. Special attention is needed here: a new Session must not be created for a Channel that is merely disconnecting and reconnecting, otherwise a Channel can be destroyed by mistake. For example, if Connection(1) is established on a Channel and then Connection(2) is established, session.close closes the ChannelHandlerContext, and the Channels of both Connection(1) and Connection(2) are closed. Connection(3) is not the same Connection as Connection(1) or Connection(2), but the Channel may be the same.

So how do we tell whether a Channel is a reconnect? The approach is to store the SessionId on the Channel itself. On every event request, check whether the Channel already carries a SessionId; if it does, the Channel is treated as a disconnect-and-reconnect.
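One way to implement this check (a sketch, assuming Netty's channel attribute mechanism and an attribute named "sessionId"; the helper class and its methods are hypothetical):

```java
import io.netty.channel.Channel;
import io.netty.util.AttributeKey;
import java.util.UUID;

public final class SessionIdAttribute {

    // The SessionId is stored directly on the Channel as an attribute.
    private static final AttributeKey<String> SESSION_ID = AttributeKey.valueOf("sessionId");

    private SessionIdAttribute() {
    }

    // True if the Channel already carries a SessionId, i.e. it is a
    // disconnect-and-reconnect rather than a brand-new connection.
    public static boolean isReconnect(Channel channel) {
        return channel.attr(SESSION_ID).get() != null;
    }

    // Called when a Session is created for a brand-new Channel.
    public static String bindNewSession(Channel channel) {
        String sessionId = UUID.randomUUID().toString();
        channel.attr(SESSION_ID).set(sessionId);
        return sessionId;
    }
}
```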

2. Heartbeat

The heartbeat checks whether a connected client is still alive. The client sends a heartbeat packet to the server at fixed intervals, and on receiving it the server updates the Session's last access time. On the server side, long-connection Session detection polls the Session set and checks whether the last access time has expired; if it has, the Session and Connection are closed, the Session is removed from memory, and the Channel is deregistered.

After each Session is created successfully, a TcpHeartbeatListener is added to it to handle heartbeat detection. TcpHeartbeatListener is a daemon thread that implements the SessionListener interface: it sleeps for a fixed period, then polls the Session set to check for expired Sessions, and closes any Session that has expired.
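A sketch of such a daemon checker (the 30-second sleep, the 5-minute expiry, and the class and method names are illustrative assumptions; the real TcpHeartbeatListener also closes the expired Session and deregisters its Channel):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical checker modeled on the TcpHeartbeatListener described above.
public final class HeartbeatChecker implements Runnable {

    private static final long CHECK_INTERVAL_MS = 30_000;    // sleep between polls
    private static final long EXPIRE_AFTER_MS = 5 * 60_000L; // max silence before expiry

    // sessionId -> last access time in epoch millis, refreshed on every heartbeat.
    private final Map<String, Long> lastAccess = new ConcurrentHashMap<>();

    public void touch(String sessionId) {
        lastAccess.put(sessionId, System.currentTimeMillis());
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(CHECK_INTERVAL_MS); // periodic hibernation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long now = System.currentTimeMillis();
            // Poll the session set and expire anything that has been silent too long.
            // The real gateway would also close the Session and deregister its Channel here.
            lastAccess.entrySet().removeIf(e -> now - e.getValue() > EXPIRE_AFTER_MS);
        }
    }

    public static void start(HeartbeatChecker checker) {
        Thread thread = new Thread(checker, "tcp-heartbeat-checker");
        thread.setDaemon(true); // daemon thread, as described above
        thread.start();
    }
}
```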

Note the Session.connect method: it sets the Session's access time and iterates over all Listeners, firing their sessionCreated events. The TcpHeartbeatListener is also invoked during this process.

3. Uplink data

Uplink data means data sent from the client to the server, obtained in the ChannelHandler's channelRead method. It includes Session creation, heartbeats, and data requests. Note that the data arriving in channelRead covers both the client's own requests to the server and the replies the client returns for the server's downlink notifications, so when processing the data, an identifier is used to distinguish request-reply data from notification-reply data.
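As a sketch of this branching (the Frame fields, the Kind marker, and the handler name are hypothetical, not the gateway's actual protocol), channelRead can inspect an identifier on the decoded frame and route it accordingly:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class UplinkHandler extends ChannelInboundHandlerAdapter {

    // Hypothetical decoded frame: a kind marker plus a sequence number and payload.
    enum Kind { HEARTBEAT, REQUEST, NOTIFY_REPLY }

    static final class Frame {
        Kind kind;
        long seq;
        byte[] payload;
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        Frame frame = (Frame) msg;
        switch (frame.kind) {
            case HEARTBEAT:
                // The real gateway would refresh the Session's last access time here.
                break;
            case REQUEST:
                // Client-initiated request: the real gateway would invoke the back-end
                // service via its Proxy and write the response with ctx.writeAndFlush(...).
                break;
            case NOTIFY_REPLY:
                // Reply to a server-initiated downlink notification: the real gateway
                // would complete the pending future matched by frame.seq.
                break;
        }
    }
}
```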

4. Downlink data

Downlink data is distributed to all servers through an MQ broadcast mechanism. After receiving the message, each server looks at the Sessions it currently holds and performs the downlink notification. For point-to-point push, the data is still broadcast to all servers first; each server then checks whether the target of the push is a Session it holds. If the target in the message belongs to the current server, the data is pushed; otherwise the message is discarded.

From the source-code analysis, NotifyProxy is used to send downlink data. Note that Netty is asynchronous (NIO), so if a downlink notification needs a return value, the asynchronous notification has to be turned into a synchronous call. NotifyFuture therefore implements the java.util.concurrent.Future interface: a timeout is set, and when the uplink reply arrives in channelRead, the matching NotifyFuture is located by its seq and completed.
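A minimal sketch of turning the asynchronous notification into a synchronous call (class, method, and field names are assumptions based on the NotifyFuture/seq description; CompletableFuture stands in for the article's Future implementation):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical registry of pending downlink notifications, keyed by sequence number.
public final class NotifyFutures {

    private static final Map<Long, CompletableFuture<byte[]>> PENDING = new ConcurrentHashMap<>();

    private NotifyFutures() {
    }

    // Called by the downlink sender before writing the notification to the Channel.
    public static CompletableFuture<byte[]> register(long seq) {
        CompletableFuture<byte[]> future = new CompletableFuture<>();
        PENDING.put(seq, future);
        return future;
    }

    // Called from channelRead when the client's reply for this seq arrives.
    public static void complete(long seq, byte[] reply) {
        CompletableFuture<byte[]> future = PENDING.remove(seq);
        if (future != null) {
            future.complete(reply);
        }
    }

    // Blocks the calling business thread (never the EventLoop) until the reply
    // arrives or the timeout expires.
    public static byte[] await(CompletableFuture<byte[]> future, long seq, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (java.util.concurrent.ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            PENDING.remove(seq);
        }
    }
}
```

The sender registers a seq, writes the notification to the Channel, then calls await from a business thread; when the matching uplink reply arrives in channelRead, complete(seq, reply) releases the waiter.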

Downlink data is sent through TcpConnector's send method, which writes to the Channel via the writeAndFlush method of the ChannelHandlerContext. Note that an earlier version of this code called ChannelFuture.await() and blocked to determine whether the write had succeeded; that style occasionally throws a BlockingOperationException.

Blocking write that waits for the return value

I asked about the BlockingOperationException on Stack Overflow and was lucky enough to get an answer from Norman Maurer (a core contributor to Netty).

When write is executed, Netty checks whether the current thread is the EventLoop assigned to the Channel. If it is, the current thread performs the I/O operation directly; otherwise the task is submitted to the EventLoop's executor. When await is executed, checkDeadLock checks whether the calling thread is the same as the EventLoop thread that has to complete the future; if it is, that would be a deadlock, so a BlockingOperationException is thrown.
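The corresponding non-blocking pattern, sketched below under the assumption of a simple echo-style handler: instead of calling await() on the write future from the EventLoop thread (which is exactly what triggers the BlockingOperationException), register a listener and let the callback handle the result:

```java
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class NonBlockingWriteHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        // Register a callback instead of calling await() on the write future from
        // the EventLoop thread, which is what triggers BlockingOperationException.
        ctx.writeAndFlush(msg).addListener((ChannelFutureListener) future -> {
            if (!future.isSuccess()) {
                // The write failed: log, release resources, and close the link.
                future.channel().close();
            }
        });
    }
}
```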

Conclusion

This article briefly introduced how the Jingmai TCP gateway uses Netty to build its long-connection container framework, walked through the key points in building a TCP long-connection container, and gave a brief source-code analysis. Netty has many other practical applications in Jingmai's development, such as APNs message push based on Netty4.11+HTTP2, and so on.

About the author

Zhang Songran is an architect in the R&D department of JD.com (Jingdong Mall). He has rich experience in developing and architecting large-scale distributed systems with high performance and high availability. He joined JD in 2013 and is currently responsible for the development of the Jingmai service gateway.

Thanks to Yuta Tian guang for correcting this article.