Preface

Careful readers may have noticed that I put double quotation marks around the word “NIO” in the title. Back in primary school, our Chinese teacher often explained what quotation marks are for, so what do mine mean here? A lot of material describes Netty as a high-performance NIO framework (which is essentially true). Looking at Netty’s source code, I found that it also contains an OIO (old blocking I/O) package, although its classes are deprecated in recent versions. To stay consistent with the common understanding of Netty, I will still call it an NIO framework; the quotation marks simply express my view that there is nothing wrong with calling it one.

Today’s focus is on what makes Netty high-performance.

Netty and its high performance

Well, that’s enough preamble. Now let’s formally introduce today’s hero — Netty.

What is Netty

Netty is a high-performance, asynchronous, event-driven NIO framework built on the APIs provided by Java NIO. It supports TCP, UDP, and file transfer. As an asynchronous NIO framework, all of Netty’s I/O operations are asynchronous and non-blocking: through the Future-Listener mechanism, users can either actively retrieve I/O operation results or be notified when an operation completes.

Why Netty

If you use the JDK’s NIO directly, you face the following problems:

  1. The API is complex.
  2. It demands solid multithreading skills, since NIO programming involves the Reactor pattern.
  3. For reliable operation, you must solve problems such as disconnection and reconnection, half-packet reads and writes, and cache failures yourself.
  4. The JDK NIO has known bugs (for example, the epoll bug that makes a Selector spin at 100% CPU).
  5. You have to handle TCP packet splitting and sticking (half packets and sticky packets) yourself.
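To make the last point concrete, here is a deliberately simplified length-prefixed frame decoder showing how sticky and half packets are reassembled. Netty ships this logic ready-made (for example in LengthFieldBasedFrameDecoder); the class and method names below are illustrative, not Netty’s.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: frames are a 4-byte big-endian length followed by the body.
// Bytes arrive in arbitrary chunks (TCP is a byte stream), so we accumulate
// them and only emit a frame once the full body is available.
class FrameDecoder {
    private final ByteBuffer accumulated = ByteBuffer.allocate(64 * 1024);

    // Feed raw bytes as they arrive from the socket; returns complete frames.
    List<byte[]> decode(byte[] chunk) {
        accumulated.put(chunk);
        accumulated.flip(); // switch to read mode
        List<byte[]> frames = new ArrayList<>();
        while (accumulated.remaining() >= 4) {
            accumulated.mark();
            int len = accumulated.getInt();
            if (accumulated.remaining() < len) {
                accumulated.reset(); // half packet: wait for more bytes
                break;
            }
            byte[] body = new byte[len];
            accumulated.get(body); // sticky packets fall out of this loop naturally
            frames.add(body);
        }
        accumulated.compact(); // keep leftover bytes for the next read
        return frames;
    }
}
```

Feeding the decoder a frame split across two chunks yields nothing on the first call and the completed frame (plus any stuck second frame) on the next.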

Netty, by contrast, offers a simple API, high performance, and an active community (Dubbo, RocketMQ, and many other projects use it).

Netty’s high performance

  • In I/O programming, when multiple client requests must be handled at the same time, you can use either multithreading or I/O multiplexing.
  • I/O multiplexing lets a single thread handle multiple client requests by registering multiple I/O channels with the same select (multiplexer) call.
  • Compared with the traditional multithreading/multi-process model, the biggest advantage of I/O multiplexing is its low system cost: the system does not need to create extra processes or threads, nor maintain them while they run, which reduces maintenance work and saves system resources.
  • The JDK’s NIO also provides two socket channel implementations, SocketChannel and ServerSocketChannel, corresponding to the classic Socket and ServerSocket classes.
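The multiplexing described above can be sketched with the JDK’s own NIO classes: one Selector watches many channels from a single thread. This is a minimal setup sketch (port 0 asks the OS for any free port); a real server would then loop on select().

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

// Registers a non-blocking server channel with a Selector, the core of
// single-threaded I/O multiplexing.
class MultiplexDemo {
    static int registeredKeys() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(0));
            server.configureBlocking(false);               // required before register()
            server.register(selector, SelectionKey.OP_ACCEPT);
            // A real event loop would call selector.select() here and iterate
            // selector.selectedKeys(), dispatching each ready channel.
            return selector.keys().size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Every additional client connection would be one more key on the same Selector, all serviced by the one thread.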

Multiplexed communication mode

Netty’s architecture is designed and implemented according to the Reactor pattern, which drives its server-side communication flow.
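As a sketch of that flow, here is a typical minimal Netty 4.x server bootstrap (this assumes Netty 4.x on the classpath; the class name and echo handler are illustrative). The boss group plays the Reactor acceptor role, and the worker group handles I/O on accepted connections.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

// Starts an echo server on any free port, verifies it is live, then stops it.
class EchoServer {
    static boolean startAndStop() {
        EventLoopGroup boss = new NioEventLoopGroup(1);   // acceptor (main Reactor)
        EventLoopGroup worker = new NioEventLoopGroup();  // I/O threads (sub Reactor)
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, worker)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override
                         public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg); // echo the bytes straight back
                         }
                     });
                 }
             });
            Channel ch = b.bind(0).syncUninterruptibly().channel(); // port 0 = any free port
            boolean active = ch.isActive();
            ch.close().syncUninterruptibly();
            return active;
        } finally {
            boss.shutdownGracefully().syncUninterruptibly();
            worker.shutdownGracefully().syncUninterruptibly();
        }
    }
}
```

The boss/worker split in the bootstrap maps directly onto the master-slave Reactor model discussed below.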

Asynchronous communication with NIO

Because Netty uses an asynchronous communication model, one I/O thread can concurrently handle N client connections and their read/write operations. This fundamentally avoids the problems of the traditional one-connection-one-thread synchronous blocking I/O model and greatly improves the architecture’s performance, elastic scalability, and reliability.

Zero copy (direct buffers use off-heap memory)

When sending data, the traditional implementation is:

File.read(bytes)
Socket.send(bytes)

This method requires four data copies and four context switches:

  1. Data is read from disk to the kernel’s Read buffer
  2. Data is copied from the kernel buffer to the user buffer
  3. Data is copied from the user buffer to the kernel’s socket buffer
  4. Data is copied from the kernel’s socket buffer to the NIC’s (hardware) buffer

Netty uses off-heap direct memory to avoid these extra copies:

  1. Netty receives and sends data using direct buffers, which use off-heap direct memory for socket reads and writes, so no second copy of the byte buffer is needed. If traditional heap buffers were used, the JVM would copy the heap buffer into direct memory before writing it to the socket, adding one extra memory copy per message compared with using off-heap direct memory.
  2. Netty provides a composite buffer object (CompositeByteBuf) that can aggregate multiple ByteBuffer objects. Users can operate on the composite buffer as conveniently as on a single buffer, avoiding the traditional approach of copying several small buffers into one large buffer.
  3. Netty uses the transferTo method for file transfer, which sends the data in the file buffer directly to the target Channel, avoiding the memory copies caused by the traditional write-loop approach.
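The transferTo mechanism in the last point is the JDK’s FileChannel.transferTo, which hands file data to the target channel without copying it through a user-space buffer (on Linux this maps to sendfile when the target is a socket). A small self-contained sketch, transferring into an in-memory channel for demonstration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Writes data to a temp file, then moves it to the destination channel
// via transferTo instead of a manual read/write copy loop.
class ZeroCopyDemo {
    static byte[] transfer(byte[] data) {
        try {
            Path tmp = Files.createTempFile("zerocopy", ".bin");
            Files.write(tmp, data);
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (FileChannel src = FileChannel.open(tmp, StandardOpenOption.READ);
                 WritableByteChannel dst = Channels.newChannel(sink)) {
                long pos = 0, size = src.size();
                while (pos < size) { // transferTo may move fewer bytes than asked
                    pos += src.transferTo(pos, size - pos, dst);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
            return sink.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note the loop: transferTo is allowed to transfer fewer bytes than requested, so callers must retry from the updated position.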

Memory pools (memory pool-based buffer reuse mechanism)

With the evolution of the JVM and just-in-time compilation, object allocation and collection have become very lightweight tasks. For buffers, however, the situation is different: allocating and reclaiming direct off-heap memory in particular can be time-consuming. To maximize buffer reuse, Netty provides a buffer reuse mechanism based on memory pools.
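The pooling idea can be illustrated with a deliberately simplified free-list pool over JDK direct buffers. Netty’s real PooledByteBufAllocator is far more sophisticated (arenas, size classes, thread-local caches); this sketch only shows why reuse beats reallocation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Toy pool: expensive direct buffers are recycled instead of re-created.
class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;
    int allocations = 0; // counts how many buffers were actually created

    DirectBufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf == null) {              // pool empty: pay the allocation cost once
            allocations++;
            buf = ByteBuffer.allocateDirect(bufferSize);
        }
        buf.clear();                    // reset position/limit for the new user
        return buf;
    }

    void release(ByteBuffer buf) {      // return the buffer for reuse
        free.push(buf);
    }
}
```

After a release, the next acquire returns the same buffer object with no new allocation, which is exactly the cost the memory pool avoids on the hot I/O path.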

An efficient Reactor thread model

There are three commonly used Reactor thread models: the single-threaded Reactor model, the multithreaded Reactor model, and the master-slave multithreaded Reactor model.

Reactor single-threaded model

In the single-threaded Reactor model, all I/O operations are performed on the same NIO thread. That thread has the following responsibilities:

  1. As the NIO server, accept TCP connections from clients.
  2. As the NIO client, initiate TCP connections to the server.
  3. Read request or response messages from the communication peer.
  4. Send request or response messages to the peer.

    Because the Reactor pattern uses asynchronous non-blocking I/O, no I/O operation blocks, so in theory one thread can handle all I/O operations independently. From an architectural point of view, a single NIO thread can indeed do the job: for example, the Acceptor receives a TCP connection request from a client; after the link is established, the corresponding ByteBuffer is dispatched to a designated Handler to decode the message; the user’s Handler can then send messages back to the client through the same NIO thread.

Reactor multithreaded model

The main difference between the Reactor multithreaded model and the single-threaded model is that a pool of NIO threads handles I/O operations. A dedicated NIO thread, the Acceptor thread, listens on the server port and accepts TCP connection requests from clients. Network I/O operations — read, write, and so on — are handled by a NIO thread pool, which can be a standard JDK thread pool consisting of a task queue and N available threads responsible for reading, decoding, encoding, and sending messages.

Master-slave Reactor multithreaded model

Instead of a single NIO thread accepting client connections, the server uses a dedicated NIO thread pool (the main Reactor). The Acceptor thread pool is used only for client login, handshake, and security authentication. Once the Acceptor has accepted a TCP connection and the link is established, it registers the SocketChannel with an I/O thread in the sub-Reactor thread pool, which is then responsible for all subsequent reading, writing, encoding, and decoding on that SocketChannel.

Lock-free serial design, thread binding

Netty adopts a serial, lock-free design: operations are performed serially within each I/O thread, avoiding the performance loss caused by lock contention between threads. On the surface, this serialization seems CPU-inefficient and insufficiently concurrent; however, by adjusting the NIO thread pool’s parameters, multiple serialized threads can run in parallel at the same time. This locally lock-free, serialized thread design performs better than the one-queue-multiple-workers model.
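The serial idea can be demonstrated with the JDK alone: if every task for a given connection runs on the same single thread (as on a Netty EventLoop), shared per-connection state needs no locks at all. This sketch increments a plain, unsynchronized counter from one single-threaded executor; the names are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// All tasks for "one connection" run on one thread, so the unsynchronized
// counter is still updated correctly - no locks, no contention.
class SerialExecutionDemo {
    static int runSerially(int tasks) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        int[] counter = {0};                  // plain, unsynchronized state
        for (int i = 0; i < tasks; i++) {
            loop.execute(() -> counter[0]++); // safe: only one thread ever touches it
        }
        loop.shutdown();
        try {
            loop.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return counter[0];
    }
}
```

Run the same increments from multiple threads without synchronization and updates would be lost; pinning each connection’s work to one thread is what makes the lock-free design sound.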

High-performance serialization framework

Netty supports Google Protobuf out of the box. By extending Netty’s codec interfaces, users can plug in other high-performance serialization frameworks, such as Thrift’s compact binary codec.

Flexible TCP parameter configuration

  1. SO_RCVBUF and SO_SNDBUF: 128 KB or 256 KB is usually recommended.

Coalescing small packets into large ones to prevent network congestion

  1. TCP_NODELAY: Nagle’s algorithm automatically coalesces small packets in the buffer into larger ones, preventing a flood of small packets from congesting the network and thereby improving efficiency. However, in latency-sensitive applications this optimization should be disabled (by enabling TCP_NODELAY).
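These options can be set on a plain JDK socket as shown below; with Netty the equivalents are bootstrap options such as ChannelOption.TCP_NODELAY and ChannelOption.SO_RCVBUF / SO_SNDBUF. Note that the buffer sizes are hints the OS may adjust, so only TCP_NODELAY is asserted here.

```java
import java.io.IOException;
import java.net.Socket;

// Applies the tuning parameters discussed above to an (unconnected) socket.
class TcpTuningDemo {
    static boolean tcpNoDelayEnabled() {
        try (Socket s = new Socket()) {              // not yet connected
            s.setReceiveBufferSize(128 * 1024);      // SO_RCVBUF hint
            s.setSendBufferSize(128 * 1024);         // SO_SNDBUF hint
            s.setTcpNoDelay(true);                   // disable Nagle for low latency
            return s.getTcpNoDelay();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```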

Binding soft-interrupt hash values to CPUs

  1. Soft interrupts: enabling RPS (Receive Packet Steering) improves network throughput. RPS computes a hash from a packet’s source address, destination address, and source and destination ports, then uses that hash to select the CPU on which the soft interrupt runs. This balances soft interrupts across multiple CPUs and improves parallel processing of network traffic.
