Author | Guang-ming Xu

Almond backend engineer. A young programmer focused on server-side technology.


With the continuous improvement of computer hardware, the number of CPU cores in servers keeps growing. To make full use of multi-core CPUs and improve a system's processing efficiency and concurrency, multi-threaded concurrent programming is becoming more and more important. Whether written in C++ or Java, most network frameworks are designed and developed around the Reactor model. The Reactor model is event-driven and is especially well suited to handling massive numbers of I/O events. In this article we take a brief look at the Reactor thread model, covering the following parts:

  • The classic I/O communication models;

  • An introduction to the Reactor thread model;

  • The several threading models of Reactor;

  • The Reactor thread model in practice in Netty.

I/O communication model

Let’s start with I/O communication. When people talk about I/O, the terms synchronous I/O, asynchronous I/O, blocking I/O, and non-blocking I/O come up, and the differences between them are often unclear; the definitions vary from person to person and consensus is hard to reach. The context for this article is network I/O in a Linux environment.

Analyzing an I/O operation

A network I/O operation (a read, for example) involves two system objects: the process or thread that issues the I/O call, and the kernel. When a read operation occurs, it goes through two phases (keep these two phases in mind, because the differences between the I/O models lie in how each phase is handled):

  • Phase 1: Waiting for the data to be ready;

  • Phase 2: Copying the data from the kernel to the process;

Five I/O models

UNIX Network Programming, Volume 1 by Richard Stevens describes five I/O models:

  1. Blocking I/O (synchronous blocking I/O)

  2. Nonblocking I/O (synchronous non-blocking I/O)

  3. I/O Multiplexing (select/epoll)

  4. Signal-Driven I/O (rarely used and not supported in Java)

  5. Asynchronous I/O

Let’s illustrate and compare the five I/O models.

Blocking I/O

In Linux, all sockets are blocking by default. For a typical read operation, the flow is shown below:

When a user process invokes the recvfrom system call, the I/O call goes through two phases:

  1. Preparing the data: for a network read, the data often has not arrived yet (for example, a complete UDP packet has not been received), so the kernel must wait for enough data to arrive. During this phase the user process is blocked the whole time.

  2. Returning the data: once the data is ready, the kernel copies it from kernel space into user memory and then returns the result; only then does the user process unblock and resume running.
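To make the two phases concrete, here is a minimal Java sketch of the blocking model (the class name and port are illustrative, not from the original article): both accept() and read() keep the calling thread blocked until the kernel has the data ready and has copied it into user space.

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingReadSketch {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080)) {
            // accept() blocks until a client connects
            try (Socket socket = server.accept();
                 InputStream in = socket.getInputStream()) {
                byte[] buf = new byte[1024];
                // read() blocks through both phases: waiting for the data
                // to be ready, then copying it from the kernel to user space
                int n = in.read(buf);
                System.out.println("read " + n + " bytes");
            }
        }
    }
}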

Nonblocking I/O

On Linux, a socket can be made non-blocking by setting an option on it. When a read operation is performed on a non-blocking socket, the flow is as follows:

When a user process issues a read operation, it goes through the following three steps:

  1. Start preparing the data: if the data in the kernel is not ready yet, the call does not block the user process but immediately returns an error.

  2. While the data is being prepared: from the user process’s point of view, a read does not wait; it gets a result immediately. When the process sees that the result is an error, it knows the data is not ready yet, so it simply issues the read again (repeated polling).

  3. Once the kernel has the data ready and receives another system call from the user process, it copies the data into the user’s memory and returns.
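As a rough Java sketch of the same idea (using java.nio; the host, port, and buffer size are illustrative, and the peer is assumed to push data after the connection is established), a SocketChannel switched into non-blocking mode returns from read() immediately with 0 when no data is ready, so the caller has to poll:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class NonBlockingReadSketch {
    public static void main(String[] args) throws Exception {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);              // switch the socket to non-blocking mode
        channel.connect(new InetSocketAddress("some.host.example", 8080));
        while (!channel.finishConnect()) {
            // connect() also returns immediately; do other work while connecting
        }
        ByteBuffer buf = ByteBuffer.allocate(1024);
        int n;
        while ((n = channel.read(buf)) == 0) {
            // kernel data not ready: read() returns 0 instead of blocking,
            // so the user process keeps polling (and could do other work here)
        }
        System.out.println(n < 0 ? "connection closed" : "read " + n + " bytes");
        channel.close();
    }
}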

I/O multiplexing

This type of I/O is also called event-driven I/O. The benefit of select/epoll on Linux is that a single process can handle the I/O of many network connections at the same time. The basic principle is that select/epoll polls all the sockets it is responsible for and notifies the user process when data arrives on any of them. The flow is as follows:

When the user process calls select:

  1. The entire process is blocked in select, while the kernel “monitors” all the sockets select is watching; select returns as soon as the data in any of them is ready.

  2. The user process then calls a read operation (recvfrom) to copy the data from the kernel into the user process. This is not all that different from the blocking I/O diagram; in fact it is slightly worse, because two system calls (select and recvfrom) are needed, whereas blocking I/O involves only one (recvfrom).

  3. In the I/O multiplexing model, the process is blocked by the select call rather than by socket I/O itself.
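A minimal Java sketch of this model with a Selector (the port and buffer size are illustrative): the process blocks in select(), and only after select() reports a ready socket does a second call copy the data.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectSketch {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress(8080));
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();                     // blocks here, not on any single socket
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {          // a new connection is ready
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {     // data is ready on some socket
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    int n = client.read(buf);      // this call copies the ready data
                    if (n < 0) {
                        client.close();
                    }
                }
            }
        }
    }
}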

Asynchronous I/O

Asynchronous I/O (AIO) is rarely used under Linux and requires relatively advanced kernel support. Its flow is as follows:

When a user process issues a read operation:

  1. From the user process’s perspective, when it initiates the read operation it does not wait at all; it gets a result immediately and can go on doing other things.

  2. From the kernel’s perspective, when it receives an asynchronous read it returns immediately, so the user process is never blocked. The kernel then waits for the data to become ready and copies it into the user’s memory; when that is done, the kernel sends a signal to the user process telling it that the read has completed.
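A rough Java sketch using the JDK's AsynchronousSocketChannel (host, port, and the sleep at the end are illustrative; note that on Linux the JDK typically builds this on top of epoll and an internal thread pool rather than native kernel AIO): read() returns immediately, and the completion callback fires only after the data has already been copied into the buffer.

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

public class AsyncReadSketch {
    public static void main(String[] args) throws Exception {
        AsynchronousSocketChannel channel = AsynchronousSocketChannel.open();
        channel.connect(new InetSocketAddress("some.host.example", 8080)).get();

        ByteBuffer buf = ByteBuffer.allocate(1024);
        // read() returns at once; the callback runs after the data is already in buf
        channel.read(buf, null, new CompletionHandler<Integer, Void>() {
            @Override
            public void completed(Integer bytesRead, Void attachment) {
                System.out.println("read completed: " + bytesRead + " bytes");
            }

            @Override
            public void failed(Throwable cause, Void attachment) {
                cause.printStackTrace();
            }
        });

        Thread.sleep(5000);   // keep the demo process alive while the callback runs
    }
}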


From the descriptions of the four I/O models above, their characteristics can be summarized as follows:

  • Blocking I/O blocks in both phases of the I/O operation.

  • Non-blocking I/O does not block when the kernel data is not yet ready.

  • The advantage of I/O multiplexing is that a single select call can handle many connections at the same time. (A web server using select/epoll does not necessarily perform better than one using multi-threading plus blocking I/O when the number of connections is small, and the latency may even be higher. The advantage of select/epoll is not that it handles an individual connection faster, but that it can handle far more connections.)

  • Asynchronous I/O never blocks the caller at any point during the call, but it requires advanced operating-system support.

Communication models in everyday life

The five I/O models above are a bit dry, but there are similar “communication models” in daily life. To make them easier to understand, let’s use the (admittedly imperfect) everyday example of inviting girls to dinner over WeChat to illustrate the I/O models (suppose I want to invite some girls to dinner via WeChat):

  • I message the first girl on WeChat to ask whether she is ready; she does not reply, and I keep waiting for her answer before messaging the second one: blocking I/O.

  • I message the first girl to ask whether she is ready; if she does not reply I do not wait but message the second one, and later I come back and ask the one who did not reply again: non-blocking I/O.

  • I pull all the girls into a WeChat group and ask once in the group, waiting to see who replies that she is ready: I/O multiplexing.

  • I simply tell each girl the address and time of the dinner and go there myself: asynchronous I/O.

Reactor thread model

What is a Reactor?

Reactor is a processing model, and a common one for concurrent I/O handling; it is used for synchronous I/O. The core idea is to register all I/O events that need handling with a central I/O multiplexer, while the main thread/process blocks on that multiplexer. As soon as an I/O event arrives or becomes ready (a file descriptor or socket becomes readable or writable), the multiplexer returns and dispatches the pre-registered I/O event to the corresponding handler.

Reactor is also an implementation mechanism. It uses an event-driven approach, which differs from an ordinary function call in that the application does not actively call an API; instead, the Reactor inverts the flow of event handling. The application provides an interface and registers it with the Reactor; when an event occurs, the Reactor actively calls the interface the application registered. These interfaces are also known as “callback functions”. The Hollywood principle fits the Reactor perfectly: don’t call us, we’ll call you.
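As a tiny illustration of this inversion of control, the interfaces below are hypothetical (not taken from any particular framework): the application implements EventHandler and registers it, and it is the Reactor that later calls handleEvent back.

import java.nio.channels.SelectionKey;

// Hypothetical interfaces, only to show the shape of the contract.
interface EventHandler {
    void handleEvent(SelectionKey key);                      // the "callback function"
}

interface Reactor {
    void register(SelectionKey key, EventHandler handler);   // "don't call us, we'll call you"
}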

Why Reactor?

Generally speaking, with I/O multiplexing such as epoll, a server can support hundreds of thousands of concurrent connections while sustaining a very high TPS. So why do we need the Reactor pattern at all? The reason is the complexity of programming directly against raw I/O multiplexing.

Each network request may involve multiple I/O operations. Compared with the traditional approach of handling one request from start to finish on a single thread, I/O multiplexing does not match the way people naturally think. Suppose request A must go through several I/O operations a1 through an (with possibly long gaps between them). After each of A’s I/O operations, the next call to the multiplexer may well return events for request B instead of A: request A is repeatedly interrupted by request B, and B in turn by C. Programming in this mindset is error-prone.

Reactor threading models

Reactor has three threading models; users can choose the one appropriate for their scenario:

  1. The single-threaded model

  2. The multi-threaded model (single Reactor)

  3. The multi-threaded model (multiple Reactors)

Single-threaded model

The single-threaded model is the simplest Reactor model. The Reactor thread is a generalist: it multiplexes the sockets, accepts new connections, and dispatches requests to the handler chain. This model is suitable for scenarios where the business processing in the handler chain completes quickly. However, because it cannot take advantage of multi-core resources, it is rarely used in practice.
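A minimal sketch of this model in Netty terms (assuming Netty 4 APIs; the class name and port are illustrative): a single-thread EventLoopGroup acts as both the acceptor and the I/O Reactor, so any slow business handler stalls the whole server.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class SingleReactorServerSketch {
    public static void main(String[] args) throws Exception {
        // One EventLoopGroup with one thread: the same Reactor thread accepts
        // connections and handles all read/write events.
        EventLoopGroup singleGroup = new NioEventLoopGroup(1);
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(singleGroup, singleGroup)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     // business handlers go here; they must finish quickly,
                     // otherwise they block the only Reactor thread
                 }
             });
            b.bind(8080).sync().channel().closeFuture().sync();
        } finally {
            singleGroup.shutdownGracefully();
        }
    }
}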

Multithreaded model (single Reactor)

This model uses multiple threads (a thread pool) in the handler chain; it is also a common model for back-end programs.
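One common way to realize this in Netty (a sketch assuming Netty 4 APIs; the pool size and class name are illustrative, and TimeServerHandler is simply the handler shown later in this article) is to bind handlers to a separate EventExecutorGroup, so the handler chain runs on a thread pool instead of on the Reactor's I/O thread:

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class BusinessThreadPoolInitializer extends ChannelInitializer<SocketChannel> {

    // A separate pool for (potentially slow) business logic, so the
    // Reactor (I/O) thread is never blocked by the handler chain.
    private static final EventExecutorGroup BUSINESS_GROUP = new DefaultEventExecutorGroup(16);

    @Override
    protected void initChannel(SocketChannel ch) {
        // Handlers registered with BUSINESS_GROUP run on the pool,
        // not on the I/O thread that read the data.
        ch.pipeline().addLast(BUSINESS_GROUP, new TimeServerHandler());
    }
}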

Multithreaded model (multiple Reactors)

Compared with the multi-threaded single-Reactor model, the Reactor is split into two parts. The mainReactor listens for and accepts new connections, then hands the established sockets to a subReactor through the multiplexer. The subReactor is responsible for multiplexing the connected sockets and for reading and writing network data; business processing is handed off to the worker thread pool. In general, the number of subReactors can equal the number of CPUs.

Uses of Reactor

Many open source products in the software world use the Reactor model, such as Netty.

Netty Reactor practice

Server-side threading model

Server listener threads and I/O threads are separated, which is similar to Reactor’s multithreading model. The schematic diagram is as follows:

Server thread creation

  • Two EventLoopGroups are instantiated when the server is created. The bossGroup is the pool of acceptor threads that handle TCP connection requests from clients; the workerGroup is the thread group that actually performs the I/O reads and writes. In other words, Netty uses the multiple-Reactor model.

  • ServerBootstrap is Netty’s helper class for starting an NIO server, and it makes development easier. The thread groups are passed to ServerBootstrap through the group method, the Channel is set to NioServerSocketChannel, TCP parameters are set on the NioServerSocketChannel, and finally the I/O event handler class ChildChannelHandler is bound.

  • After the helper class is configured, bind is called on the listening port. Netty returns a ChannelFuture, and f.channel().closeFuture().sync() blocks synchronously until the server channel is closed.

  • Finally, shutdownGracefully is called on the thread groups to exit and release the resources.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class TimeServer {

    public void bind(int port) {
        // Configure the server NIO thread groups
        EventLoopGroup bossGroup = new NioEventLoopGroup();
        EventLoopGroup workGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workGroup)
             .channel(NioServerSocketChannel.class)
             .option(ChannelOption.SO_BACKLOG, 1024)
             .childHandler(new ChildChannelHandler());

            // Bind the port and wait synchronously for the bind to complete
            ChannelFuture f = b.bind(port).sync();
            // Wait until the server listening port is closed
            f.channel().closeFuture().sync();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Release the thread pool resources
            bossGroup.shutdownGracefully();
            workGroup.shutdownGracefully();
        }
    }

    private class ChildChannelHandler extends ChannelInitializer<SocketChannel> {
        @Override
        protected void initChannel(SocketChannel ch) throws Exception {
            ch.pipeline().addLast(new TimeServerHandler());
        }
    }
}

Server-side I/O thread handling (TimeServerHandler)

  • The exceptionCaught method is called when an exception occurs during I/O processing; it closes the ChannelHandlerContext to release the resources.

  • The channelRead method is where the data is actually read and handled: the request is read via buf.readBytes, and ctx.write(resp) queues the response packet to be sent back to the client.

  • The channelReadComplete method: to improve performance, Netty’s write only puts the data into a buffer; flush then writes all the buffered messages out to the SocketChannel.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.util.Date;

public class TimeServerHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) throws Exception {
        // Close the context on any I/O exception to release resources
        ctx.close();
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        // Turn the incoming message into a ByteBuf and copy it into a byte array
        ByteBuf buf = (ByteBuf) msg;
        byte[] req = new byte[buf.readableBytes()];
        buf.readBytes(req);
        String body = new String(req, "UTF-8");
        String currentTime = "QUERY TIME ORDER".equalsIgnoreCase(body)
                ? new Date(System.currentTimeMillis()).toString()
                : "BAD ORDER";
        ByteBuf resp = Unpooled.copiedBuffer(currentTime.getBytes());
        // Write the response into the outbound buffer
        ctx.write(resp);
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) throws Exception {
        // Flush the buffered messages to the SocketChannel after the read completes
        ctx.flush();
    }
}

Conclusion

The discussion above gives us a general picture of the Reactor model. Finally, let’s summarize its advantages and disadvantages.

  • Advantages

    • Fast response: it does not have to block on any single synchronous event, even though the Reactor itself is still synchronous.

    • Programming is relatively simple: it largely avoids complex multi-threading and synchronization problems, and avoids the overhead of thread/process switching.

    • Scalability: CPU resources can be fully utilized by increasing the number of Reactors and exploiting concurrency.

    • Reusability: The Reactor framework itself is independent of the specific event processing logic and has high reusability.

  • Disadvantages

    • Compared with the traditional simple model, the Reactor adds a certain amount of complexity, so there is a learning curve and debugging is relatively harder.

    • The Reactor pattern requires support from an underlying synchronous event demultiplexer, such as the Selector in Java, which in turn relies on the operating system’s select/epoll-style system calls.

    • In the single-threaded Reactor model, the reading and writing of I/O data happen on the same thread. Even with multiple Reactors, if reading or writing on one channel of a shared Reactor takes a long time, it delays the other channels on that Reactor; for example, transferring a large file slows the responses to other clients. For such workloads the traditional thread-per-connection model, or the Proactor pattern, may be a better choice.

(End of article)


You may also be interested in the following articles:

  • Exploring integration testing in a microservice environment (Part 1): Service stubs & mocks

  • Integration testing in a microservice environment (Part 2): Contract testing

  • Lego microservice transformation (I)

  • Lego microservice transformation (II)

  • A startup’s path to containerization (I): Before containerization

  • A startup’s path to containerization (II): Containerization

  • A startup’s path to containerization (III): Containers are the future

  • Reactive programming (Part 1): Overview

  • Reactive programming (Part 2): Spring 5

  • Handling complex business state: from the State pattern to FSM

  • A brief introduction to back-end cache systems

  • What exactly is abstraction, and the principles of abstraction in software design

  • The so-called Serverless: do you understand it correctly?

We are looking for a Java engineer. Please send your resume to [email protected].

Almond technology station

Long-press the QR code to follow us; we are a group of passionate young people looking forward to meeting you.