The Linux IO model is not that difficult

IO stands for Input and Output. In the operating system, it refers to the input and output of a data stream. The two ends of the stream can be a file or a host on the network. Whether the other end is a file or a network host, the transfer works in a similar way, so today we will use reading a file as the example.

Getting a file from disk into our program's memory takes several steps. First the data is read from the hardware, then it is placed in the operating system's kernel buffer, then it is copied into the program buffer, and only then can the application use it. Simply put, no matter which IO model is used, a read always goes through the following two phases:

Wait for the data to reach the kernel buffer
Copy the data from the kernel buffer to the program buffer

Depending on whether each of these two phases blocks, Linux IO is classically divided into five models:

Blocking IO model
Non-blocking IO model
IO multiplexing model
Signal-driven IO model
Asynchronous IO model

Blocking IO model

Blocking IO is the simplest model. When a process makes a read request (the recvfrom system call is the classic example) and there is no data in the kernel buffer, the call does not return immediately. Instead, the kernel fetches the data from the device, and the process waits until it has arrived in the kernel buffer. At that point the first phase is complete. The process is blocked throughout this phase because it is waiting for the kernel to read the data into the kernel buffer.

Once the data is in the kernel buffer, it is copied from the kernel buffer to the program buffer, and only then does the call return so the process can use the data. At this point the second phase is complete. The process is also blocked during this phase, because the call does not return until the copy has finished.

To put it simply: in the blocking IO model, the process is blocked both while data moves from the hardware to the kernel and while it moves from the kernel to the program space.
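The behaviour described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: a local socket pair stands in for the data source, a helper thread plays the role of the slow device, and the names `blocking_read` and `send_later` are made up for this example.

```python
import socket
import threading
import time


def blocking_read(delay=0.2):
    """Read from a blocking socket whose data only arrives after `delay` seconds."""
    reader, writer = socket.socketpair()

    def send_later():
        # Hypothetical stand-in for a slow device: data shows up after a pause.
        time.sleep(delay)
        writer.sendall(b"hello")

    sender = threading.Thread(target=send_later)
    start = time.monotonic()
    sender.start()

    # recv() does not return until data has reached the kernel buffer
    # (phase one) and has been copied into our own buffer (phase two).
    data = reader.recv(1024)
    waited = time.monotonic() - start

    sender.join()
    reader.close()
    writer.close()
    return data, waited
```

Calling `blocking_read(0.2)` should return the payload together with a waiting time of roughly 0.2 seconds, during which the calling thread did nothing else.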

Non-blocking IO model

In the non-blocking IO model, when a process calls recvfrom and there is no data in the kernel buffer, the kernel still starts fetching the data, but the call does not block. It returns immediately with an error (EWOULDBLOCK), telling the process that the data is not ready yet and that it should try again later.

So the process keeps asking the kernel: Is the data ready? Is the data ready? Once the data is in the kernel buffer, the call succeeds instead of returning an error, and the first phase is over. "Non-blocking" means that the process does not block in this phase, but keeps retrying instead.

This polling is not very efficient, because the CPU spins on repeated calls that do no useful work, but the process is not blocked in this phase and could do something else between attempts. When the data is ready, it is copied from the kernel buffer to the program buffer. This phase is exactly the same as in the blocking IO model, and the process is blocked while it happens.

To put it simply: in the non-blocking IO model, the process polls instead of blocking while data moves from the hardware to the kernel, but it is still blocked while data is copied from the kernel to the program space. Compared with blocking IO, the process is not stuck standing still in the first phase; it is at least running. The polling wastes CPU time, but the process can get other work done in between.
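The retry loop can be sketched like this in Python, where the EWOULDBLOCK error surfaces as a `BlockingIOError` exception. As before, the socket pair, the helper thread, and the function name `nonblocking_read` are assumptions made for the sake of the example.

```python
import socket
import threading
import time


def nonblocking_read(delay=0.2):
    """Poll a non-blocking socket until data arrives; return data and retry count."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)  # recv() now returns at once instead of waiting

    def send_later():
        # Hypothetical slow data source.
        time.sleep(delay)
        writer.sendall(b"hello")

    sender = threading.Thread(target=send_later)
    sender.start()

    attempts = 0
    while True:
        try:
            # Succeeds only when data is already in the kernel buffer.
            data = reader.recv(1024)
            break
        except BlockingIOError:
            # EWOULDBLOCK / EAGAIN: not ready yet, so ask again later.
            attempts += 1
            time.sleep(0.01)  # other work could be done here instead

    sender.join()
    reader.close()
    writer.close()
    return data, attempts
```

The short sleep between attempts only keeps the sketch from spinning at full speed. A real program would either do useful work there or switch to one of the models below.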

IO multiplexing model

IO multiplexing gets its name because one process can watch multiple data streams at the same time. With blocking IO and non-blocking IO, a call handles only one data stream at a time. In the IO multiplexing model, a process registers multiple data streams and blocks in a single call (select, poll, or epoll on Linux), and the kernel wakes it up as soon as any one of the streams has data. At this point the first phase is complete. Note that the process is in fact blocked during this phase.

After being woken by the kernel, the process reads from the ready stream, and the data is copied from the kernel buffer to the program buffer. This phase is identical to the previous two models, and the process is blocked here as well.

To put it simply: the second phase of the IO multiplexing model is identical to blocking and non-blocking IO. In the first phase, however, it is much more efficient, because a single call waits on many data streams at once.
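Here is a minimal sketch of multiplexing using Python's standard `selectors` module, which picks epoll, poll, or select depending on the platform. The function name `multiplexed_read` and the use of local socket pairs as the "data streams" are assumptions for illustration only.

```python
import selectors
import socket


def multiplexed_read(messages):
    """Wait on several sockets with one selector; `messages` maps name -> bytes."""
    sel = selectors.DefaultSelector()
    pairs = []
    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        # Register every stream with the same selector, tagged with its name.
        sel.register(reader, selectors.EVENT_READ, data=name)
        pairs.append((reader, writer))
        writer.sendall(payload)  # make this stream readable

    results = {}
    while len(results) < len(messages):
        # Phase one: a single blocking call covers all registered streams.
        events = sel.select(timeout=2.0)
        if not events:
            break  # nothing became ready in time
        for key, _mask in events:
            # Phase two: copy the data from the kernel into our buffer.
            results[key.data] = key.fileobj.recv(1024)
            sel.unregister(key.fileobj)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return results
```

For example, `multiplexed_read({"a": b"one", "b": b"two"})` collects both payloads while blocking in `select()` rather than in a separate `recv()` per stream.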

Signal-driven IO model

Signal-driven IO differs from the previous models in one word: signal. In the first phase, before the data reaches the kernel buffer, the process does not block. Instead it registers a signal handler. When the data reaches the kernel buffer, the kernel sends the process a signal, which invokes the handler. In the meantime, the process is free to do other things.

When the process receives the signal, it reads the data, which is copied from the kernel buffer to the program buffer. This phase is exactly the same as in the previous models, and the process is blocked while it happens.

Signal-driven IO can be considered a milestone: it is the first model in which the waiting phase is truly asynchronous. It still has the same two phases as the models above. In the first phase, the process asks the kernel to notify it when data is available. It neither blocks nor polls; it only installs a signal handler. When the data has fully arrived in the kernel buffer, the kernel sends a SIGIO signal telling the process to proceed to the second phase, copying the data to the program buffer.
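A rough sketch of the SIGIO mechanism on Linux follows. It assumes a local socket pair as the data source, that the code runs in the main thread (Python only allows signal handlers to be installed there), and that the platform exposes `os.O_ASYNC`. The function name `signal_driven_read` is made up for this example.

```python
import fcntl
import os
import signal
import socket
import time


def signal_driven_read(payload):
    """Let the kernel announce readable data via SIGIO, then read it."""
    reader, writer = socket.socketpair()
    ready = []

    def on_sigio(signum, frame):
        # Runs when the kernel reports that data reached the kernel buffer.
        ready.append(signum)

    old_handler = signal.signal(signal.SIGIO, on_sigio)
    try:
        # Deliver SIGIO for this descriptor to our own process ...
        fcntl.fcntl(reader.fileno(), fcntl.F_SETOWN, os.getpid())
        # ... and switch on signal-driven notification for it.
        flags = fcntl.fcntl(reader.fileno(), fcntl.F_GETFL)
        fcntl.fcntl(reader.fileno(), fcntl.F_SETFL, flags | os.O_ASYNC)

        writer.sendall(payload)

        # Phase one: no blocking read here; the sleep stands in for other work.
        deadline = time.monotonic() + 2.0
        while not ready and time.monotonic() < deadline:
            time.sleep(0.01)
        if not ready:
            raise TimeoutError("no SIGIO received")

        # Phase two: the ordinary copy from kernel buffer to program buffer.
        return reader.recv(len(payload))
    finally:
        # Close the sockets before restoring the old handler, so that no
        # late SIGIO can arrive while the default (terminating) action is set.
        reader.close()
        writer.close()
        signal.signal(signal.SIGIO, old_handler)
```

Note that the final `recv()` is still a normal synchronous call. The signal only replaces the waiting, not the copying.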

Asynchronous IO model

Compared with the previous models, asynchronous IO is truly non-blocking: neither phase one nor phase two blocks the process. As with signal-driven IO, the process issues the request and returns immediately, so the first phase does not block.

The difference is in the second phase. The process does not copy the data itself. The kernel copies the data from the kernel buffer to the process buffer on the process's behalf, and only when the copy is complete is the process notified so that it can handle the data.

Asynchronous IO is more complete than signal-driven IO!

Asynchronous IO is non-blocking not only in the first phase but also in the second: by the time the process is notified, the data is already in its own buffer. The whole IO operation is therefore asynchronous.
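Python's standard library does not wrap the POSIX `aio_read` family, so the following is only an imitation of the programming model, not real kernel asynchronous IO. A worker thread performs both phases, and a completion callback fires once the data is already in the program's buffer. The function name `async_style_read` is hypothetical, and the payload is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor
import socket
import threading


def async_style_read(payload):
    """Submit a read, keep going, and get a callback once the data is in hand."""
    reader, writer = socket.socketpair()
    done = threading.Event()
    result = []

    def on_complete(future):
        # Called after BOTH phases have finished: the data is already
        # sitting in our own buffer, nothing is left to copy.
        result.append(future.result())
        done.set()

    with ThreadPoolExecutor(max_workers=1) as pool:
        # Submit the read and return immediately (emulation via a thread).
        future = pool.submit(reader.recv, 1024)
        future.add_done_callback(on_complete)

        # The main thread is free to do other things; here it produces the data.
        writer.sendall(payload)
        done.wait(timeout=2.0)

    reader.close()
    writer.close()
    return result[0] if result else None
```

The shape to notice is that the caller never performs the read itself. It hands over the request and is told about it only when everything is finished, which is the contrast with the SIGIO model, where the notification merely says "you may start reading now".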

Conclusion

Let’s review the five IO models:

Blocking IO model: hardware to system kernel, blocking. System kernel to program space, blocking.
Non-blocking IO model: hardware to system kernel, polling without blocking. System kernel to program space, blocking.
IO multiplexing model: hardware to system kernel, blocking while waiting on multiple streams. System kernel to program space, blocking.
Signal-driven IO model: hardware to system kernel, non-blocking with a signal notification. System kernel to program space, blocking.
Asynchronous IO model: hardware to system kernel, non-blocking. System kernel to program space, non-blocking; the process is notified when the copy is complete.

From the five IO models above, we can see that the asynchronous IO model is the only one that is truly asynchronous and non-blocking, while the other four are synchronous IO, because in each of them the process can't do anything else during the second phase: copying from the kernel buffer to the process buffer.

Well, that’s all for today’s Linux IO model sharing.

Thank you for reading. If this article is helpful to you, feel free to comment, forward and like. See you next time
