What are the five I/O models in an operating system? What are select/poll/epoll? What is the difference between synchronous, asynchronous, blocking, and non-blocking?

Source of this article: Yang Jianyong’s personal blog

Let’s solve the first puzzle: what are the five I/O models

- Blocking I/O

- Non-blocking I/O

- I/O multiplexing

- Asynchronous I/O

- Signal-driven I/O

Here’s how these I/O models work…

Let’s start with some background

A computer system as a whole has two parts, the hardware and the operating system, and the software we develop runs on top of the operating system

Let’s start with the hardware components of a modern computer system: the bus, I/O devices, main memory, and the processor

The bus is a set of electronic conduits that run through the entire computer system and carry information between its components

I/O devices are the channels that connect the system to the outside world, such as the mouse, keyboard, and monitor

Main memory is a temporary storage device that holds programs and the data they process

The processor, or central processing unit (CPU), is the engine that interprets and executes the instructions stored in main memory. In plain English, we hand the processor the data we want computed and the steps for computing it, and it works out the result

Now let’s look at the operating system. Familiar operating systems include Windows, Linux, and Unix. An operating system is a layer of software between the hardware and the applications. It serves two basic purposes: one is to prevent the hardware from being misused by runaway applications; the other is to give applications a simple and consistent mechanism for controlling very different hardware devices. It achieves both through three basic abstractions: processes, virtual memory, and files. Processes are what we are going to talk about today; virtual memory and files are not covered here. Learning how to use processes is a required lesson for programmers

Kernel space and user space. As mentioned above, the operating system sits on top of the hardware, and applications sit on top of the operating system. The operating system has direct control over the hardware and access to protected memory. Application processes cannot manipulate the hardware directly. To protect the kernel, the operating system divides the virtual address space into two parts: kernel space and user space. These two concepts are not covered in detail here. All you need to know is that a process runs in an address space called virtual memory, part of which is reserved for the kernel (kernel space) and part of which is used by the process itself (user space). When a process needs to operate on hardware, it can only do so through the system calls that the operating system provides, such as read and write in Linux
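As a small illustration of a process asking the kernel to do the work, here is a minimal Python sketch, assuming a POSIX-like system. `os.pipe`, `os.write`, and `os.read` are thin wrappers around the corresponding system calls; the helper name `roundtrip_through_kernel` is made up for this example.

```python
import os


def roundtrip_through_kernel(payload: bytes) -> bytes:
    """Send bytes through a kernel pipe buffer and read them back."""
    # The kernel creates the pipe and hands back two file descriptors.
    read_fd, write_fd = os.pipe()
    try:
        # write system call: data goes from user space into a kernel buffer.
        os.write(write_fd, payload)
        # read system call: data is copied from the kernel buffer back to user space.
        return os.read(read_fd, len(payload))
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(roundtrip_through_kernel(b"hello kernel"))
```

The process never touches the pipe's buffer itself; every step goes through a call into the kernel.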

Now to the main topic

What is a process?

As we’ve already seen, an operating system is a layer of software that sits between the hardware and application programs. The applications we develop and use, such as Photoshop, WeChat, and QQ, all run on top of this operating system. So how do our applications use the hardware, and how do we make sure that one application does not corrupt another's data? This is where the process comes on stage

A process is an abstraction provided by the operating system. It gives each application the illusion that it has the hardware to itself and that the hardware is running that application alone. In other words, a process represents a program in execution. If several applications are running on your computer at the same time, the operating system sets up a separate process for each of them. But a CPU core can only execute one program at a time. What happens when different programs need the same CPU? That brings us to the focus of this article: the five I/O models for handling different programs concurrently

The concept of a process is hard for beginners to grasp at first. Here's an example:

We all played with building blocks as kids. A newly bought set comes with illustrated instructions that show how to build different shapes, and we follow them to put the blocks together

In this example, we are the processor, the CPU; the illustrated instructions are the program, or algorithm; and the blocks are the input data. The process is the whole activity of us reading the instructions and using the blocks to build the shapes
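To make "a program in execution" a little more concrete, the sketch below starts the same Python interpreter as a second process and asks it for its process ID. It is only an illustration; the helper names `run_in_child_process` and `child_pid` are invented here.

```python
import os
import subprocess
import sys


def run_in_child_process(code: str) -> str:
    """Run a snippet of Python in a separate process and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],   # same program (the interpreter), new process
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def child_pid() -> int:
    """Ask the child process for the ID the operating system gave it."""
    return int(run_in_child_process("import os; print(os.getpid())"))


if __name__ == "__main__":
    print("parent:", os.getpid(), "child:", child_pid())
```

The program is the same in both cases, but the operating system treats each running copy as its own process with its own ID.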

1, Blocking I/O

Blocking I/O is the most basic model. As the figure shows, when the process makes the recvfrom system call, the kernel starts the I/O operation, beginning with preparing the data. The kernel has to wait for the data to arrive in kernel space and then copy it from kernel space to user space. So there are two phases: first the kernel waits for the data to arrive in kernel space, then it copies the data from kernel space to user space. In the blocking I/O model the process is blocked during both phases and stays blocked until the data has reached user space. This is called blocking IO, or BIO

An everyday example helps in understanding blocking IO: the customer is a program, the milk tea shop is the operating system, and the machine that makes the milk tea is the CPU

A customer goes to the milk tea shop and tells the clerk he wants a milk tea [makes the system call]. The customer then does nothing but wait [blocked state]. After taking the order, the clerk starts making the milk tea with the machine [the kernel prepares the data] and waits until it is finished [the data is ready in the kernel]. The clerk then puts the milk tea on the counter [copies the data from kernel space to user space] and tells the customer to come get it, and the customer takes it and leaves [the read operation returns]

The whole process: the customer orders a milk tea at the shop, the machine makes it, the clerk brings it to the counter, and the customer takes it away
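The blocking behaviour can be sketched in Python with a pair of connected sockets. This is only an illustration under simple assumptions: the function name `blocking_receive`, the `delay` parameter, and the helper thread that plays the clerk are all invented for the example.

```python
import socket
import threading
import time


def blocking_receive(delay: float = 0.2) -> tuple[bytes, float]:
    """Return the received data and how long recv() kept the caller blocked."""
    customer, clerk = socket.socketpair()   # two connected sockets in one process

    def make_milk_tea() -> None:
        time.sleep(delay)                   # the data is not ready for a while
        clerk.sendall(b"milk tea")

    worker = threading.Thread(target=make_milk_tea)
    worker.start()

    started = time.monotonic()
    # Sockets are blocking by default: recv() does not return until data has
    # arrived and has been copied into this process's buffer.
    data = customer.recv(1024)
    waited = time.monotonic() - started

    worker.join()
    customer.close()
    clerk.close()
    return data, waited


if __name__ == "__main__":
    data, waited = blocking_receive()
    print(data, f"blocked for about {waited:.2f}s")
```

While `recv()` is waiting, the calling thread can do nothing else, which is exactly the customer standing at the counter.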

2, NonBlocking I/O

Non-blocking I/O is also called NIO. As the figure shows, when the process makes the recvfrom system call, the kernel again starts the I/O operation. The difference is that BIO suspends the process while the kernel prepares the data and does not return until the data is ready, whereas NIO returns an error right away (EWOULDBLOCK or EAGAIN) telling the process that the data is not ready. The process therefore does not wait: when it gets the error, it simply issues the system call again, and keeps doing so until the kernel has the data ready, at which point the call copies the data from kernel space to user space and returns it. The main feature of NIO is that the process constantly asks the kernel whether the data is ready. In my opinion this is a rather silly operation, though maybe I have not understood it thoroughly ~ ~

For example: a customer goes to the milk tea shop and tells the clerk he wants a milk tea [system call]. The clerk tells the customer it will take a moment [returns an error right away] and starts making the milk tea with the machine [the kernel prepares the data]. Meanwhile the customer keeps asking the clerk whether the milk tea is ready. As long as it is not ready, the clerk answers that it is not done yet. Once it is ready, the clerk puts the milk tea on the counter [copies it to user space] and tells the customer, who comes and takes it away. Again, the customer does nothing useful but keep asking.

The only difference from blocking IO is that the process keeps asking whether the data is ready instead of being suspended until it is
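The "keep asking" loop looks roughly like this in Python. It is a sketch, not a recommended pattern; `polling_receive` and its `delay` parameter are names made up for the example. In Python the "data not ready" error from the kernel surfaces as `BlockingIOError`.

```python
import socket
import threading
import time


def polling_receive(delay: float = 0.2) -> tuple[bytes, int]:
    """Return the data and how many 'not ready yet' answers came before it."""
    customer, clerk = socket.socketpair()
    customer.setblocking(False)             # recv() now returns immediately

    def make_milk_tea() -> None:
        time.sleep(delay)
        clerk.sendall(b"milk tea")

    worker = threading.Thread(target=make_milk_tea)
    worker.start()

    not_ready = 0
    while True:
        try:
            data = customer.recv(1024)      # succeeds only once data is available
            break
        except BlockingIOError:             # EAGAIN / EWOULDBLOCK: nothing to read yet
            not_ready += 1
            time.sleep(0.01)                # the process could do other work here

    worker.join()
    customer.close()
    clerk.close()
    return data, not_ready


if __name__ == "__main__":
    data, asked = polling_receive()
    print(data, f"after {asked} 'not ready' answers")
```

Each failed `recv()` is one round of "is it ready yet?", and every round costs a system call.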

3, I/O Multiplexing

I/O multiplexing is what select/poll/epoll implement. It is also called event-driven IO. What select/poll/epoll are is covered under the third question below

Let’s take a look at how IO multiplexing works. As the figure shows, the operating system provides a system call named select. When a user process calls select, it hands over all the sockets it is responsible for, and the process blocks inside select while the kernel checks all of those sockets. When data arrives on any one of them, select returns and tells the user process which sockets are readable. The process then makes a second system call (recvfrom) to copy the data from kernel space to user space

Example: this differs quite a bit from the two models above. Suppose many customers come to buy milk tea. Each customer writes his name and the type of milk tea he wants on a card and hands it to the clerk. Every time a milk tea is finished, the clerk has to search the pile of cards for the name that matches that type of milk tea and then notify that customer to come pick it up. So every time a milk tea is finished, the clerk spends a lot of time searching the pile of cards for the customer. As before, the customer does nothing but wait for the milk tea to be ready.
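Here is a minimal sketch of one process watching several sockets with a single `select` call, using Python's `select.select`. The function name `serve_ready_sockets` and the customer names are assumptions made for illustration.

```python
import select
import socket


def serve_ready_sockets(messages: dict[str, bytes]) -> dict[str, bytes]:
    """Watch several sockets with select() and read whichever become ready."""
    pairs = {name: socket.socketpair() for name in messages}
    names = {reader: name for name, (reader, _writer) in pairs.items()}

    # Each "customer" places an order on its own connection.
    for name, payload in messages.items():
        pairs[name][1].sendall(payload)

    received: dict[str, bytes] = {}
    pending = set(names)
    while pending:
        # Blocks (up to 1 second) until at least one watched socket is readable.
        readable, _, _ = select.select(list(pending), [], [], 1.0)
        if not readable:
            break                            # timed out: nothing more will arrive
        for sock in readable:
            # Second system call: copy the ready data into user space.
            received[names[sock]] = sock.recv(1024)
            pending.discard(sock)

    for reader, writer in pairs.values():
        reader.close()
        writer.close()
    return received


if __name__ == "__main__":
    print(serve_ready_sockets({"alice": b"green tea", "bob": b"black tea"}))
```

One blocked call covers all the sockets, which is the whole point of multiplexing: the process waits once instead of once per connection.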

4, Asynchronous I/O

Asynchronous IO is also called AIO. In fact, asynchronous IO is the easiest to understand. The user process makes a system call, the kernel returns immediately, and the process goes on to do other things instead of waiting. The kernel does the rest on its own: it waits for the data, and when the data is ready it does not ask the user process to copy it. Instead, the kernel itself copies the data from kernel space to user space. When all of that is done, it sends a signal to the user process to say the data is already there and can be used directly.

For example: the customer goes to the milk tea shop, tells the clerk what kind of milk tea he wants, leaves a phone number with the clerk, and goes off [he is no longer blocked]. When the milk tea is ready, the clerk calls the customer to come pick it up

The biggest difference with AIO is that after the process makes the system call, it no longer waits around doing nothing but goes and does something else
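Python's standard library does not expose kernel asynchronous I/O directly, so the sketch below only imitates the calling pattern in user space with a worker thread: place the order, carry on with other things, and get a "phone call" through a callback once the result is complete. It is not kernel AIO, and the names `order_and_leave`, `make_milk_tea`, and `phone_call` are invented for the example.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


def order_and_leave(delay: float = 0.2) -> list[str]:
    """Submit a request, keep working, and record when the callback fires."""
    events: list[str] = []
    done = threading.Event()

    def make_milk_tea() -> bytes:
        time.sleep(delay)                   # stands in for "wait for data, then copy it"
        return b"milk tea"

    def phone_call(future: Future) -> None:
        # Runs once the result is complete; nothing is left for the caller to fetch.
        events.append(f"notified: {future.result().decode()}")
        done.set()

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(make_milk_tea)     # returns immediately
        future.add_done_callback(phone_call)
        events.append("customer does other things")
        done.wait(timeout=5)
    return events


if __name__ == "__main__":
    print(order_and_leave())
```

The caller is never stuck between placing the order and being notified, which is the property the article attributes to AIO.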

5, Signal Driven I/O

Signal-driven IO is also called SIGIO. It is rarely used in real development because it produces too many useless signals. As the figure shows, in this model the process first installs a signal handler for the socket through a system call, and that call returns immediately, so the process is not suspended. When the kernel has the data ready, it sends a SIGIO signal to the process, and the previously installed signal handler can then read the data. Since this model is rarely used, it will not be covered in detail here

Second question: what is the difference between synchronous, asynchronous, blocking, and non-blocking?

First, before we start, the important thing gets said three times

The concepts described here are not strictly tied to the terms synchronous, asynchronous, blocking, and non-blocking as used in the five IO models above

The concepts described here are not strictly tied to the terms synchronous, asynchronous, blocking, and non-blocking as used in the five IO models above

The concepts described here are not strictly tied to the terms synchronous, asynchronous, blocking, and non-blocking as used in the five IO models above

To understand the difference between synchronous, asynchronous, blocking, and non-blocking, please clear your mind and do not associate these concepts with the words used in the IO models, otherwise it will get in the way of understanding them!!

So what are synchronous and asynchronous?

The literal meaning of the names says it all. Synchronous means that different individuals working on the same task keep their progress coordinated. If any step runs into a problem, the remaining steps must wait until that step is back to normal before the task can continue. To be more concrete: in military training, everyone marches in step. If one person falls down halfway, the others must stop and wait for him to stand up before they continue goose-stepping. In morning exercises, all the students move in unison. If one of them has a loose shoelace and stops to tie it, the others have to wait until he rejoins before continuing. Asynchronous is the opposite: different individuals working on the same task do not keep their progress coordinated. In the goose-stepping case, nobody has to keep the same pace; it is enough that everyone eventually finishes the goose-stepping task

In other words, in the synchronous case the steps or individuals involved in a task depend on one another in a fixed order. When one step or individual has a problem, the order cannot be changed and everything else can only wait. In the asynchronous case there is no strict order and no such dependency. That is, synchronous and asynchronous express order and dependencies

Note: It is important to mention that different individuals are doing the same task rather than doing different things

Now, what are blocking and non-blocking?

Again the literal meaning is enough. Blocked simply means stuck: while performing a task you get stuck at some point, and the task cannot continue. Non-blocking means that when you hit that point you do not stay stuck there but go and do something else. Take goose-stepping again: if you run into an obstacle, the blocking way is to wait there until the obstacle is removed and then continue walking. The non-blocking way is to go off and do other things, and once the obstacle is removed, come back and continue from that position. Notice that you still have to come back and finish the goose-stepping task

In other words, blocking and non-blocking describe what the steps or individuals involved in a task do when one of them gets held up: either wait and do nothing, or go and do something else. That is, blocking and non-blocking express the state while waiting

Also note that blocking and non-blocking presuppose one complete task that still has to be finished

Once again, to understand synchronous, asynchronous, blocking, and non-blocking, you only need to understand their literal meaning. Many people get confused because the same words are used when the five IO models are introduced, which makes them harder to understand

Now that we understand the concept, let’s see how it relates to the five IO models

Combining synchronous/asynchronous with blocking/non-blocking gives four different combinations:

Synchronous blocking

Synchronous non-blocking

Asynchronous blocking

Asynchronous non-blocking

Looking back at the five IO models explained above, we can see which combination each IO model belongs to. Signal-driven IO is used very little, so I will not explain it.

| | blocking | non-blocking |
| --- | --- | --- |
| synchronous | BIO / Multiplexing IO | NIO |
| asynchronous | | AIO |

**Let’s take a look at BIO.** As the IO model above shows, after a process makes the system call it does nothing, that is, it is blocked, stuck there. The process gets the data only after the kernel has prepared it and copied it from kernel space to user space. The system call does not finish until the kernel has the data ready, so the completion of the process's call depends on the completion of the kernel's work; otherwise the call never finishes. In other words, the process is synchronized with the kernel. So BIO is synchronous blocking IO

**Back to NIO.** The only difference between NIO and BIO is that the process does not wait while the kernel prepares the data. Instead, it keeps polling, which means it is non-blocking. But the process still has to wait until the kernel has the data ready and has copied it from kernel space to user space before it can read the data, which makes it a synchronous operation. So NIO is synchronous non-blocking IO

**Multiplexing I/O.** IO multiplexing is essentially the same as BIO. The difference is that the system provides a mechanism for handling many requests at once. The process is still blocked while it waits, and getting the result is still a synchronous process. So multiplexing I/O is also synchronous blocking I/O

**Last, AIO.** As the IO model above shows, when a process makes the system call it no longer waits but continues executing and does something else, which means it is non-blocking. Once the kernel has the data ready, the kernel copies it to user space. Unlike the synchronous models, the process's call does not have to wait for the data to be copied into user space before it finishes. In AIO the call finished long ago, and the process is off doing something else. To let the process know the data can be read, a callback is used to notify it. So AIO is asynchronous non-blocking IO

Third question: what is select/poll/epoll

First, select/poll/epoll are mechanisms for implementing IO multiplexing. The idea is that the kernel provides a way for one process to watch multiple descriptors, and when data is ready on a descriptor, the kernel tells the process to come and read it. select, poll, and epoll are three different ways of doing this

**The implementation of select.** A process manages a number of descriptors, each with a unique number. When the process makes the select system call, it passes the kernel the set of descriptors it wants watched, and the kernel waits for data on them. When data is ready on some descriptor, select has to iterate through all the descriptors it was given to find the ones that are ready, and the process then has to go through the set again to find them before it reads the data. Because every call scans every watched descriptor, the time complexity is O(N). In select mode, the number of descriptors that one call can watch is also limited, by default to 1024 (FD_SETSIZE on Linux). Why it is 1024 is not explained here

To illustrate with the milk tea example from above: different customers [processes] come to buy milk tea, and each customer also buys for friends [descriptors], but each customer can buy for at most 1024 friends [the 1024-descriptor limit], no more. The customers write their own ID number, their friends' ID numbers, and their queue number on a card and hand it to the clerk. When a milk tea is ready, the clerk goes through the pile of cards to find the matching customer by queue number, then notifies the customer to come pick up the milk tea and take it to the right friend. The clerk has to go through every card each time, which is O(N) complexity, and it is obviously time-consuming

**Implementation of poll mode.** poll is implemented in the same way as select, except that poll does not have select's limit of 1024 descriptors
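A minimal sketch of the poll interface, using Python's `select.poll` (available on Linux and most Unix systems, not on Windows). Descriptors are registered one by one instead of being packed into a fixed-size set. The function name `poll_ready` is made up for this example.

```python
import select
import socket


def poll_ready(messages: dict[str, bytes]) -> dict[str, bytes]:
    """Register several sockets with poll() and read the ones reported ready."""
    poller = select.poll()
    pairs: dict[str, tuple[socket.socket, socket.socket]] = {}
    name_by_fd: dict[int, str] = {}

    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        pairs[name] = (reader, writer)
        name_by_fd[reader.fileno()] = name
        poller.register(reader, select.POLLIN)   # watch for "data to read"
        writer.sendall(payload)

    received: dict[str, bytes] = {}
    while len(received) < len(messages):
        events = poller.poll(1000)                # timeout in milliseconds
        if not events:
            break                                 # timed out
        for fd, event in events:
            if event & select.POLLIN:
                name = name_by_fd[fd]
                received[name] = pairs[name][0].recv(1024)
                poller.unregister(fd)

    for reader, writer in pairs.values():
        reader.close()
        writer.close()
    return received


if __name__ == "__main__":
    print(poll_ready({"alice": b"green tea", "bob": b"black tea"}))
```

The kernel still checks every registered descriptor on each call, so the cost grows with the number of descriptors just as with select.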

**Implementation of epoll mode.** epoll differs from select and poll: when a process registers a descriptor with epoll, the kernel attaches a callback function to that descriptor. When data is ready on the descriptor, the callback puts it on a ready list, and the process only has to read from the descriptors on that list. There is no scan over everything being watched; only the ready descriptors are reported, so the cost of finding a ready descriptor is O(1) rather than O(N), which is much more efficient

To understand it, take the same milk tea example. When the different customers come to buy milk tea, they no longer hand the clerk a stack of cards. Instead, the clerk gives every customer a pager [registers a callback function]. When a customer's milk tea is ready, the pager tells that customer to come and get it [the process reads the data]. The clerk's workload is now at its smallest: each time, only one customer has to be notified, which is much more efficient
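The pager idea can be sketched with Python's `selectors` module, whose `DefaultSelector` uses epoll on Linux (and another mechanism elsewhere). Note the assumption: the callback stored in `data` is a user-level convention for this example, separate from the callback the kernel uses internally. The names `pager_demo` and `make_pager` are invented.

```python
import selectors
import socket
from typing import Callable


def pager_demo(messages: dict[str, bytes]) -> dict[str, bytes]:
    """Register one callback per socket and invoke it only for ready sockets."""
    selector = selectors.DefaultSelector()
    received: dict[str, bytes] = {}
    pairs: list[tuple[socket.socket, socket.socket]] = []

    def make_pager(name: str) -> Callable[[socket.socket], None]:
        def on_ready(sock: socket.socket) -> None:
            received[name] = sock.recv(1024)      # read the data that is ready
            selector.unregister(sock)
        return on_ready

    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        pairs.append((reader, writer))
        # Hand out the "pager": attach the callback to this descriptor.
        selector.register(reader, selectors.EVENT_READ, data=make_pager(name))
        writer.sendall(payload)

    while len(received) < len(messages):
        ready = selector.select(timeout=1.0)      # returns only the ready descriptors
        if not ready:
            break                                 # timed out
        for key, _mask in ready:
            key.data(key.fileobj)                 # call the registered callback

    selector.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return received


if __name__ == "__main__":
    print(pager_demo({"alice": b"green tea", "bob": b"black tea"}))
```

The loop never looks at sockets that have nothing to read; it only handles the entries the selector hands back.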


Conclusion

There are five IO models in modern operating systems. The commonly used ones are BIO, NIO, multiplexing IO, and AIO. BIO and multiplexing IO are synchronous blocking IO, NIO is synchronous non-blocking IO, and AIO is asynchronous non-blocking IO

Synchronous and asynchronous describe whether the parties in the IO operation proceed in step: synchronous means they stay in step, asynchronous means they do not. Blocking and non-blocking describe the state of the process during IO: blocking means the process just waits, non-blocking means it does not wait and goes on to do other work

select, poll, and epoll are three different ways of implementing multiplexing IO. The biggest difference is performance: select and poll are O(N) in time complexity, while epoll is O(1) and far more efficient

Welcome to my blog [Yang Jianyong’s personal blog http://yangjianyong.cn]