What is IO?

In the Unix world, everything is a file. And what is a file? A file is just a stream of bytes. Sockets, FIFOs, pipes, terminals: to us they are all files, and all of them are streams. Exchanging information means sending and receiving data on these streams, which is what we call I/O (Input/Output) operations. To read data from a stream we use the read system call, and to write data to it we use the write system call.

But a computer has many streams, so how do we say which one to operate on? That is the job of the file descriptor, commonly abbreviated fd. An fd is just an integer, and operating on that integer means operating on the corresponding file (stream). When we create a socket, the system call returns a file descriptor, and every later operation on the socket is expressed as an operation on that descriptor. Once again, this is layering and abstraction at work.
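As a small illustration of the point that a descriptor is only an integer handed to read and write, here is a minimal Python sketch. It assumes a POSIX-like system; the helper name pipe_roundtrip is made up for this example, and os.pipe, os.write and os.read are thin wrappers over the corresponding system calls.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Write bytes into a pipe and read them back through its descriptors."""
    read_fd, write_fd = os.pipe()       # two plain integers
    try:
        os.write(write_fd, message)     # write(2) on the write end
        return os.read(read_fd, len(message))  # read(2) on the read end
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    r, w = os.pipe()
    print(type(r).__name__, r, w)       # descriptors are small integers
    os.close(r)
    os.close(w)
    print(pipe_roundtrip(b"hello"))
```

The same two calls work unchanged on a socket's descriptor, which is the abstraction the paragraph above describes.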

Typically, a complete IO in a user process is divided into two phases:



User space <-> kernel space

Kernel space <-> device space



Kernel space holds the kernel's code and data, while a process's user space holds the code and data of the user program. Both kernel space and user space live in virtual address space. Linux uses two privilege levels: level 0 for the kernel and level 3 for user programs (ring 0 and ring 3 on x86).

The operating system and drivers run in kernel space, and applications run in user space. The two cannot simply hand data to each other through a pointer, because Linux uses virtual memory: user-space pages may be swapped out, so when the kernel uses a user-space pointer, the corresponding data may not be in memory. Instead, an application performs I/O by asking the kernel for help through a system call, and the kernel maintains a buffer of data for each I/O device.

For an input operation, after the process makes the I/O system call, the kernel first checks whether the requested data is already cached in its buffer. If it is, the data is copied straight from the kernel buffer into the process's address space. If it is not, the kernel reads from the device, and because device I/O is generally slow, the process has to wait.

Therefore, a network input operation usually consists of two different stages:

(1) Wait for the network data to reach the NIC and be read into the kernel buffer

(2) Copy the data from the kernel buffer to user space

I/O includes memory I/O, network I/O and disk I/O. Usually, when we talk about I/O, we mean the latter two.

1. Blocking I/O

A is fishing by the river with a fishing rod and waits in front of it. While waiting, A does nothing else and stays fully focused. Only when a fish bites does the waiting end, and A reels the fish in.

Sockets are blocking by default: the system call does not return until the kernel has the data ready and has copied it to user space.



In this analogy, the fishing rod is the file descriptor. This is the most common model, and its call flow matches that of the ordinary sequential programs we write.

If a program calls write and then read, the read can only run after the write returns. While the write is blocked, the read cannot execute and the program simply waits.
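The blocking model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local socket pair stands in for a network connection, a timer thread plays the fish by sending data after a delay, and the function name blocking_recv is hypothetical.

```python
import socket
import threading
import time


def blocking_recv(delay: float = 0.2):
    """Return (data, seconds the caller spent blocked inside recv)."""
    reader, writer = socket.socketpair()    # sockets are blocking by default
    # The timer thread stands in for the remote peer: data arrives after `delay`.
    timer = threading.Timer(delay, writer.send, args=(b"fish",))
    timer.start()
    start = time.monotonic()
    data = reader.recv(1024)                # the caller sleeps here until data arrives
    waited = time.monotonic() - start
    timer.join()
    reader.close()
    writer.close()
    return data, waited


if __name__ == "__main__":
    print(blocking_recv())
```

While recv is waiting, the calling thread does nothing else, which is exactly A standing in front of the rod.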

2. Non-blocking I/O

B is also fishing by the river, but B does not want to spend the whole time waiting for a bite. While waiting, B does other things (reads a book for a while, reads the newspaper for a while, watches other people fish, and so on), but at fixed intervals B checks whether a fish has taken the bait. As soon as there is a fish on the hook, B stops what he is doing and reels it in.



In fact, B is checking whether the fishing rod has fish, which is a polling process.

Each time, the user process asks the kernel whether the data is ready, that is, whether the file descriptor's buffer has data. If a datagram is ready, it is copied to the process. If not, the kernel does not block the program: the call returns immediately with an error (EWOULDBLOCK or EAGAIN) indicating that nothing is ready, and the user program asks again on its next round.

However, polling wastes a lot of CPU, so it is usually used only in specific scenarios.
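The polling loop described above might look like the following Python sketch. The names poll_until_ready, delay and interval are assumptions made for this example, and a timer thread again stands in for the sender. In Python, the EAGAIN/EWOULDBLOCK result surfaces as BlockingIOError.

```python
import socket
import threading
import time


def poll_until_ready(delay: float = 0.2, interval: float = 0.02):
    """Poll a non-blocking socket; return (data, number of empty polls)."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)               # recv now returns immediately
    timer = threading.Timer(delay, writer.send, args=(b"fish",))
    timer.start()
    empty_polls = 0
    while True:
        try:
            data = reader.recv(1024)        # succeeds only once data is buffered
            break
        except BlockingIOError:             # EAGAIN/EWOULDBLOCK: nothing ready yet
            empty_polls += 1
            time.sleep(interval)            # "read the newspaper", then check again
    timer.join()
    reader.close()
    writer.close()
    return data, empty_polls


if __name__ == "__main__":
    print(poll_until_ready())
```

Every empty poll is a system call that accomplishes nothing, which is where the wasted CPU comes from; the sleep between polls trades that waste for extra latency.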

3. Signal-driven I/O

C also fishes by the river, but unlike A and B, C is smarter: he hangs a bell on the fishing rod. When a fish bites, the bell rings and C reels the fish in.



In the signal-driven I/O model, the application tells the kernel: when the datagram is ready, send me a signal. The kernel then delivers SIGIO, and the application's signal handler is invoked to fetch the datagram.
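A rough Python sketch of this model follows. It assumes Linux, where setting O_ASYNC on a socket and naming an owner with F_SETOWN makes the kernel send SIGIO when data arrives; it must run in the main thread, because Python only installs signal handlers there. The function name sigio_receive is hypothetical.

```python
import fcntl
import os
import signal
import socket
import time


def sigio_receive(timeout: float = 2.0):
    """Receive one message from inside a SIGIO handler; return the messages seen."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    received = []

    def on_sigio(signum, frame):
        # The bell rang: the kernel reports activity on the descriptor.
        try:
            received.append(reader.recv(1024))
        except BlockingIOError:
            pass                                  # spurious wake-up, nothing to read

    previous = signal.signal(signal.SIGIO, on_sigio)
    try:
        fd = reader.fileno()
        fcntl.fcntl(fd, fcntl.F_SETOWN, os.getpid())        # deliver SIGIO to this process
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_ASYNC)  # hang the bell on the rod
        writer.send(b"fish")
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)                      # free to do other work here
    finally:
        reader.close()                            # stop SIGIO generation first
        writer.close()
        signal.signal(signal.SIGIO,
                      previous if previous is not None else signal.SIG_DFL)
    return received


if __name__ == "__main__":
    print(sigio_receive())
```

The process never asks whether data is ready. It only reacts when the signal arrives, although the copy from kernel to user space (the recv call) is still performed by the application itself.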

4. I/O multiplexing

D also fishes by the river, but D is better off and brings many fishing rods, watching all of them at once. D keeps checking whether a fish has bitten on any rod, which raises efficiency and reduces waiting time.



I/O multiplexing is built on a function such as select, which takes sets of file descriptors as arguments. select checks these file descriptors in a loop, and the program processes each descriptor once it is reported ready.

select is responsible only for waiting, and recvfrom is responsible only for copying. I/O multiplexing is still blocking I/O, but it can block while watching many file descriptors at once, so it is more efficient than plain blocking I/O.
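The division of labour between select and the receive call can be sketched as below, using Python's select.select wrapper. The function name wait_for_ready and its parameters are assumptions for this example; several local socket pairs stand in for D's fishing rods, and only one of them gets a bite.

```python
import select
import socket


def wait_for_ready(count: int = 3, active: int = 1):
    """Watch `count` sockets with one select call; return [(index, data)] for ready ones."""
    pairs = [socket.socketpair() for _ in range(count)]
    readers = [reader for reader, _ in pairs]
    pairs[active][1].send(b"fish")          # only one rod gets a bite
    # select blocks until at least one descriptor is readable (or 1 second passes).
    readable, _, _ = select.select(readers, [], [], 1.0)
    # select only reported readiness; recv performs the actual copy.
    result = [(readers.index(reader), reader.recv(1024)) for reader in readable]
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return result


if __name__ == "__main__":
    print(wait_for_ready())
```

One blocked call covers all the descriptors, so a single thread can serve many connections instead of dedicating one blocked thread to each.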

5. Asynchronous I/O

E also wants to fish, but E has other things to do, so he hires F to wait for the fish. As soon as a fish bites, F calls E, and E lands the fish.



When an application calls aio_read, the call returns immediately, before any data is available, and control goes back to the application process, which carries on with other work without blocking.

When a datagram is ready in the kernel, the kernel copies it into the application's buffer and then notifies the application, for example by invoking the handler specified in the aio_read request.

Few Linux systems support this model; IOCP on Windows is its typical implementation.
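Python's standard library does not expose aio_read, so the sketch below only emulates the shape of the asynchronous model with a worker thread: the request is submitted, the caller gets control back at once, and a completion callback receives data that has already been copied. It is not kernel asynchronous I/O, and the names async_read, on_complete and demo are made up for this illustration.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def async_read(executor, fd, nbytes, on_complete):
    """Submit a read and return at once; on_complete(data) runs when it finishes."""
    future = executor.submit(os.read, fd, nbytes)   # the worker plays the role of F
    future.add_done_callback(lambda f: on_complete(f.result()))
    return future


def demo():
    """Return (results seen right after submitting, results after completion)."""
    read_fd, write_fd = os.pipe()
    done = threading.Event()
    results = []

    def on_complete(data):
        results.append(data)                # the data has already been copied for us
        done.set()

    with ThreadPoolExecutor(max_workers=1) as executor:
        async_read(executor, read_fd, 1024, on_complete)
        results_before = list(results)      # still empty: the call did not block
        os.write(write_fd, b"fish")         # the fish bites later
        done.wait(timeout=2.0)
    os.close(read_fd)
    os.close(write_fd)
    return results_before, results


if __name__ == "__main__":
    print(demo())
```

The difference from the signal-driven model is who does the copy: there the application is told that data is ready and must read it itself, while here it is told only once the data is already in its buffer.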

As can be seen, the degree of blocking runs: blocking I/O > non-blocking I/O > I/O multiplexing > signal-driven I/O > asynchronous I/O, and efficiency goes from low to high in the same order.