Introduction

As we all know, the BIO and NIO features that Java exposes are ultimately backed by the operating system. To understand their characteristics more deeply, we need to look at the underlying principles. Below is my own understanding of how the I/O models provided by the operating system evolved. Before walking through that evolution, I recommend a blog post by Lao Qian (author of "Redis Deep Adventure: Core Principles and Application Practice"); the GIFs in it are truly one of a kind.

Epoll evolution

  1. First, there is the file descriptor (every socket has one). At the beginning there was only the blocking read call: a read on a socket blocks until data arrives, so each socket needs its own thread to read it. This is the BIO model.

  2. In the second stage, the kernel supported non-blocking I/O (NIO): a call to read returns immediately instead of blocking when no data is available. A single thread can therefore poll all file descriptors in turn to see which ones have data to read. However, with 1000 file descriptors you need to call read 1000 times, because checking one file descriptor costs one system call, and every system call involves a switch between user mode and kernel mode.

  3. Next, the kernel added a system call named select. You pass it, say, 1000 file descriptors in a single call; it monitors them and returns once some of them are ready. The model is still synchronous and non-blocking, because you must then call read yourself. For example, if 50 of the 1000 file descriptors are readable, one call to select tells me which 50, and I then call read on each of them. What used to take 1000 calls to read now takes 1 call to select plus 50 calls to read. This is I/O multiplexing. But there is a problem: the kernel does not trust user space, so the arguments have to be copied. The kernel cannot trust any pointer into user space and must validate the data that pointer refers to. If it only validated without copying, another thread could change the data after the check, so the kernel has to make its own copy. As a result, the 1000 file descriptors are copied into kernel space on every call, which is inefficient.

  4. To solve this problem, the idea was to keep the set of file descriptors in a space that persists across calls instead of passing it in every time. This is often described as a space shared by kernel mode and user mode; on Linux the set actually lives in kernel memory, and only the ready events are copied back to user space. This is how epoll came about. Creating an epoll instance yields a file descriptor of its own; each new file descriptor to be watched is registered once and stored in a red-black tree inside the kernel. When a read event is detected, the descriptor is placed on a linked list of ready descriptors, and the program then calls read(file descriptor) to fetch the data. Therefore, epoll is not AIO: once it learns about a read event, the program still has to call read(file descriptor) itself. With AIO (asynchronous I/O), you issue the read once and move on; when the data arrives, a callback is invoked and you never call read() yourself.
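
The call counts in steps 2 to 4 can be sketched with Python's standard select module, which wraps the same system calls. This is only an illustration under stated assumptions: it is scaled down to 20 descriptors with 3 of them readable (instead of 1000 and 50), it uses local socket pairs rather than real network connections, and the helper names (nonblocking_scan, select_then_read, epoll_then_read, demo) are made up for this example. The epoll part runs only on Linux.

```python
import select
import socket


def nonblocking_scan(readers):
    """Step 2: try a non-blocking recv on every descriptor, ready or not."""
    calls, data = 0, []
    for r in readers:
        calls += 1
        try:
            data.append(r.recv(1024))
        except BlockingIOError:
            pass  # nothing to read; the call returned instead of blocking
    return calls, data


def select_then_read(readers):
    """Step 3: one select call, then recv only on the descriptors it reports."""
    ready, _, _ = select.select(readers, [], [], 0)  # timeout 0: do not wait
    return 1 + len(ready), [r.recv(1024) for r in ready]


def epoll_then_read(ep, by_fd):
    """Step 4: descriptors were registered once; the wait call passes none."""
    events = ep.poll(0)  # returns only the ready (fd, event mask) pairs
    return 1 + len(events), [by_fd[fd].recv(1024) for fd, _ in events]


def demo(total=20, ready_idx=(2, 7, 11)):
    """Count the calls each approach needs when a few descriptors are ready."""
    pairs = [socket.socketpair() for _ in range(total)]
    readers = [a for a, _ in pairs]
    writers = [b for _, b in pairs]
    for r in readers:
        r.setblocking(False)

    def make_ready():
        # Write one byte to the chosen pairs so their read ends become readable.
        for i in ready_idx:
            writers[i].send(b"x")

    counts = {}
    make_ready()
    counts["nonblocking"] = nonblocking_scan(readers)[0]
    make_ready()
    counts["select"] = select_then_read(readers)[0]
    if hasattr(select, "epoll"):  # epoll exists only on Linux
        ep = select.epoll()
        by_fd = {r.fileno(): r for r in readers}
        for fd in by_fd:
            ep.register(fd, select.EPOLLIN)  # one-time registration
        make_ready()
        counts["epoll"] = epoll_then_read(ep, by_fd)[0]
        ep.close()
    for a, b in pairs:
        a.close()
        b.close()
    return counts


if __name__ == "__main__":
    print(demo())
```

The non-blocking scan costs one call per descriptor no matter how many are ready, while select and epoll each cost one wait call plus one read per ready descriptor. The difference between the last two is not visible in the counts: select receives the whole descriptor list on every call, whereas the epoll wait receives none because the registration was done beforehand.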

Poll: The implementation of poll is very similar to that of select; only the way the set of file descriptors is described differs. poll takes an array of pollfd structures rather than a fixed-size bitmap (inside the kernel the entries are kept in a linked list), so it does not have select's limit of 1024 file descriptors. However, it still has the following disadvantages:

1. Each time poll is called, the whole set of file descriptors must be copied from user space to kernel space, which is expensive.

2. Each time poll is called, the kernel must iterate over every file descriptor passed in, which is also expensive.
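
As a rough illustration of poll, here is a minimal Python sketch using select.poll from the standard library (available on Unix-like systems). The helper names readable_with_poll and poll_demo and the five local socket pairs are assumptions made for this example only.

```python
import select
import socket


def readable_with_poll(socks, timeout_ms=0):
    """Return the sockets that poll reports as readable."""
    poller = select.poll()
    by_fd = {}
    for s in socks:
        poller.register(s.fileno(), select.POLLIN)
        by_fd[s.fileno()] = s
    # Each poll() call hands the whole registered set to the kernel,
    # which copies it in and scans every entry.
    events = poller.poll(timeout_ms)  # timeout is in milliseconds
    return [by_fd[fd] for fd, mask in events if mask & select.POLLIN]


def poll_demo():
    """Make two of five socket pairs readable and read them via poll."""
    pairs = [socket.socketpair() for _ in range(5)]
    pairs[1][1].send(b"hello")
    pairs[3][1].send(b"world")
    ready = readable_with_poll([a for a, _ in pairs])
    received = sorted(s.recv(1024) for s in ready)
    for a, b in pairs:
        a.close()
        b.close()
    return received


if __name__ == "__main__":
    print(poll_demo())
```

The set passed to poll is sized by the caller, so nothing caps it at 1024 entries, but the per-call copy and the full scan described above remain.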

Note: An I/O operation is divided into two steps. The first step is to learn from the kernel which file descriptors are readable. The second step is to read the data from the kernel into the program.