Three mechanisms for IO multiplexing Select, Poll, and Epoll

The essence of I/O multiplexing is a kernel mechanism that lets a single process monitor multiple file descriptors at once; as soon as any descriptor becomes ready (usually for reading or writing), the kernel notifies the program so it can perform the corresponding read or write.

Select, poll, and epoll are the three I/O multiplexing mechanisms provided by the Linux API.

I assume you are already familiar with the five Unix I/O models; if not, it is worth reviewing them before reading on:

[1] Blocking I/O [2] Non-blocking I/O [3] I/O multiplexing [4] Signal-driven I/O [5] Asynchronous I/O

The first four can be categorized as synchronous I/O. Select, poll, and epoll are also essentially synchronous I/O: after a read or write event is ready, the process itself is responsible for performing the read or write, and that read or write blocks.

Compared with multi-process and multi-thread techniques, the biggest advantage of I/O multiplexing is its low system overhead: the system does not need to create extra processes or threads, nor maintain them, which greatly reduces overhead.

Before introducing select, poll, and epoll, here are some basic concepts in Linux:

  • Kernel space and user space Today’s operating systems use virtual memory, so on a 32-bit system the addressing space (virtual address space) is 4 GB (2^32). The core of the operating system is the kernel, which is independent of ordinary applications, can access the protected memory space, and has full access to the underlying hardware. To ensure that user processes cannot operate on the kernel directly, and to keep the kernel safe, the operating system divides the virtual address space into two parts: kernel space and user space.
  • Process switching In order to control process execution, the kernel must be able to suspend a process running on the CPU and resume execution of a previously suspended process. This behavior is called process switching. Therefore, it can be said that any process runs under the support of the operating system kernel, is closely related to the kernel, and process switching is very expensive.
  • Process blocking When a running process is waiting for an expected event that has not yet occurred — a failed system resource request, an operation that has not completed, data that has not arrived, or simply no new work to do — the system executes the Block primitive, changing the process from the running state to the blocked state. Blocking is therefore an active behavior of the process itself; only a running process (one that holds the CPU) can enter the blocked state, and while blocked it consumes no CPU.
  • File descriptor A file descriptor is an abstract concept used to refer to a file. Formally it is a non-negative integer; in fact, it is an index into the table of open files that the kernel maintains for each process. When a program opens an existing file or creates a new one, the kernel returns a file descriptor to the process. Low-level programming often revolves around file descriptors, though the concept applies mainly to UNIX-like operating systems such as Linux.
  • Cached I/O Cached I/O is also called standard I/O, and is the default I/O mode of most file systems. Under Linux’s cached I/O mechanism, the operating system caches I/O data in the file system’s page cache: data is first copied into the kernel’s buffer, and then copied from the kernel’s buffer into the application’s address space.

Select

Let’s look at the select function first:

int select(int maxfdp1,fd_set *readset,fd_set *writeset,fd_set *exceptset,const struct timeval *timeout);

[Parameter Description]

  • int maxfdp1 The number of file descriptors to be tested; its value is the highest descriptor to be tested plus 1.
  • fd_set *readset, fd_set *writeset, fd_set *exceptset An fd_set can be understood as a set of file descriptors (file handles). These three parameters specify the descriptor sets we want the kernel to test for read, write, and exception conditions; if you are not interested in a condition, pass a null pointer.
  • const struct timeval *timeout Tells the kernel how long it may wait for any descriptor in the specified sets to become ready; the timeval structure specifies the wait in seconds and microseconds.

[Return value] int — the number of ready descriptors if there are any; 0 on timeout; -1 on error.

Select operation mechanism

The select() mechanism provides the fd_set data structure, which is essentially an array of long integers in which each bit can be associated with an open file handle (a socket, an ordinary file, a named pipe, or a device handle). The association is established by the programmer. When select() is called, the kernel modifies the contents of fd_set according to the I/O state, telling the process that called select() which sockets or files are readable.

In terms of flow, issuing I/O requests with select is not much different from the synchronous blocking model — it is even slightly less efficient, given the extra work of registering the monitored sockets and calling select. However, its biggest advantage is that a user can handle I/O requests from multiple sockets in a single thread: register multiple sockets, then repeatedly call select to read whichever sockets are activated. Under the synchronous blocking model, achieving this requires multithreading.

Problems with the select mechanism
  1. Every call to select copies the fd_set collection from user space into kernel space; if the set is large, this is expensive.
  2. Every call to select also requires the kernel to iterate over the entire fd_set passed in; again, expensive if the set is large.
  3. To limit the performance cost of the copy, the kernel caps the size of the monitored fd_set. The cap is controlled by a macro and cannot be changed at runtime (limited to 1024).

Poll

Poll’s mechanism is similar to select’s and not essentially different: it manages multiple descriptors by polling and processes them according to descriptor state. However, poll has no limit on the maximum number of file descriptors. In other words, poll only fixes problem 3 above; the performance overheads of problems 1 and 2 remain.

Here is the poll function prototype:

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

typedef struct pollfd {
    int fd;         // the file descriptor to be tested or selected
    short events;   // events of interest on file descriptor fd
    short revents;  // events that actually occurred, set by the kernel
} pollfd_t;

Poll changes how the set of file descriptors is described, using the pollfd structure instead of select’s fd_set, so the descriptor sets that poll supports are no longer limited to select’s 1024.

[Parameter Description]

struct pollfd *fds — fds is an array of struct pollfd used to hold the socket descriptors to be queried; poll does not clear the fds array. Each pollfd structure represents one monitored file descriptor, and passing the fds array to poll() monitors multiple descriptors at once. The structure’s events field is the event mask for the descriptor, set by the user; its revents field is the resulting event mask, set by the kernel when the call returns.

nfds_t nfds — the total number of descriptors in the fds array.

[Return value] int — the number of descriptors in fds that are ready for read, write, or error; 0 on timeout; -1 on error.

Epoll

Epoll was formally introduced in the Linux 2.6 kernel as an event-driven I/O mechanism. Compared with select, epoll has no limit on the number of descriptors: it uses a single file descriptor to manage many others, storing the events the user cares about in an event table inside the kernel, so the copy between user space and kernel space needs to be done only once per descriptor.

The epoll related functions provided in Linux are as follows:

int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);

1. The epoll_create function creates an epoll handle; the size parameter tells the kernel roughly how many descriptors will be monitored (since Linux 2.6.8 it is only a hint). On success the call returns an epoll handle descriptor; on failure, -1.

2. The epoll_ctl function registers the event type to listen for. The four parameters are described as follows:

  • epfd — the epoll handle
  • op — the type of operation to perform on fd; there are three:
    • EPOLL_CTL_ADD registers a new fd with epfd
    • EPOLL_CTL_MOD modifies the monitored events of an already-registered fd
    • EPOLL_CTL_DEL removes an fd from epfd
  • fd — the descriptor to monitor
  • event — the events to monitor for

The epoll_event structure is defined as follows:

struct epoll_event {
    __uint32_t events;  /* Epoll events */
    epoll_data_t data;  /* User data variable */
};

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

3. The epoll_wait function waits for ready events and returns the number of ready events on success, -1 on failure, and 0 on timeout.

  • epfd — the epoll handle
  • events — the array that receives the set of ready events from the kernel
  • maxevents — tells the kernel the size of the events array
  • timeout — the wait timeout, in milliseconds

Epoll is the Linux kernel’s reworking of poll for handling large numbers of file descriptors. It is an enhanced version of the select/poll multiplexed I/O interfaces under Linux, and can significantly improve CPU efficiency for programs with a large number of concurrent connections of which only a few are active at any moment. The reason: when collecting events, it does not need to walk the entire monitored descriptor set, only the descriptors that kernel I/O events have asynchronously woken up and added to the ready queue.

In addition to providing level-triggered I/O events like select/poll, epoll also provides edge-triggered I/O events, which makes it possible for user-space programs to cache I/O state and reduce the number of epoll_wait/epoll_pwait calls, improving application efficiency.

  • Level-triggered (LT): the default working mode. When epoll_wait detects that a descriptor event is ready and notifies the application, the application need not process the event immediately; the event will be reported again on the next epoll_wait call.
  • Edge-triggered (ET): when epoll_wait detects that a descriptor event is ready and notifies the application, the application must process the event immediately; if it does not, the event will not be reported again on the next epoll_wait call. (Until some action makes the descriptor not-ready again — that is, an edge trigger fires only once, on the transition from not-ready to ready.)

The terms LT and ET come from pulse signals, which may make them easier to picture: Level triggers as long as the signal stays at a level, while Edge triggers only on a rising or falling edge. For example, the transition 0->1 is an edge, while 1->1 is a level.

ET mode greatly reduces the number of times epoll events fire, so it is more efficient than LT mode.

Conclusion

A comparison of select, poll, and epoll:

  • Mode of operation — select: traversal; poll: traversal; epoll: callback
  • Underlying data structure — select: array (bit set); poll: linked list of pollfd entries; epoll: red-black tree plus a ready list
  • I/O efficiency — select and poll linearly traverse the whole set on every call, O(n); with epoll, when an fd becomes ready the registered kernel callback puts it on the ready list, O(1)
  • Maximum number of connections — select: 1024 (x86) or 2048 (x64); poll: no upper limit; epoll: no upper limit
  • Fd copy — select and poll copy the fd collection from user space to kernel space on every call; epoll copies a descriptor into the kernel once, at epoll_ctl time, and keeps it there — epoll_wait performs no copy afterwards

Epoll is the model of choice on Linux today for large-scale concurrent network development, and in most cases far outperforms select and poll. The popular high-performance web server Nginx relies on epoll’s efficient socket polling. However, when the number of concurrent connections is low, the multithreading + blocking I/O approach may perform better.


Since select, poll, and epoll are all concrete implementations of I/O multiplexing, they are also products of different historical periods:

  • Select was implemented in BSD in 1984
  • Poll was implemented 14 years later, in 1997. What took so long was not efficiency: the hardware of that era was so weak that a server handling more than 1,000 connections was a godlike existence, and select sufficed for a long time
  • Epoll was implemented by Davide Libenzi in 2002
