Select one.

The initial version of the IO multiplexing implementation has the following function signature:

int select(int nfds,
           fd_set *restrict readfds,
           fd_set *restrict writefds,
           fd_set *restrict errorfds,
           struct timeval *restrict timeout);
Copy the code

Before introducing select in detail, there are several concepts that need to be explained:

  • Fd: file descriptor;

The operating system maintains a file descriptor table for each process, which records the files opened and created by the process. The file descriptor exists as the index of this table. Intuitively speaking, a process can uniquely determine an open file according to the file descriptor. The file descriptor itself is a non-negative integer starting from 0;

  • Fd_set: set of file descriptors

An object that records a set of file descriptors can essentially be thought of as an n-bit binary number that can record up to n FDS. For example, when n = 8, fd_set can be marked as: 00010011; The use of fd_set involves the following API: int FD_ZERO(int fd, fd_set *fdset); Int FD_CLR(int fd, fd_set *fdset); Fd_set (int fd, fd_set *fd_set); Int FD_ISSET(int fd, fd_set *fdset); // Check whether a bit of fd_set is 1

Select ();

  • Readfds: listen for readable events in fd;
  • Writefds: listens for writable events of fd;
  • Errorfds: listen for abnormal events of fd.
  • NFDS: when NFDS = n, the kernel listens only on the first nfd of the three collections.
  • Timeout: indicates the blocking duration when the select is invoked. If none of the file descriptors are ready, the calling process is blocked until a descriptor is ready or a timeout is exceeded, and return. If the timeout argument is set to NULL, it blocks indefinitely until a descriptor is ready; If the timeout parameter is set to 0, it returns immediately without blocking.

The select return value represents the total number of ready FDS;

Select internal execution logic looks like this:

int count = 0; for (int fd : fd_set) { if (! FD_ISSET(fd, fd_set)) continue; FD_CLR(fd, fd_set); if (isReady(fd)) { count++; FD_SET(fd, fd_set); berak; } } return count;Copy the code

After the user process returns from select, it only needs to check FD_ISSET which bits in fd_set are 1 to determine which file descriptors are ready.

In summary, the main costs of SELECT are as follows:

  • Copy three fd_sets to the kernel;
  • Loop through three fd_sets to find fD-ready or timeout;
  • The user process can not directly get the ready FD after the select return, but also needs to go through the fd_set judgment;

In addition, the select call is limited in the number of FDS it can listen on because of the length of the Fd_set.

Here is an example of how to use select:

The poll.

Poll is not fundamentally different from select, and the purpose of poll is to solve the problem that select can monitor fewer FDS at the same time.

The number of FDS that can be listened on at the same time is limited by sizeof(fd_set). Poll uses an array to store all FDS that can be listened on.

int poll(struct pollfd *fds, 
         nfds_t nfds, 
         int timeout);
Copy the code

Forget about NFDS and timeout, let’s focus on FDS;

FDS is an array of type pollfd *. Pollfd is defined as follows:

struct pollfd { int fd; /* The file descriptor to listen on */ short events; /* Listen for more than one event, such as readable and writable events */ short revents; /* Events that actually occur are filled by the kernel */};Copy the code

The execution process inside poll is similar to that of SELECT, which will not be described here.

After the poll call returns, the Revents of each PollFD object in the PollFDS array can be checked to determine whether the corresponding FD data is ready to perform the corresponding read and write.

To sum up, poll does not optimize the three main costs before and after select execution except solving the problem of “select can monitor fewer FDS at the same time “.

Three epoll.

Compared to Poll, ePoll is the real improver of Select, greatly optimizing the three major time-consuming operations of SELECT.

Epoll is not a single system call. The corresponding epoll model actually consists of three functions:

epoll_create

The function signature is as follows:

int epoll_create(int size);
Copy the code

This call creates an epoll instance and returns the fd corresponding to that instance;

Epoll instances are internally composed of two collections:

  • Listener list: all FDS to be monitored, data structure is a red-black tree;
  • Ready list: all data ready FD, data mechanism is linked list;

epoll_ctl

The function signature is as follows:

int epoll_ctl(int epfd,
              int op,
              int fd,
              struct epoll_event *event);
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};             
Copy the code

Parameter Description:

  • Epfd: the fd of the epoll instance, returned by the epoll_create function;
  • Op: Operation to be performed on a FD, such as adding an event to the FD or deleting all listening events for the FD;
  • Fd: the file descriptor to listen on;
  • Event: the specific event to listen for in a FD;

The return value 0 or -1 indicates whether the operation succeeds.

Epoll_ctl adds the file descriptor fd to the epoll instance’s listener list, sets a callback function for fd, and listens for the event. When an event occurs on a FD, the callback function is called to add the FD to the ready queue of the epoll instance.

epoll_wait

Its function signature is as follows:

int epoll_wait(int epfd, 
               struct epoll_event *events,
               int maxevents, 
               int timeout);
Copy the code

This is the main function of the Epoll model and is equivalent to select.

Parameter Description:

  • Events is an array that stores the file descriptor of ready state, and its space is applied by the caller. That is, the kernel is only responsible for filling data into events and will not automatically apply for space, so NULL cannot be passed in.
  • Maxevents specifies the size of events;

The return value represents the number of ready descriptors stored in Events, up to and including maxEvents.

Optimization brought by epoll

Here’s how epoll resolves three time-consuming operations in SELECT:

  1. The epoll_ctl call allows fine-grained control of each FD to be listened on, rather than having to make a copy of each FD to the kernel (resulting in a large number of duplicate copies).
  2. For select and poll calls, the kernel loops to find ready FDS, whereas epoll is event-driven. When an FD is ready, it is automatically added to the ready list by the registered callback function, so epoll only needs to listen for whether the ready set is empty.
  3. Similar to select and poll, events are passed in to inform the caller (user process) of which FDS are ready. Select and poll require a full loop operation to find which FDS are ready. Epoll, on the other hand, delivers all ready FDS to the caller without the need for the caller to filter them. This solves the time-consuming operation of “the user process cannot directly get ready FDS after the select is returned, and it needs to go through the FD_set judgment by itself”.

Horizontal trigger and edge trigger

  • LT (Level Trigger) : A notification is triggered when the file descriptor is ready, and if the user program does not read/write all the data in one go, it is also notified by a readable/writable signal when listening again.
  • Edge Trigger (ET) : Notifies a descriptor once and never again when it becomes ready.

Select supports only horizontal triggering, epoll supports both horizontal triggering and edge triggering

Difference: Edge triggering is more efficient, reducing the number of times the event is fired repeatedly

Why does edge triggering have to use non-blocking I/O?

Each time data is read through the read system call, a maximum of bytes of the buffer size can be read; If a file descriptor receives more data than the size of the buffer at once, it needs to be read multiple times before it is fully read.

Select can use blocking I/O. After we get all the readable file descriptors by select, we iterate over each file descriptor and read the data once. These file descriptors are all readable, so even if read is blocking I/O, we can read the data without blocking forever. Select is horizontally triggered, so if the first read does not read all of the data, the next call to select will still return the file descriptor, which can be read again.

Select can also use non-blocking I/O. When iterating over a readable file descriptor, read is called multiple times using the for loop until all data is read (returning EWOULDBLOCK). Doing so makes one more read call, but reduces the number of SELECT calls;

In epoll edge-triggered mode, the operating system notifies you only when the readable/writable state of the file descriptor changes.

Therefore, if epoll’s edge-triggered mode is used, non-blocking I/O must be used when a notification is received, and read or write must be called repeatedly until EWOULDBLOCK is returned, and then epoll_wait is called for the next notification from the operating system.

If all data is not read/written at one time, the status of the file descriptor does not change in the view of the operating system and no notification will be issued. Invoking epoll_wait will make the file descriptor wait forever, and the server will also wait for the response from the client, and the business process cannot complete.

The advantage of this is that each call to epoll_wait is valid — the data is fully read and written, waiting for the next notification. In horizontal trigger mode, if epoll_wait is called without data being read/written, epoll_wait will return with another notification. So edge triggering can significantly reduce the number of times events are triggered.

Why can’t epoll’s edge trigger mode use blocking I/O? Obviously, edge-triggered mode requires cyclic reading/writing of all data in a file descriptor. If you use blocking I/O, you are bound to block on the last call (with no data to read/write) and fail to finish properly

The resources

  • Imageslr.com/2020/02/27/…