select

Function & Structure

// The return value is the number of ready file descriptor events. If the same fd is ready in more than one set, it is counted once per set
int select(int maxfdp1, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

Parameters

  • maxfdp1: the highest file descriptor number across the three descriptor sets, plus 1. In effect this is the number of file descriptors to check (array indices start at 0), so the kernel does not have to scan every possible descriptor.
  • readfds: set of file descriptors to check for input readiness
  • writefds: set of file descriptors to check for output readiness
  • exceptfds: set of file descriptors to check for exceptional conditions
  • timeout: controls the blocking behavior of select(); when NULL, select() blocks indefinitely

The timeval structure

// If both fields are 0, select() does not block.
// Otherwise the timeout is an upper limit on how long select() waits.
struct timeval{
  time_t      tv_sec;   // seconds
  suseconds_t tv_usec;  // microseconds
};

The fd_set structure

// Number of bits in an unsigned long; one bit represents one fd, so e.g. one byte covers 8 fds
#define __NFDBITS (8 * sizeof(unsigned long))                
// FD_SETSIZE defaults to 1024. Changing it means editing the glibc header and recompiling;
// with that many connections, epoll (covered later) is usually the better choice
#define __FD_SETSIZE 1024                                          
// Number of longs needed to record __FD_SETSIZE (default 1024) fds, e.g. 1024/64 = 16 on 64-bit
#define __FDSET_LONGS (__FD_SETSIZE/__NFDBITS)     

typedef struct {
  // The bitmap, stored as an array of longs
    unsigned long fds_bits [__FDSET_LONGS];                 
} __kernel_fd_set;
typedef __kernel_fd_set   fd_set;

The return value

  • -1: an error occurred
  • 0: select() timed out before any file descriptor became ready
  • Positive integer: one or more file descriptors are ready, and the value is the number of ready fd events. If the same file descriptor is set in all three sets, it contributes 3 to the return value, i.e. it is counted multiple times.

The advantages and disadvantages

advantages

  • High portability, basically all kinds of mainstream OS support

disadvantages

  • Only level triggering is supported

  • Data needs to be copied from user space (programs) to kernel space

  • The kernel modifies the three sets in place, so before every call the process must reinitialize them with the fds it wants to listen on

  • When the number of fds is large, all three descriptor sets must be scanned linearly, which is CPU intensive

  • When select() returns, the program does not know which fds are ready, only how many; the process must traverse the sets it passed in to find out
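To make the calling convention concrete, here is a minimal sketch (the helper name wait_readable is invented for this example) that uses select() to wait for a single descriptor to become readable:

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Wait up to 5 seconds for rfd to become readable.
 * Returns what select() returns: the number of ready descriptors,
 * 0 on timeout, or -1 on error. */
int wait_readable(int rfd)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);       /* sets must be reinitialized before every call */
    FD_SET(rfd, &readfds);

    tv.tv_sec = 5;           /* upper limit on the wait */
    tv.tv_usec = 0;

    /* first argument: highest fd in any set, plus 1 */
    return select(rfd + 1, &readfds, NULL, NULL, &tv);
}
```

After select() returns, FD_ISSET(rfd, &readfds) is how the program would check which descriptors are ready; with only one fd in the set the check is trivial here.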

poll

Function & Structure

// The call
int poll(struct pollfd fdarray[], nfds_t nfds, int timeout);

// Structure wrapping one file descriptor
struct pollfd{
  int   fd;       // the file descriptor
  short events;   // subscribed events
  short revents;  // returned (triggered) events
};

Parameters

The call

  • fdarray: an array of pollfd structures
  • nfds: the number of elements in fdarray
  • timeout: how long to wait
    • -1: wait forever
    • 0: do not wait
    • Greater than 0: wait up to timeout milliseconds

A structure containing fd

  • fd: the file descriptor number
  • events: the events we are interested in / subscribed to
  • revents: which events were triggered; filled in by the kernel

The return value

  • -1: failure
  • 0: no event occurred before the timeout
  • Greater than 0: the number of descriptors that are ready

The advantages and disadvantages

Each fd gets its own pollfd structure, which separates the events of interest (events) from the triggered events (revents). The events field tells the kernel which events on the descriptor we care about; when an event fires on a particular fd, the kernel modifies revents and leaves events untouched, so unlike select() there is no need to reinitialize the fd set before each call.

advantages

  • There is no need for the process to reinitialize the fd set before every call

  • There is no limit to array size

disadvantages

  • Only level triggering is supported

  • As with select(), data is copied back and forth between user space and kernel space

  • Highly portable, though not quite as portable as select()

  • Not suitable for very large numbers of fds; its performance there is worse than epoll's

  • Like select(), when poll() returns the program does not know which fds are ready, only how many; the process must traverse the array it passed in to find out
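A minimal sketch of the same readiness check with poll() (the helper name poll_readable is invented here); note that only revents needs to be inspected afterwards, and the pollfd array could be reused as-is on the next call:

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Poll a single fd for input, waiting at most 5000 ms.
 * Returns 1 if POLLIN was reported, 0 on timeout, -1 on error. */
int poll_readable(int rfd)
{
    struct pollfd pfd;

    pfd.fd = rfd;          /* the descriptor to watch */
    pfd.events = POLLIN;   /* subscribed events: input ready */
    pfd.revents = 0;       /* the kernel fills this in */

    int n = poll(&pfd, 1, 5000);
    if (n <= 0)
        return n;          /* 0 = timeout, -1 = error */
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```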

“Usually the program calls these system calls (select() and poll()) repeatedly to check the same set of file descriptors, but the kernel does not remember the set between calls.”

Signal-driven I/O and epoll, discussed next, both let the kernel record the file descriptors a process is interested in, which eliminates the scaling problems of select() and poll(): performance scales with the number of I/O events that occur rather than with the number of file descriptors checked, so signal-driven I/O and epoll perform better when large numbers of file descriptors must be monitored.

Signal-driven I/O

Signal-driven I/O and epoll differ from select and poll in one key way. select and poll do not let the kernel remember which fds a process is interested in, so on every call the interested fds are copied from user space into the kernel, which wastes CPU time, and every call returns with all fds, leaving the process to check which ones triggered events. This is why they are unsuitable for large numbers of fds.

With signal-driven I/O, the process asks the kernel to send it a signal when an I/O operation can be performed on a file descriptor.

The detailed steps follow

steps

  • 1. Install a signal handler for the notification signal sent by the kernel, which by default is SIGIO.

  • 2. Set the owner of the file descriptor, i.e. the process or process group that receives the notification signal when I/O becomes possible on the descriptor. Usually the calling process is made the owner. This is done with the F_SETOWN operation of fcntl():

    fcntl(fd, F_SETOWN, pid);
  • 3. Set the O_NONBLOCK flag to make the descriptor non-blocking

  • 4. Enable the O_ASYNC flag to turn on signal-driven I/O. Since this and step 3 both use the F_SETFL operation of fcntl(), they can be combined into one call:

    flags = fcntl(fd, F_GETFL);
    fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
  • 5. After these steps, the calling process can go off and do other work. When an event fires on the fd, the kernel sends the signal to the process, which handles it in the installed signal handler

  • 6. Signal-driven I/O provides edge-triggered notification: once the process is told that I/O is possible, it should perform as much I/O as it can (e.g. read as many bytes as possible). Since the fd is non-blocking, this means looping over the I/O system call until it fails with the error code EAGAIN or EWOULDBLOCK.
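Steps 1 through 4 can be sketched as follows, assuming Linux semantics (got_sigio, on_sigio and setup_sigio are names invented for this example):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio = 0;

/* Step 1: the handler for the kernel's notification signal (SIGIO by default).
 * Installed BEFORE enabling signal-driven I/O, since SIGIO's default
 * disposition terminates the process. */
static void on_sigio(int sig)
{
    (void)sig;
    got_sigio = 1;
}

int setup_sigio(int fd)
{
    struct sigaction sa;
    sa.sa_handler = on_sigio;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;

    /* Step 2: this process becomes the owner that receives the signal */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    /* Steps 3 and 4 combined: non-blocking + signal-driven in one F_SETFL */
    int flags = fcntl(fd, F_GETFL);
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}
```

On Linux, writing to a pipe whose read end has been set up this way causes the kernel to deliver SIGIO to the owning process.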

Some of the features

On Linux 2.4 and earlier, signal-driven I/O can be used with sockets, terminals, pseudo-terminals, and certain other device types. Linux 2.6 added pipes and FIFOs, and since Linux 2.6.25 it can also be used with inotify file descriptors.

  • Install the signal handler before enabling signal-driven I/O: the default disposition of SIGIO is to terminate the process, so the SIGIO handler must be installed before signal-driven I/O is enabled. Otherwise, the process might be terminated by a signal that arrives before the handler is in place.

    On some other UNIX implementations, SIGIO is ignored by default.

When to send the I/O ready signal

Terminal and pseudo terminal

A signal is generated when new input arrives, even if earlier input has not yet been read. An input-ready signal is also sent when end-of-file occurs on a terminal (but not on a pseudo-terminal). No signal is generated when output becomes possible, nor when the terminal link is disconnected. Since Linux 2.4.19, an “output ready” signal is provided for the slave side of a pseudo-terminal; it is generated when the pseudo-terminal master reads input.

Pipes and FIFO

For the read end of a pipe or FIFO, a signal is generated when:

  • Data is written to the pipe (even if unread input already exists)
  • The write end of the pipe is closed

For the write end of a pipe or FIFO, a signal is generated in the following cases:

  • The read operation on the pipe increases the amount of free space in the pipe so that PIPE_BUF bytes can now be written without blocking
  • The read end of the pipe is closed

The socket

Signal-driven I/O can be used with datagram sockets in both the UNIX and Internet domains. A signal is generated when:

  • An input datagram arrives at the socket (even if there are already unread datagrams waiting to be read)
  • An asynchronous error occurred on the socket

Signal-driven I/O can also be used with stream sockets in both the UNIX and Internet domains. A signal is generated when:

  • The listening socket received a new connection
  • A TCP connect() request completes, that is, the initiating end enters the ESTABLISHED state. (The analogous situation on UNIX domain sockets does not generate a signal.)
  • New input received on the socket (even if unread input already exists)
  • The socket peer either closes the write link (partially closed) with shutdown() or closes it completely with close()
  • Output ready on the socket (write ready event triggered if the send buffer has space, for example)
  • An asynchronous error occurred on the socket

Inotify file descriptor

A signal is generated when the inotify file descriptor becomes readable, that is, when an event occurs on one of the files the inotify descriptor is monitoring.

The problem

Signal queue overflow processing

The number of real-time signals that can be queued is limited. When the limit is reached, notifications revert to the default SIGIO signal, which tells the process that the signal queue has overflowed. In that case, information about which I/O events occurred on which fds is lost: SIGIO is not queued, its handler does not receive a siginfo_t argument, and so the handler cannot determine which fd generated the signal.

One mitigation is to raise the limit on the number of queued real-time signals to reduce the chance of overflow, but overflow cannot be ruled out entirely.

A program that uses F_SETSIG to deliver “I/O ready” notifications as a real-time signal must still install a handler for SIGIO. When SIGIO is delivered, the program can first retrieve all queued real-time signals via sigwaitinfo() and then temporarily fall back to select() or poll() to obtain the full list of file descriptors with outstanding I/O events.
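A sketch of the F_SETSIG approach, assuming Linux's fcntl() extension (the names on_io, last_fd and setup_rt_sigio are invented here): with a queued real-time signal and an SA_SIGINFO handler, the siginfo_t argument identifies which fd triggered the notification.

```c
#define _GNU_SOURCE          /* for F_SETSIG and siginfo_t's si_fd */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t last_fd = -1;

/* SA_SIGINFO handler: si_fd tells us which descriptor the event is for */
static void on_io(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    last_fd = si->si_fd;
}

/* Deliver "I/O ready" as the real-time signal SIGRTMIN instead of SIGIO,
 * so notifications queue and carry a siginfo_t. */
int setup_rt_sigio(int fd)
{
    struct sigaction sa;
    sa.sa_sigaction = on_io;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) == -1)
        return -1;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    if (fcntl(fd, F_SETSIG, SIGRTMIN) == -1)  /* Linux-specific */
        return -1;

    int flags = fcntl(fd, F_GETFL);
    return fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}
```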

The advantages and disadvantages

Advantages:

When an event is triggered, the kernel actively notifies the process by sending a signal

There is no need for the user process to copy the FD array into the kernel

Disadvantages:

Only edge triggering is supported

Signal handling is fiddly; mishandling it (e.g. ignoring signal queue overflow) can cause problems for the process.

epoll

Like I/O multiplexing and signal-driven I/O, the Linux epoll (event poll) API checks I/O readiness on multiple file descriptors.

Flow & Function & Structure

Epoll is Linux-specific, added in kernel 2.6. The central data structure of the epoll API is the epoll instance, referred to via an open file descriptor that is not itself used for I/O. This kernel data structure serves two purposes:

  • Record the list of file descriptors of interest declared in the process — interest List
  • Maintains a list of file descriptors in the I/O ready state — ready List

(Ready List is a subset of interest List)

Using the process

  • The epoll_create() system call creates an epoll instance and returns a file descriptor referring to that instance.

    // size only specifies the initial size of the internal data structure. This parameter is ignored after 2.6.8
    int epoll_create(int size);

When the fd is no longer needed, close it with close(). The instance is destroyed and its resources returned to the system only once all file descriptors referring to it have been closed (multiple fds may refer to the same epoll instance as a result of calls such as fork() or dup()).

  • The epoll_ctl() system call manipulates the interest list of an epoll instance: it can add a new descriptor to the list, remove an existing descriptor from it, or modify the bitmask of event types to monitor on a descriptor.

    int epoll_ctl(int epfd, int op, int fd, struct epoll_event *ev);

The fd argument identifies the descriptor to operate on; it can be a pipe, FIFO, socket, POSIX message queue, inotify instance, terminal, device, or even the fd of another epoll instance.

The ev argument points to a structure describing the event:

    struct epoll_event{
      uint32_t events;    // epoll events (bit mask)
      epoll_data_t data;  // user data
    };
    
    typedef union epoll_data{
      void     *ptr;  // pointer to user data
      int       fd;
      uint32_t  u32;
      uint64_t  u64;
    }epoll_data_t;
  • The epoll_wait() system call returns members of the ready list of an epoll instance.

    int epoll_wait(int epfd, struct epoll_event * evlist, int maxevents, int timeout);
    • The array of structures pointed to by the evlist argument receives information about the file descriptors that are in the ready state

    • The events field of each returned structure holds the mask of the events that occurred on that descriptor

    • The timeout argument determines the blocking behavior of epoll_wait()

      • -1: blocks all the time
      • 0: performs a non-blocking check
      • Greater than 0: blocks at most timeout milliseconds until an event occurs or a signal is captured

      On success, epoll_wait() returns the number of elements placed in the evlist array. It returns 0 if no file descriptor became ready within the timeout interval, and -1 on error, with errno set to indicate the cause.
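The three calls above fit together as in this minimal sketch (the helper name epoll_wait_readable is invented; it watches a single fd and waits once):

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Create an epoll instance, add one fd to its interest list, then wait
 * up to 5000 ms. Returns what epoll_wait() returns: the number of ready
 * descriptors, 0 on timeout, or -1 on error. */
int epoll_wait_readable(int rfd)
{
    int epfd = epoll_create(1);          /* size is ignored since 2.6.8 */
    if (epfd == -1)
        return -1;

    struct epoll_event ev;
    ev.events = EPOLLIN;                 /* interested in input-ready */
    ev.data.fd = rfd;                    /* user data returned with the event */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    struct epoll_event evlist[1];
    int n = epoll_wait(epfd, evlist, 1, 5000);
    close(epfd);                         /* last reference: instance destroyed */
    return n;
}
```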

The event

For each file descriptor monitored by epoll, a bitmask specifies the events of interest; these masks closely mirror the bitmasks used by poll().

advantages

Main advantages:

  • epoll performs much better than select() and poll() when monitoring large numbers of file descriptors.

  • epoll supports both level and edge triggering. In contrast, select() and poll() support only level triggering, and signal-driven I/O supports only edge triggering.

epoll performs about as well as signal-driven I/O, but has some advantages over it:

  • Can avoid complex signal processing processes (such as signal queue overflow processing)

  • Flexibility to specify the type of event we want to check for (such as checking sockets for read ready events, write ready events, or both)

conclusion

About level and edge triggering

Level trigger

Events are responded to when there is data to read/space to write in the buffer.

A problem

With level triggering, a write event fires whenever the socket buffer is writable, so one common solution is to register the write event only when there is data to send and unregister it once the data has been written.

A drawback of level triggering is that if data sits in the buffer while the process has not finished handling it, the event keeps firing, which wastes CPU.
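That register-only-when-needed pattern might look like this (set_write_interest is an invented helper; it assumes fd is already in the interest list, so EPOLL_CTL_MOD applies):

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Subscribe to (enable = 1) or cancel (enable = 0) write-ready events
 * for an fd already registered with the epoll instance epfd. */
int set_write_interest(int epfd, int fd, int enable)
{
    struct epoll_event ev;
    ev.events = EPOLLIN | (enable ? EPOLLOUT : 0);
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```

Once the output queue drains, calling set_write_interest(epfd, fd, 0) stops EPOLLOUT from firing on every loop iteration under level triggering.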

Edge trigger

An edge-triggered event fires only when a new event occurs.

Once an event has been reported to the process, it is not reported again even if unread data remains in the buffer; the event fires again only when new data arrives (unlike level triggering, where the event keeps firing as long as data is present). This is the pitfall of edge triggering: when an event occurs, the process should read as much data as possible, otherwise data can sit unread in the buffer, the process never sees the complete request, and the client is likely to time out.
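The “read as much as possible” rule translates into a drain loop on a non-blocking fd (drain_fd is an invented helper), stopping only at EAGAIN/EWOULDBLOCK or end-of-file:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read everything currently available from a non-blocking fd into out,
 * up to cap bytes. Returns total bytes read, or -1 on a real error. */
ssize_t drain_fd(int fd, char *out, size_t cap)
{
    ssize_t total = 0;
    for (;;) {
        if ((size_t)total == cap)        /* output buffer full */
            break;
        ssize_t n = read(fd, out + total, cap - total);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)                      /* end-of-file: peer closed */
            break;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                       /* buffer fully drained */
        return -1;                       /* genuine error */
    }
    return total;
}
```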

If the development team is strong enough, use edge triggering; otherwise the performance of level triggering is usually sufficient.

In the case of large number of connections and high concurrency, edge triggering can better reflect the performance advantages

The characteristics of

select() and poll() are more portable across operating systems than signal-driven I/O and epoll, but far less efficient when there are many fds.

poll and select support only level triggering

select() and poll() work in a similar way: on every check, the process actively copies the arrays into the kernel

select passes three event sets (read, write, exception), and because the kernel overwrites them, they must be reinitialized before each call

poll copies the interested fds into the kernel on each call, but uses events and revents to separate subscribed events from triggered events, so the array does not need to be reinitialized each time.

Both select and poll must actively check because the kernel does not record the fds and events the process is interested in. Signal-driven I/O and epoll, by contrast, both let the kernel remember the fds and events of interest, so the array does not have to be copied on every call.

Signal-driven I/O supports only edge triggering

As noted above, signal-driven I/O and epoll are alike in that the kernel notifies the process, rather than the process polling file descriptors for events at intervals. With many file descriptors the performance advantage is significant, since the fds do not need to be copied into the kernel each time and the kernel notifies the process proactively.

Epoll supports both level triggering and edge triggering

epoll supports both level and edge triggering; its notification mechanism is similar to that of signal-driven I/O.

The total contrast

With select and poll, the arrays are copied into the kernel on every call, and the kernel returns them with fields modified. With signal-driven I/O and epoll, once the process has registered the fds and events it is interested in, every notification is initiated by the kernel: the kernel knows what needs reporting, so the process does not have to query it or copy the arrays from user space on each call. This is why signal-driven I/O and epoll significantly outperform select and poll.