preface

In recent years, the terms “asynchronous” and “non-blocking” have been widely used in server-side development. They are often described together, so the two are easily confused. This article tries to untangle the difference between them from a Linux I/O perspective.

This article covers the following:

  1. Linux I/O basics;
  2. What an I/O model is, and the existing categories:
    1. blocking I/O;
    2. multithreaded blocking I/O;
    3. non-blocking I/O;
    4. I/O multiplexing: select/poll/epoll;
    5. asynchronous I/O;
  3. How libuv solves the I/O problem.

In addition, the examples in this article are hosted on GitHub; feel free to download and try them out.

Linux I/O basics

Start with the example of reading a file

Now there is a requirement to read the file through a shell script. I’m sure most of you can do it right away:

$ cat say-hello.txt

Now let's do the same thing in C, where the details become more visible.

#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char const *argv[])
{
    char buffer[1024];
    int fd = open("say-hello.txt", O_RDONLY, 0666);
    int size;

    if (fd < 0) {
        perror("open");
        return 1;
    }

    while ((size = read(fd, buffer, sizeof(buffer) - 1)) > 0)
    {
        buffer[size] = '\0';   /* read() does not NUL-terminate the buffer */
        printf("%s", buffer);
    }

    if (size < 0) {
        perror("read");
    }

    close(fd);
    return 0;
}

Call open to get a number, use that number for read and write operations, and call close when done. This is the basic Linux I/O flow.

If you write JavaScript all the time and are not familiar with C, two questions may come to mind:

  1. What is the number returned by open?
  2. read fetches the data from disk, so the code after it has to wait. What happens during that wait?

With those questions in mind, let's dig in.

File descriptors

We know Linux has a famous saying: “everything is a file.” One important embodiment of this idea is the file descriptor mechanism.

Let’s summarize the common I/O operations, including:

  • TCP/UDP sockets
  • Standard input/output
  • File reads and writes
  • DNS resolution
  • Pipes (inter-process communication)

Linux uses the file descriptor mechanism to give all of these a consistent interface:

  1. every open resource is represented by a file descriptor (hereafter fd);
  2. the unified read/write functions perform the read and write operations on it.

As the previous example showed, open returns a number, the file descriptor, which uniquely identifies an open file within the current process.

Linux keeps a fixed-size array in each process's control block (PCB); each entry points into the table of open files that the kernel maintains for the process.

struct task_struct {
    ...
    /* Open file information: */
    struct files_struct *files;
    ...
};

/*
 * Open file table structure
 */
struct files_struct {
    spinlock_t file_lock ____cacheline_aligned_in_smp;
    unsigned int next_fd;
    unsigned long close_on_exec_init[1];
    unsigned long open_fds_init[1];
    unsigned long full_fds_bits_init[1];
    struct file __rcu *fd_array[NR_OPEN_DEFAULT];
};

An fd is simply the index of the corresponding file structure in this array, which is why fds start at 0.

So when you pass an fd into a read or write operation, Linux knows which open file you mean.

To see a process's open fds, you can use the lsof command. For example, the following prints Chrome's current fds (if you are using Chrome to read this page):

$ lsof -p$(ps -ef |grep Chrome|awk 'NR==1{print $2}')

Now back to the second question raised by the example above:

The read operation reads the resource from the hard disk, and the code after the read operation has to wait.

In Linux, the running entity is a process or thread (since kernel 2.4, the basic unit of scheduling is the thread, and the process is a container for threads).

While running, a process transitions between several underlying states (running, ready, blocked):

In the example above, once read is called the process enters the blocked state. When the I/O completes, a system interrupt unblocks the process: it enters the ready state, waits to be allocated a time slice, and then runs again.

This means our process blocks whenever an I/O operation occurs, which answers the question above.

I/O model

The I/O mechanism described above is blocking I/O, the first of the I/O models.

Blocking I/O

Blocking I/O is the default behavior of the read/write functions: when a read/write is performed, the process blocks; after the I/O completes, a system interrupt moves it to the ready state, where it waits for a time slice before running again.

Blocking I/O has a problem, however: I/O operations cannot run concurrently, nor can CPU computation overlap with I/O. In a web request/response scenario, if one request blocks in the read state, no other request can be processed.

We need to solve this problem.

Multithreaded blocking I/O

The first idea is to use multithreading.

We pre-initialize a thread pool whose threads block on a semaphore's wait primitive. When an I/O operation is needed, a thread is woken via the semaphore's signal and performs the I/O. See the code for details.

The downside of multithreaded blocking I/O, however, is that once the number of connections gets high, thread switching becomes costly.

We would therefore like to handle all the I/O waiting in a single thread, avoiding the context-switching overhead of many threads. Is there a way to do this? Yes: enter non-blocking I/O.

Non-blocking I/O

Non-blocking I/O is a mechanism in which an I/O read/write call returns immediately, returning -1 (with errno set to EAGAIN) if the buffer is not yet readable or writable. Here is an example of building a web server with non-blocking I/O; take a look at the code.

The key function is this:

fcntl(fd, F_SETFL, O_NONBLOCK);

As you can see, the process state is no longer blocked, and I/O can proceed alongside other CPU work.

Non-blocking I/O lets us issue I/O operations concurrently on one thread, but it introduces problems of its own:

  1. polling in a while loop wastes CPU, because while the I/O has not completed, read returns nothing useful;
  2. forcing the process to sleep with sleep/usleep brings back the old problem: completed I/O is not handled promptly.

So, does the system provide a way to natively wait on multiple I/O operations at once? It does: I/O multiplexing.

I/O multiplexing

I/O multiplexing, as the name suggests, means waiting on multiple I/O operations within a single process. It has its own evolutionary history: select, poll, and epoll (the macOS counterpart is kqueue). Let's take them one by one.

select

Using select generally involves three functions (for more detail, see the select(2) man page):

void FD_SET(int fd, fd_set *fdset);

int select(int nfds, fd_set *restrict readfds,
           fd_set *restrict writefds, fd_set *restrict errorfds,
           struct timeval *restrict timeout);

int FD_ISSET(int fd, fd_set *fdset);

select lets you watch fds in batches. When the buffer of any fd in the passed-in fd_set becomes readable/writable, select unblocks, and you loop over the set with FD_ISSET to find the specific fds.

fd_set tmpset = *fds;

struct timeval tv;
tv.tv_sec = 5;
tv.tv_usec = 0;

int r = select(fd_size, &tmpset, NULL, NULL, &tv);

if (r < 0) {
    perror("select()");
} else if (r) {
    printf("fd_size %d\n", r);
    QUEUE *q;
    QUEUE_FOREACH(q, &loop->wq->queue) {
        watcher_t *watcher = QUEUE_DATA(q, watcher_t, queue);
        int fd = watcher->fd;
        if (FD_ISSET(fd, &tmpset)) {
            int n = watcher->work(watcher);
        }
    }
} else {
    printf("%d No data within five seconds.\n", r);
}

Here is an example of building a web server with select; take a look at the code.

Although select solves the I/O multiplexing problem, it still has defects:

  1. fd_set can hold at most 1024 fds; beyond that number you may still need multiple threads;
  2. performance overhead:
    1. every select call copies the fd_set from user space to kernel space;
    2. the kernel must poll the state of every fd in the fd_set;
    3. user space must then poll the returned set to find which fds became readable.

poll

The first problem was solved by poll, which receives an array of pollfd structures instead, so there is no 1024 limit. The poll function is declared as follows:

int poll(struct pollfd *fds, nfds_t nfds, int timeout);

The exact usage is very similar to select:

struct pollfd *fds = poll_init_fd_set(loop, count);

int r = poll(fds, count, 5000);

if (r < 0) {
    perror("poll()");
} else if (r) {
    QUEUE *q;

    QUEUE_FOREACH(q, &loop->wq->queue) {
        watcher_t *watcher = QUEUE_DATA(q, watcher_t, queue);

        if (watcher->fd_idx == -1) {
            continue;
        }

        if ((fds + watcher->fd_idx)->revents & POLLIN) {
            watcher->work(watcher);
        }
    }
} else {
    printf("%d No data within five seconds.\n", r);
}

Here is an example of building a web server with poll; take a look at the code.

However, the performance overhead described under select remains. It was epoll that finally solved it.

epoll

Details can be found in the epoll(7) man page. In simple terms, epoll splits the work of select/poll into three steps:

  1. epoll_create: create an epoll_fd, used for monitoring in step 3;
  2. epoll_ctl: bind the fds you want to listen on to the epoll_fd;
  3. epoll_wait: pass in the epoll_fd; the process blocks, and is unblocked as soon as any listened fd changes state.

Let’s look at how epoll addresses the performance overhead mentioned above:

  1. fd binding happens once, in the epoll_ctl phase; epoll_wait only passes in the epoll_fd, so the fd set is not copied on every call;
  2. the fds passed to epoll_ctl are kept in a red-black tree in kernel space; when an I/O operation completes, the fd is located through the tree in O(log N), avoiding a full poll;
  3. the fd array returned to user space contains only the fds that are actually readable/writable, so user space no longer has to poll all fds.

For these reasons, epoll is the best multiplexing scheme. Here is an example of building a web server with epoll.

Is epoll free of flaws, then? Not quite:

  1. epoll currently only supports fds produced by pipes, network sockets, and the like; it does not support fds that come from the file system.

Asynchronous I/O

Blocking I/O, non-blocking I/O, and I/O multiplexing, as described above, are all synchronous I/O: each requires user code to wait for the I/O operation to complete and collect the result itself. Operating systems also provide asynchronous I/O solutions, which differ per platform:

  1. Linux
    1. aio: widely criticized, and it only supports Direct I/O (file operations);
    2. io_uring: added in Linux kernel 5.1, and regarded as the new home of asynchronous I/O on Linux.
  2. Windows
    1. IOCP: libuv's asynchronous mechanism on Windows. (The author has not researched Windows enough to say more.)

So far, several common I/O models have been introduced.

At present, the epoll mechanism is still the recommended choice on Linux. But since epoll cannot listen on file fds, libuv has to step in to fill that gap.

Libuv’s solution

Libuv builds the body of its event loop on epoll, where:

  1. fd types that epoll can monitor, such as sockets and pipes, are watched directly via epoll_wait;
  2. operations epoll cannot watch, such as file processing, DNS resolution, and (de)compression, are handed to worker threads; requests and results travel through two queues, and a pipe is used to communicate with the main thread, whose fd epoll listens on to know when to drain the result queue.

That’s the end of this article. A quick summary:

First we introduced the concept of file descriptors, then the basic Linux process-state transitions. Next came blocking I/O and its defects, which led to multithreaded blocking I/O, non-blocking I/O, I/O multiplexing, and asynchronous I/O. Finally, combining all of the above, we briefly described libuv's internal I/O mechanism.

Finally, a small “commercial”: ByteDance is looking for excellent front-end and Node.js engineers to join us and build fun things. Resumes are welcome at [email protected].

Well, I’ll see you next time.