This article introduces the basics of epoll network programming and then walks through the epoll-related source code in redis-server.

Network programming template

The common network programming pattern is as follows (using TCP programming over IPv4 as an example):

  1. First, create a socket with the file descriptor listen_fd.

  2. Bind it to a specific IP and port number,

  3. Start listening,

  4. Accept client requests in a loop,

  5. Create child processes or threads to handle connected requests

    listen_fd = socket();              // create the listening file descriptor
    bind(listen_fd, IP and port);      // bind to a specific IP and port
    listen(listen_fd);                 // start listening
    while (1) {                        // the main process accepts connections
        new_client_fd = accept(listen_fd);
        // create a child process or thread to handle the new client request
    }
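For concreteness, here is a minimal sketch of this fork-per-connection template in C. It is illustrative only: error handling is omitted, and the port number and buffer size are arbitrary examples.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    signal(SIGCHLD, SIG_IGN);                            /* avoid zombie children */

    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);     /* 1. create the socket */

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(6380);                         /* arbitrary example port */
    bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)); /* 2. bind IP and port */

    listen(listen_fd, 128);                              /* 3. start listening */

    while (1) {                                          /* 4. accept in a loop */
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) continue;
        if (fork() == 0) {                               /* 5. child handles the request */
            char buf[1024];
            ssize_t n = read(client_fd, buf, sizeof(buf));
            if (n > 0) write(client_fd, buf, n);         /* echo back as a placeholder */
            close(client_fd);
            _exit(0);
        }
        close(client_fd);                                /* parent closes its copy */
    }
}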

Epoll programming template

The problem with this model is that creating child processes and threads requires system calls, and every new TCP connection needs its own process or thread. At the C10K scale, that means a single machine has to maintain 10,000 processes/threads, which creates real performance problems under high concurrency. Can one process/thread handle multiple sockets instead? Yes: that is I/O multiplexing.

At any given moment a process can only handle one request, but if handling each request takes less than 1 millisecond, it can serve thousands of requests per second. Viewed over a longer window, many requests are multiplexed onto one process; this is very similar to how a CPU runs multiple processes concurrently, so it is also called time-division multiplexing. The familiar select/poll/epoll are multiplexing system calls the kernel provides to user space: with a single system call, one process can retrieve multiple ready events from the kernel.

There are already many articles comparing select/poll/epoll, so I won't repeat them here. Since epoll has a significant performance advantage over select and poll, let's look directly at epoll programming. There are only three epoll-related functions:

int epoll_create(int __size);
int epoll_ctl(int __epfd, int __op, int __fd, struct epoll_event *__event);
int epoll_wait(int __epfd, struct epoll_event *__events, int __maxevents, int __timeout);
  1. epoll_create creates an epoll instance and returns its descriptor, epoll_fd.
  2. epoll_ctl takes epoll_fd (int __epfd) and a socket fd (int __fd), and adds the fd to (EPOLL_CTL_ADD) or removes it from (EPOLL_CTL_DEL) the epoll instance according to int __op. The last parameter, struct epoll_event *__event, is a structure with two fields to fill in: (1) the trigger mode, e.g. ev.events = EPOLLIN | EPOLLET; (2) the fd associated with the socket, e.g. ev.data.fd = listen_fd; (the layout of struct epoll_event is shown after this list).
  3. For epoll_wait, the first argument is epoll_fd; the second is an array that receives the triggered events, which later processing iterates over; the third is the maximum number of events that can be returned in one call; the fourth is the timeout in milliseconds, where a value greater than 0 is the maximum wait time and -1 means wait indefinitely.
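For reference, the struct epoll_event used by epoll_ctl and epoll_wait is laid out as documented in the epoll_ctl(2) man page:

typedef union epoll_data {
    void        *ptr;
    int          fd;
    uint32_t     u32;
    uint64_t     u64;
} epoll_data_t;

struct epoll_event {
    uint32_t     events;      /* epoll events, e.g. EPOLLIN | EPOLLET */
    epoll_data_t data;        /* user data, e.g. the socket fd */
};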

Look at epoll’s programming model:

listen_fd = socket();              // create the listening file descriptor
bind(listen_fd, IP and port);      // bind to a specific IP and port
listen(listen_fd);                 // start listening

// Create the epoll handle
epoll_fd = epoll_create(MAXEPOLLSIZE);

/**** Add listen_fd to the epoll set [begin] ****/
// Create the ev variable used by the epoll_ctl function
struct epoll_event ev;
// Set the trigger mode
ev.events = EPOLLIN | EPOLLET;
// Set the fd
ev.data.fd = listen_fd;
// Register listen_fd with the epoll instance
epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);
/**** Add listen_fd to the epoll set [end] ****/

// Array that receives the triggered events
struct epoll_event fired_events[MAXEPOLLSIZE];

while (1) {
    // Wait for events; fired_events stores the triggered events, -1 means no timeout
    epoll_event_nums = epoll_wait(epoll_fd, fired_events, curfds, -1);
    for (j = 0; j < epoll_event_nums; j++) {
        if (fired_events[j].data.fd == listen_fd) {    // the event is on listen_fd
            // 1. Accept the new client connection
            new_client_fd = accept(listen_fd, xx, xx);
            // 2. Add the new client fd to the epoll set
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = new_client_fd;
            epoll_ctl(epoll_fd, EPOLL_CTL_ADD, new_client_fd, &ev);
        } else {
            // The event was triggered by a connected client: read the request and reply
            recv(fired_events[j].data.fd, buf, xx, xx);
            send(...);
        }
    }
}

Full example: The Epoll Getting Started example

Epoll in redis-server

Now that we have covered the basics of epoll, let's look at how Redis programs against it. The Redis source code used in this article is version 5.0.0.

1. Encapsulation

Redis uses different I/O multiplexing mechanisms on different systems, and epoll is used only on Linux. You can see this in the ae.c file:

#ifdef HAVE_EVPORT
#include "ae_evport.c"
#else
    #ifdef HAVE_EPOLL
    #include "ae_epoll.c"
    #else
        #ifdef HAVE_KQUEUE
        #include "ae_kqueue.c"
        #else
        #include "ae_select.c"
        #endif
    #endif
#endif
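Where do the HAVE_* macros come from? They are defined per platform in config.h; the relevant part looks roughly like this (abridged from config.h, so treat it as a sketch):

/* Test for polling API (abridged) */
#ifdef __linux__
#define HAVE_EPOLL 1
#endif

#if (defined(__APPLE__) && defined(MAC_OS_X_VERSION_10_6)) || defined(__FreeBSD__) || defined(__OpenBSD__) || defined(__NetBSD__)
#define HAVE_KQUEUE 1
#endif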

This encapsulation exposes four functions:

int aeApiCreate(aeEventLoop *eventLoop);
int aeApiAddEvent(aeEventLoop *eventLoop, int fd, int mask);
void aeApiDelEvent(aeEventLoop *eventLoop, int fd, int delmask);
int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp);

aeApiCreate corresponds to epoll_create, aeApiAddEvent corresponds to epoll_ctl(epoll_fd, EPOLL_CTL_ADD, ...), aeApiDelEvent corresponds to epoll_ctl(epoll_fd, EPOLL_CTL_DEL, ...), and aeApiPoll corresponds to epoll_wait. As you can see, the first argument of every function is a variable of type aeEventLoop; in practice this is the global event loop stored in server.el (server is a global variable created when redis-server starts). Let's look at the useful parts of aeEventLoop:

typedef struct aeEventLoop {
    int setsize; /* max number of file descriptors tracked */
    aeFileEvent *events; /* Registered events */
    aeFiredEvent *fired; /* Fired events */
    // ………………
} aeEventLoop;
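The element type of the fired array, aeFiredEvent, is a tiny structure defined in ae.h; it holds just the fd and an event mask:

/* A fired event */
typedef struct aeFiredEvent {
    int fd;
    int mask;
} aeFiredEvent;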

The aeFiredEvent *fired member holds all the events returned by epoll_wait; it corresponds to the fired_events array we iterated over in the epoll template above (the aeApiPoll sketch below shows how it gets filled in). So what is in aeFileEvent *events? We will cover that in the next subsection.
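To see how fired gets filled in, here is the epoll implementation of aeApiPoll from ae_epoll.c (Redis 5.0, with comments added); it calls epoll_wait and copies each ready descriptor and its translated mask into eventLoop->fired:

static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;  /* holds the epoll fd and event buffer */
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd, state->events, eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    if (retval > 0) {
        int j;

        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            /* Translate epoll flags into the AE_* mask */
            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE;
            eventLoop->fired[j].fd = e->data.fd;   /* record which fd fired */
            eventLoop->fired[j].mask = mask;       /* and what kind of event */
        }
    }
    return numevents;
}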

2. Binding events

As we saw in the epoll template, whether we are accepting a new connection or handling an existing one, there are only two kinds of events: read and write. So we want a structure that binds a read handler and a write handler to each network event.

Whenever an event fires, we can obtain the fd it belongs to. That suggests a map-like data structure: the key is the fd, and the value is a structure holding the read event handler and the write event handler. Redis does indeed have such a structure:

/* File event structure */
typedef struct aeFileEvent {
    int mask; /* one of AE_(READABLE|WRITABLE|BARRIER) */
    aeFileProc *rfileProc;
    aeFileProc *wfileProc;
    void *clientData;
} aeFileEvent;

rfileProc is the read event handler and wfileProc is the write event handler; which one gets called for a given network event is controlled by mask. Note that instead of a real map, Redis simply uses an array: on Linux an fd is an integer, and there is a "maximum number of file descriptors" configuration item, so Redis allocates an array of that size and uses the fd value itself as the index. This array is the events member of the aeEventLoop structure described in subsection 1. The whole flow is: while iterating over the aeEventLoop.fired array, use aeEventLoop.fired[j].fd to get the fd, use aeEventLoop.events[fd] to get the corresponding aeFileEvent structure, and use aeEventLoop.fired[j].mask to decide whether to call rfileProc or wfileProc.

So when does the event binding happen? First, look at when the aeEventLoop (server.el) and its events array are created:

// Line 2040 in server.c, in the initServer function
server.el = aeCreateEventLoop(server.maxclients + CONFIG_FDSET_INCR);

The aeCreateEventLoop function:

aeEventLoop *aeCreateEventLoop(int setsize) {
    aeEventLoop *eventLoop;
    int i;

    if ((eventLoop = zmalloc(sizeof(*eventLoop))) == NULL) goto err;
    // Allocate the events and fired arrays, each with setsize entries
    eventLoop->events = zmalloc(sizeof(aeFileEvent)*setsize);
    eventLoop->fired = zmalloc(sizeof(aeFiredEvent)*setsize);
    // ...
}

For new connection events, Redis does the binding in the initialization function initServer(void) (around line 2129 of server.c). You can see that the AE_READABLE event handler is bound to acceptTcpHandler:

for (j = 0; j < server.ipfd_count; j++) {
    if (aeCreateFileEvent(server.el, server.ipfd[j], AE_READABLE,
        acceptTcpHandler,NULL) == AE_ERR)
        {
            serverPanic(
                "Unrecoverable error creating server.ipfd file event.");
        }
}
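To see how this call ends up in the events array, here is aeCreateFileEvent, roughly as it appears in ae.c in Redis 5.0 (comments added): it stores the handler in events[fd] and registers the fd with the underlying multiplexer through aeApiAddEvent (epoll_ctl with EPOLL_CTL_ADD on Linux):

int aeCreateFileEvent(aeEventLoop *eventLoop, int fd, int mask,
        aeFileProc *proc, void *clientData)
{
    if (fd >= eventLoop->setsize) {
        errno = ERANGE;
        return AE_ERR;
    }
    aeFileEvent *fe = &eventLoop->events[fd];       /* index the array by fd */

    if (aeApiAddEvent(eventLoop, fd, mask) == -1)   /* register with epoll/kqueue/... */
        return AE_ERR;
    fe->mask |= mask;
    if (mask & AE_READABLE) fe->rfileProc = proc;   /* bind the read handler */
    if (mask & AE_WRITABLE) fe->wfileProc = proc;   /* bind the write handler */
    fe->clientData = clientData;
    if (fd > eventLoop->maxfd)
        eventLoop->maxfd = fd;
    return AE_OK;
}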

Why is the binding code above wrapped in a for loop? Because a machine may have multiple IP addresses (multiple network cards), Redis creates a listen_fd for each IP address. After a connection is accepted on one of them, the new client fd gets its own read handler bound in createClient:

client *createClient(int fd) {
    client *c = zmalloc(sizeof(client));

    /* passing -1 as fd it is possible to create a non connected client.
     * This is useful since all the commands needs to be executed
     * in the context of a client. When commands are executed in other
     * contexts (for instance a Lua script) we need a non connected client. */
    if (fd != -1) {
        anetNonBlock(NULL,fd);
        anetEnableTcpNoDelay(NULL,fd);
        if (server.tcpkeepalive)
            anetKeepAlive(NULL,fd,server.tcpkeepalive);
        if (aeCreateFileEvent(server.el,fd,AE_READABLE,
            readQueryFromClient, c) == AE_ERR)
        {
            close(fd);
            zfree(c);
            return NULL;
        }
    }
    // ...........................
}
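For completeness, the path from the listening socket to createClient goes through acceptTcpHandler. Below is a simplified sketch of the Redis 5.0 version in networking.c (logging and error handling trimmed); acceptCommonHandler is the helper that ultimately calls createClient:

void acceptTcpHandler(aeEventLoop *el, int fd, void *privdata, int mask) {
    int cport, cfd, max = MAX_ACCEPTS_PER_CALL;
    char cip[NET_IP_STR_LEN];

    while (max--) {
        /* accept() on the listening fd, wrapped by anetTcpAccept */
        cfd = anetTcpAccept(server.neterr, fd, cip, sizeof(cip), &cport);
        if (cfd == ANET_ERR) return;
        /* acceptCommonHandler calls createClient(cfd), which registers
         * readQueryFromClient for the new connection (see above) */
        acceptCommonHandler(cfd, 0, cip);
    }
}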

3. Code debugging

Start debugging in one window:

gdb redis-server
(gdb) b acceptTcpHandler
(gdb) b readQueryFromClient
(gdb) r

Connect in another window:

redis-cli

The first breakpoint is triggered:

Breakpoint 1, acceptTcpHandler (el=0x7ffff6c2b0a0, fd=11, privdata=0x0, mask=1) at networking.c:727
727	    int cport, cfd, max = MAX_ACCEPTS_PER_CALL;

Look at the call stack:

(gdb) bt
#0  acceptTcpHandler (el=0x7ffff6c2b0a0, fd=11, privdata=0x0, mask=1) at networking.c:727
#1  0x000000000042a4fe in aeProcessEvents (eventLoop=0x7ffff6c2b0a0, flags=11) at ae.c:443
#2  0x000000000042a6e1 in aeMain (eventLoop=0x7ffff6c2b0a0) at ae.c:501
#3  0x0000000000437239 in main (argc=1, argv=0x7fffffffe438) at server.c:4194

Here el = 0x7ffff6c2b0a0 and fd = 11, so according to the analysis above, el.events[11] should be an aeFileEvent structure whose rfileProc points to the acceptTcpHandler function. Print it and see:

(gdb) p el.events[11]
$2 = {mask = 1, rfileProc = 0x441b51 <acceptTcpHandler>, wfileProc = 0x0, clientData = 0x0}

It is the same as our analysis above. Continue, and the breakpoint on readQueryFromClient is hit:

(gdb) c
Continuing.

Breakpoint 2, readQueryFromClient (el=0x7ffff6c2b0a0, fd=12, privdata=0x7ffff6d0d740, mask=1) at networking.c:1501
1501	    client *c = (client*) privdata;

Look at the call stack:

(gdb) bt
#0  readQueryFromClient (el=0x7ffff6c2b0a0, fd=12, privdata=0x7ffff6d0d740, mask=1) at networking.c:1501
#1  0x000000000042a4fe in aeProcessEvents (eventLoop=0x7ffff6c2b0a0, flags=11) at ae.c:443
#2  0x000000000042a6e1 in aeMain (eventLoop=0x7ffff6c2b0a0) at ae.c:501
#3  0x0000000000437239 in main (argc=1, argv=0x7fffffffe438) at server.c:4194

This time fd = 12, so el.events[12].rfileProc should point to readQueryFromClient:

(gdb) p el.events[12]
$4 = {mask = 1, rfileProc = 0x443987 <readQueryFromClient>, wfileProc = 0x0, clientData = 0x7ffff6d0d740}

The result is exactly what we expected. Now let's go back to the call stack: main calls aeMain. Let's look at what happens in aeMain:

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS|AE_CALL_AFTER_SLEEP);
    }
}

As you can see, the outer layer is a while loop (corresponding to the outermost loop in our epoll template), and the loop body calls the aeProcessEvents function. Next, let's look at aeProcessEvents:

int aeProcessEvents(aeEventLoop *eventLoop, int flags)
{
    int processed = 0, numevents;

    // Time event handling, skipped...

    // Wrapper around epoll_wait
    numevents = aeApiPoll(eventLoop, tvp);

    // Iterate over the events that have fired
    for (j = 0; j < numevents; j++) {
        aeFileEvent *fe = &eventLoop->events[eventLoop->fired[j].fd];
        int mask = eventLoop->fired[j].mask;
        int fd = eventLoop->fired[j].fd;
        int fired = 0; /* Number of events fired for current fd. */

        int invert = fe->mask & AE_BARRIER;

        // Read event: call the rfileProc function
        if (!invert && fe->mask & mask & AE_READABLE) {
            fe->rfileProc(eventLoop,fd,fe->clientData,mask);
            fired++;
        }

        // Write event: call the wfileProc function
        if (fe->mask & mask & AE_WRITABLE) {
            if (!fired || fe->wfileProc != fe->rfileProc) {
                fe->wfileProc(eventLoop,fd,fe->clientData,mask);
                fired++;
            }
        }

        // Other code omitted...
    }
    // Other code omitted...
}

As you can see, the epoll_wait wrapper aeApiPoll is called first, and then the for loop processes the events that have fired (corresponding to the for loop in the epoll template), using the mask value to decide whether each one is a read event or a write event: a read event calls the rfileProc function, a write event calls the wfileProc function. This analysis strings together the whole network event handling path in Redis. One piece is left out: write event handling via the sendReplyToClient function, which readers can debug themselves to see when it comes into play.

Summary

This article explained the basic pattern of epoll programming through a simple framework, and then walked through the related network-module source code in redis-server. The goal is to help you understand epoll programming at the framework level, so many of the details Redis handles are not covered; once the basic framework is clear, the details are much easier to dig into.
