All the code in this paper will be presented in the form of pseudo code or pictures, please rest assured to read (C reading is still a bit of trouble)

This article is written based on REdis5.0

Prior to the start

Say one thing, before seeing Redis source code, my C language level can only write college students set and brush OJ topics. Before I started reading the code, I was worried that I would give up halfway, but I couldn’t give up the goal I had made at the beginning of the year (I had already failed one, and I couldn’t achieve it every week), so I stuck to it. Fortunately, I have learned something to share with you here.

I’ve been a timid and shy boy my whole life, so do me a favor and give it a thumbs up

The essence of programming thinking

If I ask you what is the essence of programming thinking? What idea is the foundation of today’s rich software ecosystem? (Please share your opinions in the comments section.)

As far as I’m concerned, it’s made up of two ideas: abstraction and encapsulation. For example, both the JVM virtual machine of Java system and the V8 engine of Node.js are further abstract encapsulation of the hardware resources of the operating system in essence, and provide a unified API interface, so that the applications developed on the engine can run on different platforms.

The question is, what does this have to do with Redis? Isn’t Redis developed in C? Isn’t C process-oriented? How to abstract how to encapsulate?

It is quite normal to have a point of view that most people’s experience is the same as mine. At most, they use C language to write backgammon and brush algorithm problems, and have little access to production-level code. Recall that the only thing that has anything to do with object orientation is structures, and yes, that’s pretty close.

Why talk about abstraction and encapsulation? Just as there are no useless characters in mystery movies (Amway, Manslaughter). The goal of Redis is to run on a variety of types of operating systems, now operating system manufacturers are mostly independent, as far as the earth is concerned, there has not been a unified world operating system, maybe some friends have already understood what I want to ask, that is, how does Redis achieve a set of code everywhere compilation? Although not the focus of this article, but using C language to achieve similar object-oriented functions or to my young heart caused a shock, as for how to achieve the following will be mentioned, let’s return to the focus

Redis/IO model

Event driven and I/O multiplexing

Before starting, it’s worth asking yourself if you really understand event-driven mechanics.

I don’t know if you have the experience of developing Windows applications with C++ in Windows system, a win32 program usually takes events from the user in a while loop, such as the reader reading this article, whether it is in the professional habit of pressing F12, Or sliding a mouse wheel will generate an event. Usually, the operating system will provide API functions so that the program can retrieve the event generated by the user’s action.

The pseudo-code is as follows. If you want to see the actual code, you can click your-first-Windows-program

while(true) {// Get the event
    let msg = getMessage(); 
    // Translate the message
    translateMessage(msg);
    // Distribute the message
    dispatchMessage(msg);
}
Copy the code

As shown in the code above, you can see the following characteristics of event-driven programs

  • There is usually a container for the generated events
  • A loop of events
  • There is a way to get events
  • Once the event is retrieved, it needs to be processed (ignoring it can also be considered a processing method)

With event-driven mechanisms behind us, let’s move on to I/O multiplexing, which is hot!

For comparison, let’s review the classical C/S structure program. The code looks like this:

while(true){
    Socket client = server.accept();
    ClientThread thread = new ClientThread(client);
    thread.start();
}
Copy the code

As an example, Sun Wukong fights monsters. Every time he meets a monster, he creates a doppelganger to play with the monster. Sun Wukong himself is responsible for constantly plucking hair to create doppelganger and protect Tang Priest.

So, the problem came, if sun Wukong is a programmer, how should Tang’s monk do?

After all, Tang’s monk is the leading role, hung up how to play, so we can casually to a big fairy to a “town demon list” magic weapon, the magic weapon will be all the small monsters into the void, big brother every time can get interested in the magic weapon (you say monkeys are interested in what monster?) Line up with it.

So that’s the I/O multiplexing model. Development operating system leaders have long provided us with apis to capture the events we are interested in, combined with an event-driven model of I/O multiplexing.

A more rigorous and academic description is as follows:

I/O multiplexing. In this form of concurrent programming, applications explicitly schedule their own logical flows within the context of a process. A logical stream models the state machine. After data arrives at the file descriptor, the main program explicitly transitions from one state to another. Because the program is a separate process, all streams share an address space. (CSAPP)

In other words we can describe an I/O multiplexer program with a state graph.

Question again? Does I/O multiplexing have to be single-threaded?

Affirmative and positive answers: No

I/O model

It’s not hard to see that Redis uses the Reactor design pattern. In other words, we use the API provided by the operating system so that we don’t need to create a new thread for each client. That means that Redis uses the single-threaded DESIGN pattern for the Reactor design pattern, but what is the I/O thread?

I/O threads are threads that read data from the client and output response data to the client.

Why and when do I/O threads start?

First need to clear is redis although polling way to get the data can be used, but read the client data and to the client output data when the called function will still produce blocks (blocking time general super short, short to you unable to detect), but there should always be a, however, assume that you are in a company is very poor, There is only one Redis server (and a lot of data), and one day a casual worker stuffed a 512MB set studyResouces into Redis. If the single-threaded model continues, it is not hard to imagine that the entire Redis service will be blocked for a short time. So if we have multiple I/O threads, the core business thread can outsource the input and output to the I/O thread. As for when to start the I/O thread, let’s talk about it below.

So a reasonable definition is as follows

Redis does have only one thread responsible for manipulating data, but there is not only one thread responsible for I/O, and Redis also opens the thread for serialization.

Again, why does Redis have only one thread responsible for manipulating data?

  • Data for all redis operations is in-memory data

  • Redis data operations can generally be completed in constant time

  • Single-threaded data operations can avoid data security issues associated with multi-threaded operations (without locking)

Main process

As you can see, the redis core business thread loops through the beforeSleep and processEvents methods.

aeProcessEvents

First, take a look at aeProcessEvents, whose code is shown below.

Because Redis has scheduled tasks to be executed, if the polling event is blocked for a long time (redis calls it sleep), the scheduled task cannot be executed for a long time. Therefore, it is necessary to calculate the maximum waiting time at the site.

AeApiPoll () causes the thread to block until an I/O event occurs, passing in the maximum blocking time, after which it will return immediately even if there are no I/O events

AfterSleep does not immediately process the I/O event after polling, but instead executes the afterSleep hook function.

Then it’s time to handle the events that aeApiPoll polls.

If you read the code, you’ll notice the odd variable invert, which is related to the configuration parameter AE_BARRIER and determines the order in which the read and write functions are executed.

Read and write event handlers connected to redis clients (such as Redis-CLI) point to connSocketEventHandler in Connection.c. ConnSocketEventHandler, which determines the order in which to call read and write event invocations depending on the situation. (Both the invert parameter and the poll-to-event type are passed to this function.)

Observing the variable fired we conclude that Redis does not call both read and write event handlers in a loop. And if

  • AE_BARRIER = 1 (i.einvenrt = trueRedis handles write events first and then read events
  • AE_BARRIER = 0 (i.einvenrt = falseRedis handles read events first and then write events

What is the use of AE_BARRIER?

To figure this out, let’s figure out a question: what is falling?

Suppose you are in the kindergarten entrance exam, you met the calculation question 1+1=, you have come up with the answer is 2, but due to the lack of time, you did not write down, and someone deducted 10 points from the dream kindergarten.

So, it’s one thing if you’ve figured it out, but it’s another thing if you’ve written the answers and painted the answer sheet.

Analogy to the operating system, will also have this case, what do you think you call the write function is saved to the hard disk, the data is actually data will stay in memory for a while, waiting for a suitable time to save the data to the hard disk, assuming that the data during his stay in memory suddenly loses power, the data is not can’t?

To avoid this, the Operating system (Linux) provides a function called fsync to ensure that data is written to disk, that is, to ensure that data falls off disk. Calling this function blocks until data is written to disk successfully.

Invert =true if redis has appednfsync=always and AOF enabled, and certain conditions are met.

Under what conditions?

First of all, let’s be clear that the usual place to output data is not in the write processor, but in beforeSleep in response to the data output to the client. Let’s look at the call stack validation for output data.

The original screenshot is shown below

In addition, in general, after receiving a connection from a client, Redis registers only read events of interest on that connection, and only write events of interest when a write handler is installed.

Now, do you have a lot of question marks? Me too. The question is, since beforeSleep already outputs the data, why reverse the read and write order, write first and then read?

Rule out all possibilities, and whatever remains, however irrational, seems to be the truth.

There is only one possibility -> the data is not finished. 😂

Look at the following code, at line 1373 in networking.c

If appednfsync=always is turned on and the client is still waiting for output, a write handler will be installed for the client and invert will be set to true. In this case, the following events occur

  • 1. Read the commands from the client and process them (aeProcessEvents)
  • 2. Execute the AOF operation (beforeSleep)
  • 3. Sends response data to the clientappednfsync=always, enable AE_BARRIER(i.e. Invert =true) and install write handler (beforeSleep)
  • 4. Call write handler to output data (aeProcessEvents)
  • 5. Remove write handler (aeProcessEvents)

Generally speaking, the Redis client blocks after issuing an instruction and waits for a response from the server. During this time, the client does not issue any other data manipulation instructions (only for RESP2 and below, clients using RESP3 can do this).

Remove the handler code in writeToClient, which I’ll talk about later

The following point is necessary to avoid misunderstanding. Previously mentioned processReadEvent and processWriteEvent both point to connSocketEventHandler. But connSetWriteHandlerWithBarrier Settings here write processor sendReplyToClient point will not processWriteEvent sendReplyToClient, Instead, register the write handler called in connSocketEventHandler. It might be a little more intuitive to look at the code.

The code is located in connection.c

What were you doing before you went to bed?

In the previous aeMain code, beforeSleep is called every time we enter the event loop, which tells us what The Redis does before going to sleep.

In general, in order, beforeSleep does the following:

  • Processing withSecure Transport Layer Protocol (TLS)Data to be processed in the client
  • If the cluster function is enabled, theclusterBeforeSleep
  • Perform a quick scan to remove expired keys from the database
  • Handle cluster related tasks
  • Handle clients that are blocked by executing blocking commands (e.g., subscribe clients)
  • Perform the AOF operation
  • Check whether you need to start the I/O thread and output data to the client (handleClientsWithPendingWritesUsingThreads)
  • Asynchronously close the client that you want to close

BeforeSleep function to do a lot of things, but we care about the I/O model, we are only concerned with the flow of data, so focus on the handleClientsWithPendingWritesUsingThreads

Simplify the handleClientsWithPendingWritesUsingThreads code as shown below

It’s not hard to see that the main thread assigns tasks to I/O threads mainly through task queues and arrays of flag bits. In addition, the ioThreadOp indicates the type of the current task to the thread: IO_THREADS_OP_WRITE Performs a write task or IO_THREADS_OP_READ performs a read task.

So what is the task of enabling multithreaded I/O? Look at the stopThreaedIOIfNeed function.

If the number of tasks to be processed >= number of I/O threads *2, redis will enable multithreaded IO

Otherwise, the I/O thread is stopped and blocked

Based on the above code, it is not difficult to derive the following structure

Again, how does the main thread control the state of the I/O thread? This one we need to add a little bit of multithreading knowledge, let’s talk about it later, first to see what Redis did after waking up.

What does afterSleep do after waking up?

When Redis woke up (and returned from the aeApiPoll) he did one thing, Call handleClientsWithPendingReadsUsingThreads this function is similar to the handleClientsWithPendingWritesUsingThreads described above only ioThreadOp IO_THRE ADS_OP_READ means that I/O threads only handle read events.

ProcessTimeEvents Scheduled task

ProcessTimeEvents loops through the queue of scheduled tasks and executes them if the time is up. These tasks are usually not too complex, so we’ll focus on what are the scheduled tasks.

Registering a scheduled task You can use aeCreateTimeEvent to register a scheduled task with an event loop

After locating, you will find that only one scheduled task serverCron is registered (this function is located in server.c).

The code to register the scheduled task in the event loop is shown below, which shows that serverCron is set to fire every 1 millisecond.

    if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
        serverPanic("Can't create event loop timers.");
        exit(1);
    }
Copy the code

In addition, we can control the behavior of serverCron with two parameters.

  • server.hzControls the frequency at which the serverCron function is executed. The default is 10(10 times in 1 second) and the maximum is 500
  • server.cronloopsNumber of times the serverCron function was executed

For example, redis has a periodic database usage statistics function with a period of 5000, so how does Redis determine when to call it?

Observe the following function

function shouldRunWithPeriod(ms){
    return ms <= 1000/server.hz || ! (server.cronloops % (ms/(1000/server.hz)));
}
Copy the code

When Server. cronloops = 50 * n(n = a multiple of 50), the loop will perform statistics. Remember that server. Hz can control the frequency of serverCron execution, and server. Hz = 10 that is, serverCron execution every 100 milliseconds. 50 * 100 == 5000 for Server. cronloops = 50

The following table lists the scheduled tasks. (server.hz = 10)

Frequency of execution refers to how many times the task is executed when server.cron is called, and * means it is executed every time

Task type Execution frequency note
Statistics memory usage *
Handles exit signals from the operating system * Unscheduled task
Statistics on the use of all database dictionaries 50 * n (n = 1, 2,…). Each database is actually a dictionary
Prints client and slave node information 50 * n (n = 1, 2,…). Redis is required to run in sentry mode
Handling connection timeout clients and client buffers (clientCron) * Unscheduled task
Database dictionary incremental capacity expansion or shrinkage * Unscheduled task
AOF related processing * Unscheduled task
This function is enabled if the last AOF write failedfsyncMechanism writes to AOF file 10 * n (n = 1, 2,…).
Execute a scheduled task for primary/secondary replication * You need to enable the primary/secondary replication mode
The sentinel’s regular duty * Beyond the scope of this article
Stop the I/O thread if there are too few PENDING I/O tasks * Unscheduled tasks, all the jobs are not available, what do you say
Perform BGSAVE(serialized database, different from AOF) You need to execute the command periodically based on the value in the configuration file
server.cronloops++

I/O thread implementation

The main thread of Redis controls the I/O thread behavior by assigning each thread a task queue, a thread status flag, and a shared task type. How does redis control the thread into a blocked state to avoid idling and consuming system resources?

So without further ado let’s take a look at the pseudocode.

As you can see, the code for the I/O thread isn’t complicated, but some of it is a little confusing.

For example, we can see that a thread will execute a 1000000 empty load loop just to see if the thread flag bit is not zero.

Why is it designed this way? Concurrent programming experience students it is easy to see, this kind of behavior is actually spin, while spin will consume a certain amount of resources (but not too much), if the thread assigned to the task during the spin, that wouldn’t have to enter the blocking state, to recover from the blocking state, and spin is less than the cost of thread into the blocking state to the cost of recovery from the blocking state.

If we continue reading the code, we can see that once we get the mutex, we immediately release it. Why is that? This gives the main thread a chance to lock, because the main thread blocks by locking. For example

time I/O thread The main thread
t1 Locking A Running state
t2 Try to lock A and enter the blocking state
t3 The lock is released A blocking
t4 Go through the next cycle Locking A
t5 In the spin Running state
t6 Try to lock A and enter the blocking state Running state
t7 blocking Running state
t8 blocking Running state

What happens when you output the response data

After reading the above, it is easy to conclude that Instead of exporting all the response data at once, Redis may choose to export some of the data and move on to other things. The reason for doing this is that there is only one core thread in Redis, so we can’t make other clients wait too long. If a temporary worker executes keys * on the terminal, then we don’t need to play.

To be more specific, let’s take a look at the author’s comments and code in writeToClient to see when not all data is printed at once.

  • totwrittenRefers to the amount of data currently output,NEXT_MAX_WRITES_PER_EVNETThe value of64KBnamely64 * 1024
  • server.maxmemorySpecifies the maximum amount of memory that redis can use. The default value is0That is, a 64-bit system has no memory limit, and a 32-bit system uses a maximum of 3GB memory
  • zmalloc_used_moemoryCan get currently allocated memory

Maxmemory =0 by default redis sends all responses to the client in a timely manner to avoid memory usage. In addition, if the conditions are met, the response data exceeding NEXT_MAX_WRITES_PER_EVNET size will not be output at one time, and the measured data will be shown in the following.

As a general rule, if memory is limited or no maximum memory is configured, Redis will output the response data to the client as quickly as possible. If memory is sufficient, Redis will output some data first, and the rest will be output in the next event loop.

In addition, writeToClient will clean up and tune the write processor originally installed on the Redis client after confirming that the user data has been output.

Redis also designs two types of buffer for transient response data, as shown below

  • ReplyBuffer A buffer of response data, of type an array of bytes, used to hold response data temporarily
  • ReplyList Indicates the response data queue of typeclientReplyBlockThe list

So what are the allocation rules? Let’s look at the implementation of the addReply function

Looking at the code above, the following conclusions can be drawn.

  • The response data is first tried to add to the buffer (buffer size is16 *1024 = 16KB), if the response buffer is full, it is added to the response queue
  • The response data will be executedbeforSleeporIO threadIn the output

Event loop abstraction

AeEventLoop is an implementation of redis event loop. AeApi is an abstraction of THE I/O multiplexing API interface of operating systems, and provides different implementations for different operating systems.

  • aeMainIs the main function of the event loop, which is called after the Redis server is started
  • Can be achieved byaeCreateFileEventAs well asaeDeleteFileEventAdd or remove I/O events of interest in this event loop (callAeApi.aeApiAddEventandAeApi.aeApiDelEvent)
  • Can be achieved byaeCreateTimeEventAs well asaeDeleteTimeEventAdd or delete a scheduled task
  • setBeforeSleepHookCan be set on the entry event polling (i.eAeApi.aeApiPoll) before (see abovebeforeSleep)
  • setAfterSleepHookYou can set the function to be called after the event polling is complete (see above)afterSleep)
  • setDontWaitEnables event polling to be performed without entering the wait state and immediately return the currently processable event, or immediately return if no event can be processed.

As shown in the figure above, in order to adapt to different operating system ecosystems, Redis designs a unified event polling API interface, AeApi, and provides different implementations. This API mainly provides functions of registering interested I/O events, deleting interested I/O events and polling events.

The following table describes the differences between AEapis.

The name of the The underlying implementation performance The operating system describe
AeEpoll epoll high Linux The number of descriptors monitored (the number of clients) is not limited, and the efficiency of IO does not decrease as the number of FDS monitored increases
AeApiEvport evports I don’t know. I can’t reach a conclusion Solaris(sun distributed system, I have not seen 😅) The implementation is more complex, so epoll works
AeApiKqueue kqueue I don’t know. I can’t reach a conclusion FreeBSD, Unix Similar to the epoll
AeSelect select The worst It is implemented on different operating systems as a guarantee The maximum number of file descriptors (clients) that can be processed is 1024

The resources

Not willing to run a run?

Looking at the code alone, it’s always a bit dry, so let’s do some temporary work and see how Redis works in different environments.

The operating environment is as follows:

  • centos7
  • 1 CPU
  • 2G RAM
  • server.maxmemory = 10485760The 10 m

Casual worker SAO operation

Suppose that in a poor company, a temp worker named Ke accidentally performs keys * on an online database. What happens?

Before the test starts, let’s hit two breakpoints, namely addReply and writeToClient

Open a redis-cli run keys * command

Observe the invocation of addReply

It can be seen that the response data was not added to the buffer but to the response queue due to the large size of the data, and multiple addReply calls were executed due to the full table scan command, as shown in the figure below.

The output client is the same, but the response data is different

A second look shows that writeToClient does indeed pull response data from the response queue

Next, let’s look at writeToClient’s response, with the call stack shown below

writeClient

For handleClientsWithPendingWrites main validation we write processor will be installed.

As you can see,redis did install a write handler for our client because the data was not output cleanly. Next we release the program and we will meet again in writeToClient, and this time the method of calling writeToClient will be aeProcessEvents, which outputs data in an event loop rather than in beforeSleep. The call stack is shown in the following figure.

Inspired by the

  • I don’t know when there will be a multithreaded version of JavaScript(this article assumes I am using the multithreaded version of JS).
  • Don’t performkeys *Full table scan operation, if you have not configured I/O threads or maximum memory usage
  • All parameters of this configuration have been configured (although operation and maintenance can be configured basically, it is necessary to know).
  • Exporting articles or teaching others can really help clear your mind
  • The only thing to keep in mind during the interview is that Redis is not single-threaded. Rather, there is only one core business thread in Redis, but multiple I/O threads can be configured. In addition, threads can be opened for RDB serialization
  • Why js as pseudocode? Diving for gold for many years, found or the front end of the enthusiasm is relatively high ha 😜
  • So give me a thumbs up!! Next time update redis debugging experience, the same with everyone can understand the language, by the way review C language bai, after all, stepped on a lot of pits.

Finally, a graph is used to describe the memory areas and functions that a Redis command passes through.