
“Why is single-threaded Redis so fast?”

Ha, I often ask candidates this question in interviews. It probes Redis's internal mechanisms and can lead into many questions about low-level, deep principles.

Going back to the question itself, there are two basic answers:

  • Based entirely on memory

  • IO multiplexing

1. The first point is easy to understand. The vast majority of Redis requests are pure memory operations, which are very fast. Data is stored in memory in structures similar to a HashMap, so lookups and updates run in O(1) time.

2. What is IO multiplexing?

This article starts from the overall idea of improving I/O concurrency and gradually analyzes the principle of I/O multiplexing.

How to quickly understand IO multiplexing?

  • Multiple processes

  • multithreading

  • IO multiplexing based on single process (select/poll/epoll)

1. Multiple processes

Under concurrency, if one process cannot keep up, why not use multiple processes to handle multiple client connections simultaneously?

The multi-process approach does solve the problem of the server being able to handle multiple client connection requests at the same time, but there are some drawbacks:

  • Creating processes with system calls such as fork() is expensive, and context switches between processes are inefficient

  • The number of processes grows with the number of connections: 100,000 requests would mean forking 100,000 processes, which is far too expensive

  • Address spaces are private and independent between processes, so sharing data between them is difficult

2. Multithreading

A thread is a logical flow that runs in the context of a process, and a process can contain multiple threads. Because multiple threads run in the same process context, they share the entire address space of that process, which solves the problem of inter-process communication being difficult.

Also, since the context of a thread is much smaller than that of a process, a context switch between threads is much more efficient than a context switch between processes.

3. IO multiplexing

A server process can handle multiple socket descriptors simultaneously.

  • Multiplexing: Multiple client connections (connections are socket descriptors)

  • Reuse: It is possible to handle multiple client connections simultaneously using a single process

The multi-process and multi-threaded approaches handle multiple sockets concurrently by increasing the number of processes or threads, which incurs context-switching overhead. IO multiplexing avoids this problem by requiring only one process to handle multiple sockets.

Its development can be described in three stages: select → poll → epoll.

select/poll/epoll

As usual, let's connect this with everyday life to make it easier to understand.

For example:

A leader assigns development tasks to employees, some of whom haven't finished yet. If the leader wants to check each employee's work in turn, he can only block and wait on an unfinished employee before moving on to check the next one, which causes performance problems.

So how do you solve this problem?

1. select

For example:

The Leader will find a Team Leader (TL for short) to check the development tasks of each employee on his/her behalf.

Here's what the TL does: walk through the employees and ask each one, "Is it done?" After the code review (CR) passes, merge the finished work into the Git branch; for unfinished work, come back and ask again later.

What’s the problem with that?

  • This TL has limited capacity: it can manage at most 1024 employees

  • If many employees fail to finish their tasks within a short period, the TL keeps traversing and asking, which hurts efficiency

The select function:

```c
int select(int maxfdp1, fd_set *readset, fd_set *writeset,
           fd_set *exceptset, const struct timeval *timeout);
```

The file descriptors monitored by the select function are divided into three classes: readfds, writefds, and exceptfds. The select function blocks until a descriptor is ready (readable, writable, or has an exceptional condition) or a timeout occurs (timeout specifies the time to wait; pass NULL to block indefinitely), and then returns. When select returns, you find the ready descriptors by iterating over the fd_set.

Select has good cross-platform support, with the disadvantage that there is a maximum limit to the number of file descriptors that a single process can monitor, typically 1024 on Linux.

2. poll

For example:

A new Team Leader (NTL) with more capacity can manage more employees. This NTL corresponds to poll.

The poll function:

```c
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

typedef struct pollfd {
    int   fd;        /* the file descriptor to monitor */
    short events;    /* events of interest on descriptor fd */
    short revents;   /* events that actually occurred on descriptor fd */
} pollfd_t;
```

Poll changes the way the set of file descriptors is described: it uses the pollfd structure instead of select's fd_set, so poll is no longer bound by select's limit of 1024 descriptors.

3. epoll

For example:

Building on the poll-based NTL from the previous step, the NTL improves the way of working: traverse all employees once and tell them, "when you finish your task, report it at checkpoint xx" (establish a process convention). From then on, the NTL only needs to check that designated checkpoint instead of asking everyone again. That's epoll.

Linux provides epoll related functions as follows:

```c
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
```

Epoll is an improved version of poll in the Linux kernel for handling large numbers of file descriptors. As an enhanced version of the Linux multiplexed I/O interfaces select/poll, it can significantly improve CPU utilization for programs that have many concurrent connections of which only a few are active at any moment.

4. Summary

  • Select is polling, and on Linux the limit is usually 1024

  • Poll solves the select limit, but is still polling

  • Epoll removes the descriptor limit and also eliminates polling by using event notification instead

Application of IO multiplexing in Redis

The Redis server is an event-driven program. It processes two types of events: file events and time events.

  • File events: handled in the main Redis process; they mainly cover client connection requests and responses.

  • Time events: scheduled tasks, such as AOF persistence jobs that fork out child processes.

Redis's file-event handling is a single-process, single-threaded model, yet it maintains excellent throughput — and IO multiplexing plays the major role in that.

A file event is an abstraction of a socket operation: one is generated whenever a socket is ready to perform an operation such as accepting a connection, reading, writing, or closing. Because a server typically connects to multiple sockets, multiple file events may occur concurrently.

The IO multiplexer is responsible for listening for multiple sockets and passing those that generated the event to the file event dispatcher. The file event dispatcher receives the socket from the IO multiplexer and calls the appropriate event handler based on the type of event generated by the socket. An example is shown in the figure below:

The four components of the file processor

All the functionality of Redis's IO multiplexer is done by wrapping the common IO multiplexing libraries select, epoll, evport, and kqueue, each of which has a separate file in the Redis source code.

Redis implements the same API for each IO multiplexing library, so the underlying implementation of the IO multiplexing program is interchangeable. As shown in figure:

The underlying IO multiplexing library implementations are interchangeable

Redis centrally manages all connections and read/write events, as well as time events we haven’t mentioned, and encapsulates the underlying I/O multiplexing mechanism so that a single process can handle multiple connections and read/write events. This is IO multiplexing in Redis.

Conclusion

Since Redis 6.0, a multithreaded model has been selectively introduced.

Redis chooses to use the single-threaded model to process client requests mainly because CPU is not the bottleneck of Redis server. The performance improvement brought by using the multi-threaded model cannot offset the development cost and maintenance cost brought by it. The performance bottleneck of the system is mainly in the network I/O operation.

The introduction of multithreading in Redis is also driven by performance. For operations such as deleting large key-value pairs, freeing memory asynchronously in a separate thread reduces blocking on the Redis main thread and improves execution efficiency.

Nothing is absolute; finding the right balance is what matters most!

– END –



Thanks for reading!