Thinking and summing up about Redis

Redis is a complex and well-designed system, complex because the whole system involves many issues, such as: hash storage, network model, clustering characteristics and so on. It is well designed because it provides thoughtful solutions to these problems.

We spend a lot of time learning a technology, not only in order to better use it, but also in the hope of learning some ideas of its design, so that we can solve various problems in daily work with a broader mind. Here are a few thoughts and summaries of Redis’ internal mechanics.

Slot mechanism

Slot mechanism o&M benefits

Redis does not support clustering function in the early stage. If you need to do data sharding, you need to use simple hash or consistent hash on the client side. The cluster running mode is as follows:

In this way, the cluster is simple and almost infinite linear expansion. However, there is a fatal problem. Rehash operations are required for cluster capacity expansion or reduction, that is, Rehash existing data. Otherwise, some old data cannot be accessed.

Redis Cluster has designed a Slot mechanism to solve this problem. The Redis Cluster has 16,384 fixed slots in which the entire data set in the Cluster is hashed. Slot is a combination of data, denoted by codes ranging from 0 to 16383. If a data hash value modulo 16384 equals a Slot’s code name, it should be stored in that Slot. At the same time, each node of the Redis Cluster maintains a slot-to-node mapping table, so the Cluster runs as follows:

Slot is designed to solve the Rehash problem mentioned above. When the Redis Cluster needs to be expanded or reduced, data in the Slot needs to be migrated to a new node and the Slot needs to be assigned to the new node. Because data in the same Slot is internally maintained in a hop table, it is easy to design interfaces to do this. Users can easily perform capacity expansion and reduction operations by invoking corresponding interfaces.

To summarize: Redis addresses the complexity of hash-based cluster capacity expansion and reduction operations and their impact on online clusters by introducing Slot mechanisms and designing interfaces to migrate slots.

Concurrency benefits of Slot mechanism

The Slot mechanism also has an obvious advantage in dealing with concurrent scenarios, because it splits the data set and actually reduces the granularity of locks, thereby increasing concurrency. The ConcurrentHashMap container in Java is a typical example of using this mechanism to implement concurrency, and we use it as an example to illustrate the benefits of Slot for concurrent scenarios.

Those of you who know Java know that ConcurrentHashMap is a thread-safe hash container, but we won’t focus on the details of its underlying concurrency implementation, whether it’s CAS or locking. Let’s focus on Segment. The Segment is the Slot in the ConcurrentHashMap. For naming purposes, we still call it Slot.

When ConcurrentHashMap is initialized, the parallelism must be specified. The parallelism is actually the number of slots. Like Redis, the Slot number cannot be changed once determined. Slot is also responsible for a portion of the data in the entire data set, and it also has a concurrency mechanism, so access to data within the same Slot must comply with the concurrency mechanism. Because the concurrency mechanism is not for the entire data set, but for each Slot with its own concurrency control, the maximum number of concurrent slots in a container is the number of slots instead of one. This technique is often referred to as segmented locking. ConcurrentHashMap works like the following:

Careful friends may notice that each Slot in ConcurrentHashMap has its own separate hash table, and the data is physically isolated. In Redis, the entire dataset uses the same hash table. Although Redis is a single-threaded model, the data set does not need concurrency mechanism, but inspired by the common data set, ConcurrentHashMap fragment locking mechanism can also have the following solutions:

In the above scenario, Slot is a more virtual concept. Instead of a separate hash table, all slots share a hash table and only provide synchronization. When the request comes in, the key is first hashed to the Slot to gain access, and then hashed again to the hash table to access the data. Similarly, once the number of slots is determined, it cannot be changed.

Single threaded model

Redis is perhaps the most unique of all server-side programs in its single-threaded model. The biggest advantage of the single-threaded model is its simplicity of implementation, but it also has a significant disadvantage: requests are easily blocked by other operations.

Those of you who have studied operating systems know that in the early days, a single core CPU could also handle multiple tasks at the same time. The way is to divide the task into many small segments, and the CPU constantly switches to perform multiple tasks, because the execution time of each small segment is always short enough. From the perspective of users, multiple tasks are executed at the same time.

The Redis single-threaded model also handles multitasking in this way. For tasks that take longer to execute, Redis cuts them up into smaller chunks to reduce the AMOUNT of CPU time consumed, thereby reducing the blocking time for front-end requests. These tasks with long execution time mainly include:

The front-end request

Redis adopts an event-oriented processing model for requests. The basic idea is to divide each request into multiple events, each event is the smallest unit of processing, and the multiple events of each request can be discontinuous processing. The same thread is reused for multiple event processing.

Data date

Redis adopts both active and passive strategies for data expiration. The active approach is to scan the expiration table periodically and process the expired data. However, considering that there may be a lot of expired data and it may take a long time to complete the processing, Redis adopts the approach of processing only a small part of the data at a time.

Rehash operations

When the hash factor of the hash table reaches a certain threshold, the Rehash operation is required to reduce the conflict rate. Redis’s Rehash mechanism works like this: first create a new hash table, then migrate the data. Because overall data migration is an extremely time-consuming operation, Redis hashes the data migration operation to each data request processing, such as the front-end request key, and migrates it to the new hash table if the key is found to be in the old hash table.

RDB file transfer

At the sending end, the network transmission of big data files is segmented itself, and its segmented size is limited by the size of each buffer. The receiving end is also limited by buffer, so the receiving end is also part of the receiving end. So the RDB file transfer is segmented at both ends and can handle other tasks in the middle. However, in the whole transmission phase, the transmission task will occupy a lot of resources, and has an impact on the front-end request.

Pipeline limit

Let’s take the example of a Courier delivering packages. Suppose the Courier takes 2 hours to deliver a package, 119 minutes of which is spent on the road, and it takes 1 minute to hand over the package. Now he has 10 packages to deliver to the same neighborhood. It would take him 20 hours if he picked up one package at a time. But if he picks up 10 packages at a time, it only takes a little more than two hours.

For Redis request, a client needs to wait for the server to reply to the previous request before sending the next request. Most of the time of the whole request process is spent on network transmission. However, the overall response time of a large number of requests can be greatly reduced if multiple commands are transmitted by pipeline.

However, too many commands at one time can block requests from other clients for the server, so the number of commands sent by the pipeline needs to be limited.

summary

The single-threaded model avoids the tedious process of handling concurrency, but requires special attention to the piecewise processing of long operations to avoid blocking front-end requests.

Message version number mechanism

Before reading this section, you should have some knowledge of Redis clustering features and the Raft protocol. Because clustering itself is very complex, need a certain theoretical basis to better understand. At the same time because my level is limited if there is a mistake welcome correction.

Redis Cluster is a kind of centrless Cluster, each node has a copy of the Cluster state, also each node can modify the Cluster state. Cluster status Synchronization is achieved by exchanging cluster messages between nodes. When a node receives a cluster message, it needs to update the group state of the set itself if the message is relatively new. There are two possible questions at this point:

  • Cluster status, what are cluster messages?
  • How do you determine if the received cluster message is new?

Cluster status and cluster messages

Let’s start with the first question: cluster status and cluster messages. In this article, cluster status mainly refers to the mapping between slots and nodes in a cluster. Cluster messages refer to Slot information that a node is responsible for. For a node, the message it sends is only the Slot information it is responsible for, because only it knows best which slots it is responsible for. For a slave node, it sends the Slot information of its master node.

In fact, in Redis, cluster messages not only contain the Slot information that the sending node is responsible for, but also include the master/slave information of the sending node and the status information of other nodes that the sending node considers (FAIL/PFAIL). The parts that are not relevant to the topic are omitted for better illustration.

How do I determine if a received cluster message is new

The second problem is how the node determines that the received cluster message is relatively new. Redis uses the version number mechanism to determine whether a message is old or new. The higher the version number, the newer the message.

In Redis this version number is called configEpoch, and each node has a configEpoch, which is a 64-bit integer. Instead, configEpoch should be called the version number of a Slot, or the version number of a Slot’s mapping to a node. How do you understand that?

Let’s look at this in terms of message collisions, because that’s what configEpoch is for. As mentioned above, each node in the cluster maintains a snapshot of the cluster state. Each Slot in the snapshot has a node corresponding to it, and each node has a configEpoch, which intuitively looks like the version number of the Slot.

When a Slot changes, the node updates its configEpoch and propagates the message along with the cluster heartbeat. The cluster message is a triplet of slots, nodes, and configEpoch. When a new cluster message is propagated to another node, the node determines whether to update the snapshot of the local cluster state by comparing the node with the configEpoch of the corresponding Slot in the cluster message. In this sense, configEpoch is also better known as the Slot version number or message version number.

The following figure shows the cluster status update process for Slot migration:

In the scenario above, there are three nodes ABC. They start with configEpoch 8000\9000\10000 and Slot 100 is taken care of by node C. When Slot 100 is migrated from node C to node B, the configEpoch of node B changes to 10001, and node B sends a message saying that Slot 100 is its responsibility. When node AC receives a message from B, it compares the configEpoch because B’s configEpoch is large and updates the local cluster state.

When Slot B changes, the configEpoch of node B changes to 10001, which is the largest configEpoch in the cluster. Why do I need to Max it out instead of adding one? If the configEpoch of Slot 100 on node C is 10000, the AC update will fail if the configEpoch of Slot 100 on node B is 9001, which is less than 10000.

From the above example, we can see that configEpoch is used to determine whether the message is old or new. The key is how to add configEpoch to ensure that the configEpoch of the latest message is unique and the largest in the cluster. There are two ways to add configEpoch in Redis:

CurrentEpoch way

Each node in Redis maintains a currentEpoch, which acts like a mediator. CurrentEpoch is added when the cluster status changes, and the largest currentEpoch is continuously propagated when the nodes exchange messages. Redis tries to keep currentEpoch identical for each node and tries to keep currentEpoch greater than or equal to the largest configEpoch in the cluster. Whenever you need to add configEpoch, do the following:

configEpoch = currentEpoch + 1
Copy the code

The new configEpoch is somewhat guaranteed to be the only and largest in the cluster. In the example above, node B migrates into the new Slot to increase the configEpoch of node B in this way, but the currentEpoch of each node is not drawn in the figure above.

voting

The voting method is mainly used in FAILOVER scenarios. The FAILOVER algorithm of Redis is similar to the master selection part of Raft protocol. The slave node sends a vote request to the master node in the cluster (note that there may be multiple slave nodes making requests). The first secondary node to receive a majority vote can be promoted to the new master node.

The voting method for adding configEpoch is the same formula as in the previous section. After adding currentEpoch to configEpoch, the slave node does not directly assign a currentEpoch value to configEpoch. Instead, it queries other primary nodes to determine whether its currentEpoch is the largest in the cluster. The currentEpoch value will be assigned only when more than half of the primary nodes vote for the currentEpoch; otherwise, the promotion of the slave node fails.

One of the criteria that hosts use to decide whether they should vote for currentEpoch is related to currentEpoch. The primary node compares itself to the currentEpoch that issues the vote from the slave node, and does not vote for the currentEpoch if its value is large, because it indicates that the slave node’s cluster state is very old.

However, since not all nodes vote for this method, this method also does not guarantee that its configEpoch will be unique and the largest in the cluster, but it is more strict than the previous method.

Message conflict

Redis always tries to maintain the principle that the configEpoch of each node should be unique in the cluster, and that the configEpoch of the node that changed later should be greater than the configEpoch of the node that changed before. This ensures that each message has a unique configEpoch, and that the configEpoch of the subsequent message is larger than the configEpoch of the previous message, thus ensuring that the most recent message can be updated.

Let’s look at the following event and message processing flow, which is a typical scenario for Redis message conflicts:

The general process described in the figure above is that after node A moves into A Slot, the master/slave switch occurs before the message is transmitted to node C and C’. After the switch, the configEpoch of node A and C’ is the same. This phenomenon is the version number conflict in Redis, also known as message conflict.

If Slot migration occurs between nodes A and C’ after A version conflict occurs, it is assumed that Slot 100 will be migrated from C’ to A. Since THE configEpoch of A is already the largest configEpoch in the cluster, the configEpoch does not need to be added. The message of Slot 100 belongs to both versions A and C’, and the configEpoch of both versions is the same, so the message update of the migration failed.

To resolve the conflict, Redis stipulates that if the configEpoch of two nodes is found to be the same when messages are exchanged between nodes, the configEpoch of one node should be changed by configEpoch=currentEpoch+1. Because the message has already been processed, it is ok to change the configEpoch on either node at this point.

Conflict resolution relies on exchanging heartbeat packets. Before the two nodes exchange heartbeat packets, the setslot command is sent TO the FROM and TO nodes. Although the setslot command does not change the configEpoch of the TO node, it still changes the mapping between Slot and the node. In this way, even if the FROM node’s configEpoch is increased when the conflict is resolved, the new message is updated successfully because it should not be responsible for the Slot that was migrated.

summary

The timing of messages in distributed systems cannot be guaranteed. Adding a version number to each message is the most intuitive way to resolve conflicts, which is also used in many distributed scenarios. However, it is simple to say but complicated to implement in reality.

Write in the last

The above is my summary and thinking in the process of using Redis. Meanwhile, I hope I can accumulate more to broaden my mind and better solve the problems encountered in work.

Copyright notice: the article is the result of the author’s hard work, please indicate the author and source.