preface

Hello everyone, I'm the boy who picks up snails. Today, let's learn about Redis master-slave replication, Sentinel, and Redis Cluster together.

  • Redis master-slave
  • Redis Sentinel
  • Redis Cluster

1. Redis master-slave

Interviewers often ask about Redis high availability. The answer covers two aspects: one is that data should not be lost, or loss should be minimized; the other is that the Redis service should not be interrupted.

  • To minimize data loss, there are the AOF and RDB persistence mechanisms.
  • To keep the service from being interrupted, Redis cannot be deployed as a single point. So let's first look at Redis master-slave.

1.1 Redis master-slave concept

  • In Redis master-slave mode, multiple Redis servers are deployed as master and slave libraries, with master-slave replication between them to keep the data copies consistent.
  • The master library handles both read and write operations, while the slave libraries handle read operations only.
  • If the master library goes down, a slave library is switched over to become the new master library.

1.2 Master/slave synchronization process of Redis

Redis master-slave synchronization consists of three stages.

Stage 1: The master and slave libraries establish a connection and negotiate synchronization.

  • The slave library sends the psync command to the master library, telling it that data synchronization is needed.
  • After receiving the psync command, the master library responds with FULLRESYNC (indicating that this first replication is a full copy), along with the master library's runID and its current replication progress offset.

Stage 2: The master library synchronizes data to the slave library; once the slave library receives the data, it loads it locally.

  • The master library executes the bgsave command to generate an RDB file, and then sends the file to the slave library. After receiving the RDB file, the slave library empties its current database and then loads the file.
  • While the master library is synchronizing data to the slave library, newly arriving write commands are recorded in the replication buffer.

Stage 3: The master library sends the newly received write commands to the slave library.

  • After the master library finishes sending the RDB file, it sends the write commands accumulated in the replication buffer to the slave library, which replays them. From then on, the master and slave libraries stay synchronized (a runnable sketch follows).
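To make the three stages concrete, here is a minimal sketch, assuming two local Redis instances (ports 6379 and 6380) and the redis-py client; it makes 6380 a replica of 6379, which triggers the psync/FULLRESYNC flow described above, and then reads the negotiated replication ID and offset from INFO replication:

# Minimal sketch: wire up a master-slave pair and observe the sync.
# Assumes two local Redis instances (6379, 6380) and the redis-py client.
import redis

master = redis.Redis(port=6379)
replica = redis.Redis(port=6380)

# Make 6380 a replica of 6379. This starts the psync handshake; on the
# first sync the master replies FULLRESYNC and ships an RDB snapshot.
replica.slaveof('127.0.0.1', 6379)

# INFO replication exposes the replication ID and offset that
# psync/FULLRESYNC negotiate (field names vary slightly by version).
info = master.info('replication')
print(info.get('master_replid'), info.get('master_repl_offset'))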

1.3 Things to note about Redis master-slave

1.3.1 Inconsistency Between Master and Slave Data

Because master-slave replication is asynchronous, if the slave library lags in executing the replicated commands, the data on the master and slave can become inconsistent.

There are two reasons for the inconsistency between master and slave data:

  • Network delay between the master and slave libraries.
  • The slave library has received the replication commands from the master, but is executing a blocking command (such as hgetall) and cannot apply them in time.

How to solve the inconsistency between master and slave data?

  1. Upgrade to better hardware and network configuration to keep replication smooth.
  2. Monitor the replication progress between the master and slave libraries, as in the sketch below.
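For the second point, here is a hedged sketch (redis-py assumed) that estimates replication lag by comparing the master's write offset with each replica's offset, both taken from INFO replication:

# Sketch: estimate master-slave replication lag in bytes.
# Assumes a local master on port 6379 and the redis-py client.
import redis

master = redis.Redis(port=6379)
info = master.info('replication')
master_offset = info['master_repl_offset']

for i in range(info.get('connected_slaves', 0)):
    # redis-py parses each slave0, slave1, ... entry into a dict with
    # the replica's ip, port, state and replication offset.
    slave = info[f'slave{i}']
    print(f"{slave['ip']}:{slave['port']} lag={master_offset - slave['offset']} bytes")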

1.3.2 Reading Expired Data

Redis has several strategies for deleting data:

  • Lazy deletion: only when a key is accessed does the system check whether it has expired.
  • Periodic deletion: at regular intervals, a certain number of keys are sampled from the expires dictionaries of a certain number of databases, and the expired ones among them are removed.
  • Active deletion: The active deletion policy is triggered when the used memory exceeds the upper limit.

If you use a Redis version earlier than 3.2, the slave library does not check whether data has expired, so it returns expired data. From version 3.2 onward, Redis improved this: if the data being read has expired, the slave library does not delete it, but it returns a null value instead, preventing the client from reading expired data.

Therefore, in master-slave Redis mode, try to use Redis 3.2 or later.

1.3.3 Master Library Pressure During Full Replication with One Master and Multiple Slaves

In one-master-many-slaves mode, if every slave library performs a full copy from the master library, the master library comes under great pressure. The master library forks a child process to generate the RDB file, and the fork operation blocks the main thread from handling normal requests. Transferring large RDB files also takes up the master library's network bandwidth.

This can be resolved with the master-slave-slave (cascading replication) pattern. What is that? When deploying the master-slave cluster, select a slave library with a better hardware and network configuration, and let it act as the replication source for some of the other slave libraries. As shown in the figure:

1.3.4 What If the Network Between Master and Slave Is Disconnected?

After the master and slave libraries complete full replication, they maintain a long-lived network connection, over which the master library passes the write commands it subsequently receives to the slave library. This avoids the overhead of frequently establishing connections. But if the network disconnects and then reconnects, does a full replication have to be performed again?

Before Redis 2.8, a full copy would indeed be performed again after the slave and master libraries reconnected, which is expensive. From Redis 2.8 onward this is optimized: incremental replication is used after reconnection, that is, only the write commands the master library received during the disconnection are synchronized to the slave library.

Once the master and slave libraries are reconnected, repl_backlog_buffer is used for incremental replication.

While the master and slave libraries are disconnected, the master library writes the write commands it receives into the replication buffer and also into the repl_backlog_buffer. repl_backlog_buffer is a ring (circular) buffer in which the master library records the position it has written up to and the slave library records the position it has read up to.
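Here is a toy model of that ring buffer in Python (illustrative only, not Redis internals): the master tracks a monotonically growing write offset, a slave reads from its own offset, and if the slave falls more than one buffer length behind, its unread commands have been overwritten and a full resync is required:

# Toy model of repl_backlog_buffer: a fixed-size ring buffer with a
# master write offset and per-slave read offsets (illustrative only).
class ReplBacklog:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.master_offset = 0  # total bytes ever written

    def write(self, data: bytes):
        for b in data:
            self.buf[self.master_offset % self.size] = b
            self.master_offset += 1

    def read_from(self, slave_offset: int):
        # A slave lagging more than one buffer length has lost data:
        # Redis then falls back to full resynchronization.
        if self.master_offset - slave_offset > self.size:
            return None
        return bytes(self.buf[i % self.size]
                     for i in range(slave_offset, self.master_offset))

backlog = ReplBacklog(8)
backlog.write(b'set a 1;')
print(backlog.read_from(0))  # b'set a 1;' -> incremental sync possible
backlog.write(b'set b 2;')
print(backlog.read_from(0))  # None -> lagged too far, full resync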

2. Redis Sentinel

In master-slave mode, once the master node fails, a slave node has to be promoted to master manually, and the application has to be updated with the new master's address. Obviously, this way of handling faults is unacceptable for most business scenarios. Redis has officially provided the Redis Sentinel mechanism since version 2.8 to solve this problem.

  • The functions of Sentinel
  • Introduction to Sentinel mode
  • How Sentinel determines that the master library is offline
  • How Sentinel mode works
  • How Sentinel selects the new master
  • Which Sentinel performs the master/slave switch
  • Failover under Sentinel

2.1 Sentinel functions

Sentinel is actually a Redis process running in a special mode. It has three functions: monitoring, automatic master selection and switchover, and notification.

While running, the Sentinel process monitors all the Redis master and slave nodes. It periodically sends the PING command to the master and slave libraries to detect whether they are down. If a slave library does not respond to Sentinel's PING command within the specified time, Sentinel marks it as offline. If the master library does not respond to Sentinel's PING command within the specified time, Sentinel determines that the master library is offline and begins the task of selecting a new master.

The so-called master selection is simply choosing one of the many slave libraries, according to certain rules, to become the new master library. As for notification: after the new master library is selected, Sentinel sends its connection information to the other slave libraries so that they can establish a master-slave relationship with it. At the same time, Sentinel notifies clients of the new master library's connection information so that they send their requests to the new master library.

2.2 Sentinel Mode

Because Redis Sentinel is itself a Redis process, if it dies, nothing is doing the monitoring. So let's take a look at Redis Sentinel mode.

Sentinel mode is a sentinel system composed of one or more Sentinel instances. It can monitor all of the Redis master and slave nodes, and when a monitored master node goes offline, it automatically promotes one of that master's slave nodes to be the new master. A single Sentinel process monitoring the Redis nodes is itself a problem (a single point of failure), so in practice multiple Sentinels monitor the Redis nodes, and the Sentinels also monitor each other.

In fact, the Sentinels form a cluster among themselves through the publish/subscribe mechanism. They also obtain the connection information of the slave libraries via the INFO command, and can then establish connections with the slave libraries for monitoring.
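Client libraries expose this discovery mechanism directly. A hedged sketch with redis-py, assuming local Sentinels on ports 26379-26381 that monitor a master group named mymaster:

# Sketch: ask the Sentinels where the current master and replicas are.
# Assumes local Sentinels monitoring a group named "mymaster".
from redis.sentinel import Sentinel

sentinel = Sentinel([('127.0.0.1', 26379),
                     ('127.0.0.1', 26380),
                     ('127.0.0.1', 26381)],
                    socket_timeout=0.5)

print(sentinel.discover_master('mymaster'))  # e.g. ('127.0.0.1', 6379)
print(sentinel.discover_slaves('mymaster'))  # list of (ip, port) pairs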

2.3 How does Sentinel determine that the master library is offline?

How does Sentinel determine whether the master library is offline? Let's start with two basic concepts: subjective offline and objective offline.

  • Sentinels send PING commands to both the master and slave libraries; if the master or a slave does not respond within the specified time, the Sentinel marks it as subjectively offline.
  • If it is the master library that is marked as subjectively offline, all the Sentinels monitoring that master library confirm, once per second, whether the master library really has gone subjectively offline. The master library is marked as objectively offline when a sufficient number of Sentinels (at least the quorum value set by the Redis administrator) confirm within the specified time frame that the master library has indeed gone subjectively offline. The purpose of this is to avoid misjudging the master library, thereby reducing unnecessary master/slave switches and overhead.

Assuming there are N Sentinel instances, if at least N/2+1 of them judge the master library to be subjectively offline, the node can be marked as objectively offline and the master/slave switchover can begin.

2.4 How Sentinel mode works

  1. Each Sentinel sends a PING command once per second to the master library, the slave libraries, and the other Sentinel instances it knows of.
  2. If the time since an instance's last valid reply to the PING command exceeds the down-after-milliseconds option, the instance is marked as subjectively offline by the Sentinel.
  3. If the master library is marked as subjectively offline, all the Sentinels monitoring it confirm, at a frequency of once per second, that the master library really is in the subjectively offline state.
  4. When a sufficient number of Sentinels (greater than or equal to the value specified in the configuration file) confirm within the specified time frame that the master library has indeed gone subjectively offline, the master library is marked as objectively offline.
  5. When the master library is marked as objectively offline by the Sentinels, the master-selection process begins.
  6. If not enough Sentinels agree that the master library is subjectively offline, its subjectively offline status is removed; likewise, if the master library starts returning valid replies to the Sentinel's PING commands again, its subjectively offline status is removed.

2.5 How does Sentinel select the new master?

Once it is clear that the master library is objectively offline, Sentinel starts the master-selection process.

Master selection involves two steps: filtering and scoring. From among the many slave libraries, those that do not meet certain screening conditions are filtered out first. Then the remaining slave libraries are scored one by one according to certain rules, and the slave library with the highest score is chosen as the new master library.

  • When selecting the master, the status of each slave library is checked; if it is offline, it is filtered out directly.
  • If a slave library's network is poor and it keeps timing out, it is filtered out as well. The relevant parameter is down-after-milliseconds, the maximum connection timeout beyond which the master-slave connection is considered broken.
  • Once the unsuitable slave libraries have been filtered out, the remaining ones are scored according to three rules: slave priority, slave replication progress, and slave ID.
  • The slave library with the highest priority scores highest; the priority can be set with the slave-priority configuration. If the priorities are equal, the slave library whose replication progress is closest to the old master wins. If both the priority and the replication progress are equal, the slave library with the smaller ID scores higher.

2.6 Which Sentinel performs the master/slave switchover?

Once a Sentinel marks the master library as subjectively offline, it asks the other Sentinels to confirm that the master library is indeed in the subjectively offline state, by sending them the is-master-down-by-addr command. The other Sentinels respond with Y or N (Y means agree, N means disagree), depending on their own connection to the master library. If the Sentinel gathers enough agreeing votes (the configured quorum), it marks the master library as objectively offline.

The Sentinel that marked the master library objectively offline then sends a command to the other Sentinels, asking them to vote for it to perform the master/slave switch. This voting process is called Leader election, because the Sentinel that ultimately performs the switch is called the Leader. A Sentinel that wants to become the Leader must meet two conditions:

  • It must receive at least num(sentinels)/2+1 votes.
  • The number of votes it receives must also be greater than or equal to the quorum value in the Sentinel configuration file.

For example, suppose there are three Sentinels and the configured quorum value is 2. A Sentinel that wants to become the Leader then needs at least 2 votes. To make sense of the election, walk through the following timeline:

  • At time T1, Sentinel A1 judges the master library objectively offline and wants to be the Leader of the master/slave switchover, so it first casts a vote for itself and then sends vote requests to Sentinels A2 and A3.
  • At time T2, A3 also judges the master library objectively offline and wants to be the Leader, so it too votes for itself first, and then sends vote requests to A1 and A2.
  • At time T3, Sentinel A1 receives A3's vote request. Since A1 has already voted Y for itself, it cannot vote for any other Sentinel, so it replies N to A3.
  • At time T4, Sentinel A2 receives A3's vote request. Since A2 has not voted yet, it replies Y to the first Sentinel that sent it a vote request, namely A3.
  • At time T5, Sentinel A2 receives A1's vote request. Since A2 has already voted for A3, it can only reply N to A1.
  • By time T6, Sentinel A1 has received only its own Y vote, while Sentinel A3 has received two Y votes (its own and A2's), so Sentinel A3 becomes the Leader.

If, for some reason such as a network failure, Sentinel A3 also fails to receive two votes, this round of voting produces no Leader. The Sentinel cluster waits for a period of time (typically twice the failover timeout) and then re-elects.
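The first-come-first-served voting rule above can be modeled in a few lines. This is a toy simulation, not Sentinel's actual code: each Sentinel votes for itself if it is a candidate, otherwise for the first candidate whose request reaches it, and a candidate wins only with a majority that also meets the quorum:

# Toy simulation of Sentinel Leader election (illustrative only).
def elect(sentinels, requests, quorum):
    votes = {}  # each sentinel casts a single vote; first request wins
    for voter, candidate in requests:  # requests in arrival order
        votes.setdefault(voter, candidate)
    tally = {}
    for c in votes.values():
        tally[c] = tally.get(c, 0) + 1
    majority = len(sentinels) // 2 + 1
    winners = [c for c, n in tally.items() if n >= majority and n >= quorum]
    return winners[0] if winners else None  # None -> wait and re-elect

# The T1..T6 timeline: A1 and A3 vote for themselves, then A3's
# request reaches A2 before A1's does.
requests = [('A1', 'A1'), ('A3', 'A3'), ('A2', 'A3'), ('A2', 'A1')]
print(elect(['A1', 'A2', 'A3'], requests, quorum=2))  # -> A3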

2.7 Failover

Suppose the Sentinel-mode architecture is as follows: three Sentinels, one master library M1, and two slave libraries S1 and S2.

When Sentinel detects a failure of the Redis master library M1, the cluster needs a failover. Suppose Sentinel 3 is chosen as the Leader. The failover process is as follows (a client-side sketch follows the list):

  1. Take slave node S1 out of the slave role and promote it to the new master library.
  2. Slave node S2 becomes a slave of the new master library.
  3. When the original master node recovers, it becomes a slave node of the new master library.
  4. Notify the client application of the new master node's address.
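On the client side, this is why applications should connect through Sentinel rather than a fixed master address: after failover, the same handle resolves to the new master. A sketch with redis-py, under the same assumed mymaster setup as before:

# Sketch: a client that survives failover by resolving the master
# through Sentinel (assumes the "mymaster" group from earlier).
from redis.sentinel import Sentinel

sentinel = Sentinel([('127.0.0.1', 26379)], socket_timeout=0.5)

# master_for/slave_for return lazy connections that re-discover the
# current master/replica on reconnect, so after S1 is promoted,
# writes transparently go to the new master.
master = sentinel.master_for('mymaster', socket_timeout=0.5)
replica = sentinel.slave_for('mymaster', socket_timeout=0.5)

master.set('key', 'value')  # write to whoever is master right now
print(replica.get('key'))   # read from a replica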

After failover:

3. Redis Cluster

Sentinel mode, built on top of master/slave mode, achieves read/write separation, and because it can switch over automatically, system availability is higher. But every node stores the same data, which wastes memory and makes online scaling difficult. Therefore Redis Cluster, the official implementation of a sliced cluster, came into being. Added in Redis 3.0, it provides distributed storage for Redis. Data is sharded: each Redis node stores different content, which solves the online scaling problem. It can also store large amounts of data by spreading them across Redis instances, and it provides replication and failover capabilities as well.

For example, if a single Redis instance stores 15 GB or more of data, its responses slow down. This is due to Redis's RDB persistence mechanism: Redis forks a child process to complete the RDB persistence operation, and the time the fork takes is positively correlated with the volume of Redis data.

At this point, it is tempting to simply spread the 15 GB of data out across several instances. This is the original intention of Redis sliced clusters. What is a sliced cluster? For example, to store 15 GB of data in Redis, you could use one single Redis instance, or three Redis instances forming a sliced cluster, as shown below:

The difference between a sliced cluster and Redis Cluster: Redis Cluster is the official solution, provided since Redis 3.0, for implementing a sliced cluster.

Since the data is sharded across different Redis instances, how does a client determine which instance holds the data it wants to access? Let's take a look at how Redis Cluster works.

3.1 Hash Slot

The Redis Cluster solution uses Hash slots to handle the mapping between data and instances.

A sliced cluster is divided into 16384 slots. Each key-value pair entering Redis is hashed by its key and assigned to one of the 16384 slots. The hash mapping used is fairly simple: the CRC16 algorithm computes a 16-bit value, which is then taken modulo 16384. Every key in the database thus belongs to one of the 16384 slots, and every node in the cluster handles a subset of them.

Each node in the cluster is responsible for a portion of the hash slots. Suppose the current cluster has three nodes A, B, and C, with roughly 16384/3 hash slots per node; one possible allocation is (see the sketch after the list):

  • Node A is responsible for hash slots 0 to 5460
  • Node B is responsible for hash slots 5461 to 10922
  • Node C is responsible for hash slots 10923 to 16383
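Here is a self-contained sketch of that mapping. The bitwise CRC16 below is the XMODEM variant (polynomial 0x1021, initial value 0) named by the Redis Cluster specification, but treat it as an illustration rather than the production implementation:

# Sketch: compute a key's hash slot, slot = CRC16(key) % 16384.
# Bitwise CRC16-XMODEM (poly 0x1021, init 0); illustrative only.
def crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    return crc16(key) % 16384

for key in (b'user:1000', b'user:1001', b'order:42'):
    slot = hash_slot(key)
    # Using the A/B/C allocation above: 0-5460 -> A, 5461-10922 -> B, rest -> C.
    node = 'A' if slot <= 5460 else ('B' if slot <= 10922 else 'C')
    print(key, slot, node)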

What happens when a client sends a read or write request to a Redis instance but the data is not on that instance? This is where MOVED redirection and ASK redirection come in.

3.2 MOVED and ASK Redirection

In Redis Cluster mode, nodes process requests as follows (a client-side sketch follows the list):

  1. Via the hash slot mapping, check whether the current node owns the slot for the requested Redis key.
  2. If the hash slot is not owned by the current node, return a MOVED redirect.
  3. If the hash slot is owned by the current node and the key is in the slot, return the result for the key.
  4. If the key does not exist in the slot, check whether the slot is being migrated out (MIGRATING).
  5. If the key is being migrated, return an ASK error to redirect the client to the migration destination server.
  6. If the hash slot is not being migrated out, check whether it is being imported (IMPORTING).
  7. If the hash slot is being imported and the ASKING flag is set, execute the operation directly; otherwise, return a MOVED redirect.
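From the client's perspective, these rules show up as MOVED and ASK errors that it has to follow. Below is a hedged sketch of a minimal redirect-following loop with redis-py; real cluster clients also cache the slot-to-node table instead of chasing redirects every time:

# Sketch: minimal client-side handling of MOVED/ASK redirects
# (illustrative; production clients cache the slot->node mapping).
import redis

def cluster_get(node, key, max_redirects=5):
    asking = False
    for _ in range(max_redirects):
        try:
            if asking:
                node.execute_command('ASKING')  # one-shot permission on the importing node
                asking = False
            return node.get(key)
        except redis.ResponseError as e:
            # Error looks like: "MOVED 3999 127.0.0.1:7002" or "ASK 3999 ...".
            kind, _slot, addr = str(e).split()
            host, port = addr.split(':')
            node = redis.Redis(host=host, port=int(port))
            asking = (kind == 'ASK')
    raise RuntimeError('too many redirects')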

3.2.1 MOVED Redirection

When a client sends a read/write operation to a Redis instance and the computed slot is not on that node, the node returns a MOVED redirection error, which carries the IP address and port of the instance that now owns the hash slot. This is Redis Cluster's MOVED redirection mechanism. The flow chart is as follows:

3.2.2 ASK Redirection

ASK redirection typically occurs while the cluster is scaling. Scaling causes slot migration, so when we access the source node, the data may already have been migrated to the target node. ASK redirection handles this situation.

3.3 Communication between cluster nodes: the Gossip protocol

A Redis cluster consists of multiple nodes. How do the nodes communicate with each other? Through the Gossip protocol! Gossip is a rumor-spreading protocol: each node periodically selects k nodes from its node list and transmits the information stored on it to them, until the information held by all nodes is consistent, that is, until the algorithm converges.

The basic idea of the Gossip protocol: a node that wants to share some information with the other nodes in the network periodically selects a few random nodes and passes the information to them. The nodes that receive the information then do the same, passing it on to some other randomly selected nodes. In general, the information is periodically delivered to N target nodes rather than just one; this N is called the fanout.
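A toy round-based simulation of this fanout idea (illustrative only, far simpler than the real protocol): each informed node pushes what it knows to a few random peers per round, and the whole cluster converges in a logarithmic number of rounds:

# Toy gossip simulation: each informed node forwards to `fanout`
# random peers per round until every node has the information.
import random

def gossip_rounds(num_nodes=100, fanout=3, seed=42):
    random.seed(seed)
    informed = {0}  # node 0 learns something new
    rounds = 0
    while len(informed) < num_nodes:
        for node in list(informed):
            informed.update(random.sample(range(num_nodes), fanout))
        rounds += 1
    return rounds

print(gossip_rounds())  # converges in O(log N) rounds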

Redis Cluster nodes communicate with each other via the Gossip protocol, constantly exchanging information: node failures, newly joined nodes, master/slave changes, slot information, and so on. The Gossip protocol uses several message types, including meet, ping, pong, and fail.

  • Meet message: notifies a new node to join. The sender informs the receiver to join the current cluster; once the meet exchange completes normally, the receiving node joins the cluster and exchanges ping and pong messages periodically.
  • Ping message: each node sends a ping message every second to other nodes in the cluster. The ping message carries the known addresses, slots, status information, and last communication times of two nodes.
  • Pong message: on receiving a ping or meet message, the node replies to the sender to confirm that communication is normal; the pong message also carries information about two nodes the sender already knows.
  • Fail message: when a node determines that another node in the cluster has gone offline, it broadcasts a fail message to the cluster. After receiving the fail message, the other nodes update that node's status to offline.

Specifically, each node communicates with the other nodes through the cluster bus, using a special port number: the client-facing service port plus 10000. For example, if a node's service port is 6379, the port it uses to communicate with other nodes is 16379. Node-to-node communication uses a special binary protocol.

3.4 Failover

Redis Cluster achieves high availability: when a node in the cluster fails, failover is performed so that the cluster can continue to serve normally.

The Redis cluster discovers faults through ping/pong messages. This mechanism includes both subjective offline and objective offline.

Subjective offline: one node considers another node unavailable, that is, in the offline state. This is not the final fault judgment; it only represents the opinion of one node, and misjudgment is possible.

Objective offline: the node really is offline; multiple nodes in the cluster consider it unavailable and reach a consensus. If the node is a master holding slots, failover is performed for it.

  • If node A marks node B as subjectively offline, then after a period of time node A sends node B's status to other nodes via gossip messages. When node C receives the message and parses the body, if it finds the PFAIL status of node B, the objective-offline process is triggered.
  • When it is a master node that is involved, Redis Cluster counts the offline reports from the masters holding slots; when more than half of them have reported the node offline, it is marked as objectively offline.

The process is as follows:

Fault recovery: after a fault is discovered, if the offline node is a master, one of its slave nodes replaces it to keep the cluster highly available. The process is as follows:

  • Qualification check: check whether each slave node is eligible to replace the faulty master node.
  • Election preparation time: after the qualification check passes, set the time at which the election is triggered.
  • Initiate the election: when the time arrives, initiate the election.
  • Election voting: only master nodes holding slots have votes. When a slave node collects enough votes (more than half), it triggers the operation of replacing the master.

3.5 Extra: Why does Redis Cluster have 16384 hash slots?

For each key in a client request, hash slot = CRC16(key) % 16384. The CRC16 algorithm produces a 16-bit hash value, which in principle can take 2^16 = 65536 distinct values. So why use 16384 (2^14) slots rather than 65536?

You can read the Redis author's original answer:

Redis stores the slot ownership of each instance node as a bitmap of type unsigned char slots[REDIS_CLUSTER_SLOTS/8].

  • When a Redis node sends a heartbeat packet, it must include all of its slots in the packet. With 65536 slots, the bitmap takes 65536 / 8 (8 bits per byte) / 1024 (1024 bytes per KB) = 8 KB; with 16384 slots, it takes 16384 / 8 / 1024 = 2 KB. So 16384 slots saves about 6 KB per heartbeat compared with 65536; in a cluster of 100 nodes, that is roughly 600 KB saved per instance (a quick check of this arithmetic follows the list).
  • Also, a Redis cluster is generally not expected to exceed 1000 master nodes; more than that would cause network congestion. For a cluster of fewer than 1000 nodes, 16384 slots are sufficient.
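The arithmetic from the first point, spelled out as a quick check (one bit per slot in the heartbeat bitmap):

# Quick check of the heartbeat bitmap sizes: one bit per slot.
for slots in (65536, 16384, 8192):
    kb = slots / 8 / 1024  # bits -> bytes -> KB
    print(f'{slots} slots -> {kb:g} KB per heartbeat')
# 65536 -> 8 KB, 16384 -> 2 KB, 8192 -> 1 KB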

Since that saves memory and network overhead, why not go further and use 8192 (16384/2) slots?

8192 / 8 (8 bits per byte) / 1024 (1024 bytes per KB) = 1 KB, only 1 KB! To see why not, first look at how Redis converts a key to a slot:

unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key,keylen) & 0x3FFF;

    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key+s+1,e-s-1) & 0x3FFF;
}

Redis converts a key to a slot by computing crc16(key) and masking it with the slot count: crc16(key) & 0x3FFF.

Why is it 0x3FFF (16383) rather than taking % 16384? Because when n is a power of two, x % n (without overflow) is the same as x & (n - 1); that is, x % 16384 == x & 16383.
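A one-line check of that identity for a few values:

# x % 2**n == x & (2**n - 1) for non-negative x, hence % 16384 == & 0x3FFF.
for x in (0, 1, 16383, 16384, 123456789):
    assert x % 16384 == x & 0x3FFF
print('identity holds')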

So why not use 8192 anyway?

CRC16 has a theoretical collision probability of 1/65536, but the actual probability of repeats can be much higher, just as CRC32's theoretical collision probability is about 1 in 4 billion, yet in practice collisions among 100,000 values are far more likely. If the slot count were 8192 with 200 instance nodes, the theoretical estimate is that one in every 40 distinct key requests would go wrong; if the number of nodes grows to 400, that becomes one in every 20. In addition, 1 KB does not save much compared with 2 KB, so the cost-benefit is not especially good, and that may be why 16384 is the common choice.

Shoulders of giants (Reference & Thanks)

  • Redis Core Technology and Practice by Geek Time
  • Redis Advanced – High extensibility: Redis Cluster in detail
  • Why is the number of Redis slots 16384
  • WeChat official account: a boy picking up snails