There is no distributed problem that cannot be solved by adding a node; if there is, add another.

Preface

Redis, as a high-performance in-memory database, is widely used in mainstream distributed architectures. To improve a system's fault tolerance, running Redis with multiple instances is often necessary, but the complexity is much higher than with a single instance. This article introduces the three main ways Redis implements multi-machine databases.

Master-slave mode

Redis’ master-slave mode refers to master-slave replication.

Users can use the SLAVEOF command or a configuration option to have one server replicate another, making it that server's slave.
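As a hedged illustration, the snippet below uses the redis-py client (the addresses and ports are made up) to make one local instance replicate another; the same thing can be configured with the slaveof / replicaof directive in redis.conf:

import redis

# Hypothetical setup: 127.0.0.1:6379 is the intended master, 127.0.0.1:6380 the slave
replica = redis.Redis(host="127.0.0.1", port=6380)
replica.slaveof("127.0.0.1", 6379)            # equivalent to sending SLAVEOF 127.0.0.1 6379
print(replica.info("replication")["role"])    # should now report "slave"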

Master-slave pattern architecture

How does Redis implement master-slave mode?

The Redis slave server initiates synchronization with the master server by sending the SYNC or PSYNC command.

The first synchronization

After receiving the SLAVEOF command, the slave server begins synchronizing data from the master server and enters the master/slave replication process (a rough sketch follows the steps below):

  1. The slave server sends the SYNC or PSYNC command to the master server

  2. The master server executes the BGSAVE command to generate an RDB file, and uses a buffer to record all write commands executed from that point on

  3. When the RDB file is generated, the master server sends it to the slave server

  4. The slave server loads the RDB file, bringing its database state up to the state the master server was in when the BGSAVE command was executed.

  5. The master server sends all the write commands in the buffer to the slave server; the slave server executes them, and its database state catches up with the master server's latest state.
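A minimal sketch of this initial full synchronization, seen from the master server's side (the names are illustrative, not the actual Redis source):

def handle_first_sync(master, slave_conn):
    buffered_writes = []                       # step 2: record write commands from now on
    master.on_write(buffered_writes.append)
    rdb_file = master.bgsave()                 # step 2: BGSAVE produces an RDB snapshot

    slave_conn.send_rdb(rdb_file)              # step 3: ship the snapshot to the slave
    for command in buffered_writes:            # step 5: replay the buffered writes so the
        slave_conn.send(command)               # slave catches up to the master's latest state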

Differences between SYNC and PSYNC

After synchronization completes, if the slave server goes down for a while, it must resynchronize with the master server once it comes back online. SYNC and PSYNC differ in how they resynchronize after such a disconnection.

  • SYNC

    The slave server sends the SYNC command to the master server again. The master server generates a snapshot of all its data and sends it to the slave server for a full synchronization.

  • PSYNC

    The slave server re-sends the PSYNC command to the master server. Based on how far apart the two sides' data are, the master server decides whether a full resynchronization is required or whether it only needs to send the write commands executed during the disconnection.

Obviously, PSYNC is much more efficient than SYNC, because synchronizing all the data is a very resource-intensive operation (disk IO, network), and it is not worth resynchronizing everything just because of a transient network hiccup. Redis has therefore used PSYNC for replication since version 2.8.

How does PSYNC implement partial resynchronization?

Partial resynchronization relies on three main parts:

1. Record the replication offset

Both master and slave servers maintain a replication offset.

  • When the master server sends N bytes of data to the slave server, it increases its replication offset by +N.

  • When the slave server receives N bytes of data from the master server, it also increases its own replication offset by N.

When the master and slave data are in sync, their offsets are equal. Once the slave server is disconnected for a while and misses some data, the two offsets diverge, and the difference is the number of bytes that were not transferred. If that gap does not exceed the size of the master server's replication backlog buffer, the missing bytes can be sent directly from the buffer to the slave server, avoiding a full resynchronization. Otherwise, a full resynchronization is required.

2. The replication backlog buffer

The replication backlog buffer is a first-in, first-out byte queue maintained by the master server, 1MB in size by default. Whenever the master server sends a write command to the slave server, it also stores that data in this queue, and each byte is recorded together with its replication offset. When reconnecting, the slave server sends its replication offset to the master server; if the data after that offset still exists in the replication backlog buffer, only that data needs to be sent to the slave server.
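A rough sketch of the decision the master server makes when a slave server reconnects and reports its offset (illustrative names, not the actual Redis implementation):

def on_reconnect(master, slave_offset):
    missing = master.repl_offset - slave_offset      # bytes the slave has not seen yet
    if missing == 0:
        return []                                    # already in sync, nothing to resend
    if missing <= len(master.backlog):               # the gap is still inside the backlog
        return master.backlog[-missing:]             # partial resync: send only the gap
    return master.full_resynchronization()           # gap too large: full resync required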

3. Record the server run ID

When master/slave synchronization is performed, the master server sends its run ID (a randomly generated 40-character hexadecimal string) to the slave server, which saves it. After the slave server recovers from a disconnection, it sends the saved ID back, and the master server checks whether it matches its own. If the IDs are the same, the master server has not changed and a partial resynchronization is attempted; if they differ, the master server has changed, and a full resynchronization is performed instead.
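Putting the run ID and the replication offset together, the slave server's side of PSYNC can be sketched roughly as follows (illustrative code; the PSYNC ? -1 request and the +FULLRESYNC / +CONTINUE replies are the actual protocol exchanges):

def psync_command(slave):
    if slave.cached_master_id is None:
        return "PSYNC ? -1"                          # first synchronization: ask for a full resync
    # Offer the saved master run ID and offset; the master replies +CONTINUE
    # (partial resync) if both still match, or +FULLRESYNC otherwise.
    return f"PSYNC {slave.cached_master_id} {slave.cached_offset}"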

The specific flow chart is as follows:

Redis Sentinel Mode (Sentinel)

Redis master-slave mode does a good job of backing up data, but it is not highly available: once the master server goes down, you can only switch to a new master server manually. Redis Sentinel mode is therefore a highly available extension of the master-slave mode.

Sentinel mode introduces a Sentinel system to monitor the master server and all of its slave servers. Once a master server is found to be down, the system automatically selects one of its slave servers and promotes it to be the new master server, achieving automatic failover.

The Sentinel system itself also needs to be highly available, so Sentinels are usually deployed as a cluster and monitor one another. A Sentinel is simply a Redis server running in a special mode.

How it works

1. Sentinel establishes connections with the master and slave servers

  • When a Sentinel server starts, it creates a command connection to the master server, and it also creates a subscription connection by subscribing to the master server's __sentinel__:hello channel.

  • Sentinel by default sends the INFO command every 10 seconds to the master server, which returns information about the master server itself, as well as information about all its slave servers.

  • Based on the returned information, if the Sentinel server detects a new slave server coming online, it creates command and subscription connections to that slave server as well, just as it does for the master server (a rough sketch of this discovery loop follows).
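A rough sketch of the discovery loop (illustrative names, not the actual Sentinel implementation):

import time

def monitor_master(sentinel, master_conn):
    while True:
        info = master_conn.execute("INFO replication")        # sent every 10 seconds by default
        for replica in parse_slave_lines(info):               # the slave0:ip=...,port=... entries
            if replica not in sentinel.known_replicas:
                sentinel.open_command_and_pubsub_connections(replica)
                sentinel.known_replicas.add(replica)
        time.sleep(10)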

2. Check whether the primary server is offline

Each Sentinel server sends a PING command every second to all instances it is connected to, including the master server, the slave server, and other Sentinel servers, to determine whether the instance is offline based on whether it replies to PONG.

Judging subjectively down

If an instance does not reply to the PING within down-after-milliseconds (as configured), the Sentinel that sent the PING considers the instance subjectively down.

Judging objectively down

To make sure the master server really is offline, a Sentinel that judges the master server to be subjectively down will ask the other Sentinels in the cluster to confirm. Once the number of Sentinels that agree reaches a certain quorum (generally N/2 + 1), the master server is judged to be objectively down and a failover is needed.
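A minimal sketch of this objective-down check (the quorum value comes from the Sentinel configuration; the names are illustrative):

def is_objectively_down(peers, master, quorum):
    votes = 1                                  # this Sentinel already considers the master down
    for peer in peers:
        # other Sentinels are consulted via the SENTINEL is-master-down-by-addr command
        if peer.reports_master_down(master):
            votes += 1
    return votes >= quorum                     # quorum is typically N/2 + 1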

3. Electing the leader Sentinel

When a master server is deemed objectively down, the Sentinel cluster elects a leader Sentinel to carry out the failover of the master server. The election is based on the Raft consensus algorithm, and the general idea is as follows (a rough sketch follows the list):

  • Each Sentinel that sees the master server as objectively down asks the other Sentinels to set it as their local leader Sentinel.

  • Each Sentinel that receives the request can either agree or refuse.

  • If a Sentinel is supported by more than half of the sentinels, it becomes the lead Sentinel in this election.

  • If the lead Sentinel is not elected within a given time, the election will be restarted after a period of time until the lead Sentinel is elected.
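A rough sketch of the vote counting (not the actual Sentinel implementation; each Sentinel grants at most one vote per election epoch):

def try_to_become_leader(me, peers, epoch):
    votes = 1                                  # a Sentinel first votes for itself
    for peer in peers:
        if peer.request_vote(me.id, epoch):    # each peer either agrees or refuses
            votes += 1
    majority = (len(peers) + 1) // 2 + 1
    return votes >= majority                   # majority wins; otherwise the election restarts later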

4. Electing a new master server

The leader Sentinel selects the most suitable slave server from the old master server's slaves to be the new master server. The selection rules are as follows (a rough sketch follows the list):

  • Select healthy slave nodes and exclude disconnected slave servers that have not responded to INFO recently.

  • Prefer the slave server with the higher priority (in Redis, a lower replica-priority value means a higher priority)

  • Prefer the slave server with the largest replication offset (indicating the most complete data)
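These rules can be expressed as a simple sort over the healthy candidates (illustrative attribute names, not the actual Sentinel code):

def pick_new_master(slaves):
    healthy = [s for s in slaves
               if s.connected and s.replied_to_info_recently]   # rule 1: drop stale slaves
    if not healthy:
        return None
    # rules 2 and 3: lowest priority value first, then the largest replication offset
    healthy.sort(key=lambda s: (s.priority, -s.replication_offset))
    return healthy[0]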

Once the new master server is selected, the leader Sentinel sends the SLAVEOF no one command to it to actually promote it to master, and changes the replication targets of the other slave servers so that the old master server becomes a slave, completing the failover.

Redis Cluster

Redis Sentinel mode achieves high availability and read/write separation, but it still has only one master node: all write operations go through that master node, which becomes a performance bottleneck.

Therefore, starting with version 3.0, Redis added Cluster mode, which is decentralized (there is no central node), and the cluster stores the database's key-value pairs through sharding.

Nodes

A Redis cluster consists of multiple interconnected nodes; each node stores information about itself and the other nodes. Nodes use the Gossip protocol to exchange status information and news of newly added nodes.

Data sharding

The entire Redis Cluster database is divided into 16384 hash slots; each key in the database belongs to one of these slots, and each node in the cluster can handle anywhere from 0 up to all 16384 slots.

Assigning slots

With the CLUSTER ADDSLOTS <slot> [slot …] command we can assign one or more slots to a node.

For example, 127.0.0.1:7777> CLUSTER ADDSLOTS 1 2 3 4 5 assigns slots 1, 2, 3, 4, and 5 to the node listening on port 7777.

After the slot assignment is set, the node sends its slot assignment information to the other nodes in the cluster so that they can update their records.

Computing which slot a key belongs to


import binascii  # crc_hqx implements CRC16 (XMODEM), the variant Redis uses

def slot_number(key: bytes) -> int:
    return binascii.crc_hqx(key, 0) & 16383


In other words, a CRC16 checksum of the key is computed and then masked with 16383 (equivalent to taking it modulo 16384) to get the final slot.

You can also use the CLUSTER KEYSLOT <key> command to check which slot a given key maps to.
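For example, with the redis-py client (assuming a cluster-enabled node is listening on 127.0.0.1:7777):

import redis

node = redis.Redis(host="127.0.0.1", port=7777)
# CLUSTER KEYSLOT returns the slot number (0..16383) that the given key hashes to
print(node.execute_command("CLUSTER KEYSLOT", "user:1000"))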

Sharding process

  1. When a client initiates an operation on a key-value pair, it sends the operation to an arbitrary node

  2. The node calculates the slot to which the key belongs

  3. The node checks whether it is responsible for the slot to which the key belongs

  4. If so, directly execute the operation command

  5. If not, the node returns a MOVED error to the client containing the correct node's address and port; after receiving the MOVED error, the client can redirect the command to the correct node (a sketch of this follows)
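A minimal sketch of a client following such a MOVED redirect (illustrative; real cluster clients such as redis-py's RedisCluster also cache the slot-to-node mapping):

def run_command(nodes, command, key):
    node = nodes.pick_any()                            # step 1: send to an arbitrary node
    try:
        return node.execute(command, key)              # step 4: the node owns the slot
    except MovedError as err:                          # step 5: "-MOVED <slot> <ip>:<port>"
        return nodes[err.address].execute(command, key)   # retry on the correct node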

High availability of Redis Cluster

The nodes of a Redis Cluster are divided into master nodes and slave nodes. A master node is responsible for handling slots, while a slave node replicates a master node and takes its place when that master goes offline.

How is failover implemented?

Similar to Sentinel mode, the nodes periodically send PING messages to each other to check whether the other node is online. When a node finds that another node does not respond, it marks that node as suspected offline (PFAIL). If more than half of the master nodes in the cluster mark a master node as suspected offline, the node is marked as offline (FAIL) and a failover begins (a rough sketch follows the steps below).

  1. A new master node is elected from the slave nodes of the offline master node, using a Raft-like algorithm

  2. The selected slave node runs the SLAVEOF no one command and becomes the new master node

  3. The new master node revokes the slots assigned to the offline master node and assigns them to itself

  4. The new master node broadcasts a message to the cluster announcing that it has changed from slave to master

  5. The new master node begins to accept and handle command requests for the slots it now owns
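These steps can be sketched as follows (illustrative names only, not the actual cluster implementation):

def cluster_failover(cluster, failed_master, elected_slave):
    elected_slave.execute("SLAVEOF NO ONE")            # step 2: promote the elected slave
    for slot in failed_master.slots:                   # step 3: take over the old master's slots
        cluster.assign_slot(slot, elected_slave)
    cluster.broadcast(elected_slave.announce_role_change())   # step 4: announce slave -> master
    # step 5: from now on the new master serves commands for the slots it owns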

Conclusion

This article mainly introduced the three Redis multi-machine modes. To sum up:

  • Master/slave mode provides read/write separation and data backup, but it is not "highly available".
  • Sentinel mode can be seen as a “high availability” version of master-slave mode, which introduces Sentinel to monitor the entire Redis service cluster. But with only one master node, there is still a write bottleneck.
  • Cluster mode not only provides high availability, but also shards the data across the nodes, supporting highly concurrent reads and writes. Its implementation, of course, is the most complicated of the three.

References

  • Redis Design and Implementation
  • Designing Data-Intensive Applications