Introduction

In some scenarios, we need to store the same data on several nodes so that when one node becomes unavailable, the copies on the other nodes can keep serving requests, which improves availability. Storing the same data on several nodes also makes it possible to separate the nodes that handle writes from the nodes that handle reads (read/write separation), which can improve performance.

But once data is stored on multiple nodes, how do we keep it consistent between them?

Redis uses master/slave replication to keep the nodes consistent. Among all nodes, one master node provides the write service to clients, and data is then copied asynchronously from the master to the other nodes (slaves).

Implementing master/slave replication

In Redis, a server can be made to replicate a master either by running the REPLICAOF command (also known as the SLAVEOF command) on it or by setting the replicaof option in its configuration. Let’s look at both of these and then at the implementation.

Two trigger modes

REPLICAOF command (SLAVEOF)

A client sends the following command to the server that is to become the slave:

REPLICAOF <masterip> <masterport>
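As a rough illustration of what a client puts on the wire for this command, the sketch below encodes it in the Redis serialization protocol (RESP), in which a command is an array of bulk strings. The function names `encode_command` and `send_command` are made up for this example; `send_command` assumes a Redis instance is listening at the given address and is not called here.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def send_command(host: str, port: int, *args: str) -> bytes:
    """Send one command and return the raw reply (hypothetical helper)."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(encode_command(*args))
        return sock.recv(4096)


# The bytes a client would send to make a server replicate 127.0.0.1:6379.
payload = encode_command("REPLICAOF", "127.0.0.1", "6379")
```

With an instance running on port 6380, `send_command("127.0.0.1", 6380, "REPLICAOF", "127.0.0.1", "6379")` would be the programmatic counterpart of typing the command in redis-cli.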

Let’s try this command locally. The local Redis version is 5.0.3. Start two Redis instances, one on port 6379 and one on port 6380, then run REPLICAOF 127.0.0.1 6379 on the instance at port 6380:

127.0.0.1:6380> REPLICAOF 127.0.0.1 6379
17144:S 10 Oct 2021 09:03:17.792 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
17144:S 10 Oct 2021 09:03:17.792 * REPLICAOF 127.0.0.1:6379 enabled (user request from 'id=8 addr=127.0.0.1:60790 fd=7 name= age=437 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=44 qbuf-free=32724 obl=0 oll=0 omem=0 events=r cmd=replicaof')
OK
127.0.0.1:6380>
17144:S 10 Oct 2021 09:03:18.117 * Connecting to MASTER 127.0.0.1:6379
17144:S 10 Oct 2021 09:03:18.117 * MASTER <-> REPLICA sync started
17144:S 10 Oct 2021 09:03:18.118 * Non blocking connect for SYNC fired the event.
17144:S 10 Oct 2021 09:03:18.118 * Master replied to PING, replication can continue...
17144:S 10 Oct 2021 09:03:18.118 * Trying a partial resynchronization (request 82d6eca4120c2c3308a12dc6f601a356c10d4d45:11772).
17228:M 10 Oct 2021 09:03:18.119 * Replica 127.0.0.1:6380 asks for synchronization
17228:M 10 Oct 2021 09:03:18.119 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '82d6eca4120c2c3308a12dc6f601a356c10d4d45', my replication IDs are 'adc7e6a0c9933e21844beefb24ccc7fd5f901e32' and '0000000000000000000000000000000000000000')
17228:M 10 Oct 2021 09:03:18.119 * Starting BGSAVE for SYNC with target: disk
17228:M 10 Oct 2021 09:03:18.119 * Background saving started by pid 21164
17144:S 10 Oct 2021 09:03:18.120 * Full resync from master: adc7e6a0c9933e21844beefb24ccc7fd5f901e32:11771
17144:S 10 Oct 2021 09:03:18.120 * Discarding previously cached master state.
21164:C 10 Oct 2021 09:03:18.122 * DB saved on disk
17228:M 10 Oct 2021 09:03:18.222 * Background saving terminated with success
17228:M 10 Oct 2021 09:03:18.222 * Synchronization with replica 127.0.0.1:6380 succeeded
17144:S 10 Oct 2021 09:03:18.222 * MASTER <-> REPLICA sync: receiving 234 bytes from master
17144:S 10 Oct 2021 09:03:18.223 * MASTER <-> REPLICA sync: Flushing old data
17144:S 10 Oct 2021 09:03:18.223 * MASTER <-> REPLICA sync: Loading DB in memory
17144:S 10 Oct 2021 09:03:18.223 * MASTER <-> REPLICA sync: Finished with success

The above is the console output after running REPLICAOF. It shows the general synchronization process.

Setting the replicaof option

Before setting the replicaof option, first cancel the replication set up by the command above by running REPLICAOF no one on the instance at port 6380:

127.0.0.1:6380> REPLICAOF no one
17144:M 10 Oct 2021 11:41:09.395 # Setting secondary replication ID to adc7e6a0c9933e21844beefb24ccc7fd5f901e32, valid up to offset: 18492. New replication ID is 0c18c8ae51afbaf9aac438b29aeaaa956b9e504d
17144:M 10 Oct 2021 11:41:09.395 # Connection with master lost.
17144:M 10 Oct 2021 11:41:09.395 * Caching the disconnected master state.
17144:M 10 Oct 2021 11:41:09.395 * Discarding previously cached master state.
17144:M 10 Oct 2021 11:41:09.395 * MASTER MODE enabled (user request from 'id=8 addr=127.0.0.1:60790 fd=7 name= age=9909 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=36 qbuf-free=32732 obl=0 oll=0 omem=0 events=r cmd=replicaof')
17228:M 10 Oct 2021 11:41:09.395 # Connection with replica 127.0.0.1:6380 lost.
OK

Next, in the configuration file of the instance on port 6380, set the replicaof option to replicaof 127.0.0.1 6379:

# Master-Replica replication. Use replicaof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
#   +------------------+      +---------------+
#   |      Master      | ---> |    Replica    |
#   | (receive writes) |      |  (exact copy) |
#   +------------------+      +---------------+
#
# 1) Redis replication is asynchronous, but you can configure a master to
#    stop accepting writes if it appears to be not connected with at least
#    a given number of replicas.
# 2) Redis replicas are able to perform a partial resynchronization with the
#    master if the replication link is lost for a relatively small amount of
#    time. You may want to configure the replication backlog size (see the next
#    sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
#    network partition replicas automatically try to reconnect to masters
#    and resynchronize with them.
#
# replicaof <masterip> <masterport>
replicaof 127.0.0.1 6379

Then restart the Redis node on port 6380, which now has the replicaof option set. After the restart, it prints logs similar to those produced by the REPLICAOF command:

23118:S 10 Oct 2021 11:51:59.727 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
23118:S 10 Oct 2021 11:51:59.727 * Ready to accept connections
23118:S 10 Oct 2021 11:51:59.727 * Connecting to MASTER 127.0.0.1:6379
23118:S 10 Oct 2021 11:51:59.727 * MASTER <-> REPLICA sync started
23118:S 10 Oct 2021 11:51:59.727 * Non blocking connect for SYNC fired the event.
23118:S 10 Oct 2021 11:51:59.727 * Master replied to PING, replication can continue...
23118:S 10 Oct 2021 11:51:59.727 * Trying a partial resynchronization (request 0c18c8ae51afbaf9aac438b29aeaaa956b9e504d:18545).
17228:M 10 Oct 2021 11:51:59.728 * Replica 127.0.0.1:6380 asks for synchronization
17228:M 10 Oct 2021 11:51:59.728 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '0c18c8ae51afbaf9aac438b29aeaaa956b9e504d', my replication IDs are 'adc7e6a0c9933e21844beefb24ccc7fd5f901e32' and '0000000000000000000000000000000000000000')
17228:M 10 Oct 2021 11:51:59.728 * Starting BGSAVE for SYNC with target: disk
17228:M 10 Oct 2021 11:51:59.728 * Background saving started by pid 23119
23118:S 10 Oct 2021 11:51:59.728 * Full resync from master: adc7e6a0c9933e21844beefb24ccc7fd5f901e32:18491
23118:S 10 Oct 2021 11:51:59.728 * Discarding previously cached master state.
23119:C 10 Oct 2021 11:51:59.731 * DB saved on disk
17228:M 10 Oct 2021 11:51:59.745 * Background saving terminated with success
17228:M 10 Oct 2021 11:51:59.746 * Synchronization with replica 127.0.0.1:6380 succeeded
23118:S 10 Oct 2021 11:51:59.746 * MASTER <-> REPLICA sync: receiving 234 bytes from master
23118:S 10 Oct 2021 11:51:59.746 * MASTER <-> REPLICA sync: Flushing old data
23118:S 10 Oct 2021 11:51:59.746 * MASTER <-> REPLICA sync: Loading DB in memory
23118:S 10 Oct 2021 11:51:59.747 * MASTER <-> REPLICA sync: Finished with success

The logs show the general synchronization process. One thing we can see is that a partial resynchronization is attempted first, and a full resynchronization is performed only if the partial one is not possible.

Master/slave replication can be set up in either of the two ways above. Once it is, the slave becomes a read-only node and rejects writes. For example, if we try to set a key on the Redis node at port 6380:

127.0.0.1:6380> set msg "hello world"
(error) READONLY You can't write against a read only replica.

The error tells us that the slave node is currently read-only and cannot be written to.

The detailed steps of the replication implementation, whichever of the two ways is used, are as follows (Redis 2.8 or above):

Let’s briefly cover authentication first. The last two steps are covered in more detail later, and the other steps are left for you to explore if you are interested.

Authentication (AUTH command) – master/slave password authentication

After the slave sends the PING command to the master and receives the PONG reply, the next step is to decide whether to authenticate. Whether authentication happens depends mainly on the following password options on the master and the slave.

  1. The master sets a password with the requirepass option (this is also the password clients use to log in), here 123456, as shown in the following figure

  2. The slave sets the masterauth option to the master’s password so that it can authenticate

With these password options set on the master and the slave, the password is verified automatically during replication.

If the masterauth option is set on the slave, the slave authenticates by sending the master an AUTH command whose argument is the value of masterauth.

The following flowchart shows the situations a slave may encounter during the password authentication phase. If you are interested, you can simulate them locally.

Here we simulate only the case where the master has a password set and the slave does not. The slave keeps retrying and logs the following:

Partial resynchronization not possible (no cached master)
Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
Retrying with SYNC...
MASTER aborted replication with an error: NOAUTH Authentication required.
Connecting to MASTER 127.0.0.1:6379
MASTER <-> REPLICA sync started
Non blocking connect for SYNC fired the event.
Master replied to PING, replication can continue...
(Non critical) Master does not understand REPLCONF listening-port: -NOAUTH Authentication required.
(Non critical) Master does not understand REPLCONF capa: -NOAUTH Authentication required.
Partial resynchronization not possible (no cached master)
Unexpected reply to PSYNC from master: -NOAUTH Authentication required.
Retrying with SYNC...
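The combinations of requirepass and masterauth can be summarised with a small toy model. This is only a sketch of the behaviour as I understand it for the Redis 5 setup used here: the function name `auth_outcome` and the outcome labels are invented for illustration and are not Redis reply strings, apart from the NOAUTH case seen in the log above.

```python
from typing import Optional


def auth_outcome(requirepass: Optional[str], masterauth: Optional[str]) -> str:
    """Return a label for what the slave runs into during authentication.

    requirepass: password configured on the master (None if unset).
    masterauth:  password configured on the slave (None if unset).
    """
    if masterauth is None:
        # The slave sends no AUTH command at all.
        if requirepass is None:
            return "continue"      # nothing to verify, replication goes on
        return "noauth"            # master answers with a NOAUTH error
    # The slave sends AUTH <masterauth>.
    if requirepass is None:
        return "no-password-set"   # master has no password to check against
    if masterauth == requirepass:
        return "continue"          # passwords match, replication goes on
    return "invalid-password"      # passwords differ, replication is aborted


# The case simulated above: master has a password, slave has none.
simulated = auth_outcome(requirepass="123456", masterauth=None)
```

In every outcome other than "continue", the slave aborts the current attempt and retries later, which is what the repeated log lines show.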

Replication modes

Replication from the master to a slave falls into two modes: full resynchronization and partial resynchronization. Before looking at them, let’s go over the main flow of master/slave replication (that is, the synchronization and command propagation steps in the detailed replication steps above).

Master/slave replication process

A Redis deployment contains master and slave nodes. The master provides both read and write services. A slave keeps a backup of the master’s data and does not accept writes from clients. The main process is as follows:

On the master side, the general steps are as follows:

1. The slave sends the PSYNC command to the master. The master is passive here: when a slave starts up, the master adds the new slave to the replication group.

2. After receiving the request, the master starts a BGSAVE.

3. When the BGSAVE completes, the master sends the snapshot (the RDB file) to the slave.

4. While the snapshot is being created and sent, the master responds to new write commands as usual and also stores them in the backlog queue.

5. After the snapshot has been sent, the master sends the contents of the backlog queue.

6. After the backlog has been sent, the master forwards every subsequent write to the slave, maintaining real-time asynchronous replication.

On the slave side of the figure above, the processing logic is as follows:

1. After sending PSYNC, the slave continues to serve requests (using its old data).

2. The slave starts to receive the snapshot from the master. At this point it clears its existing data and loads the master’s snapshot into memory (it does not serve requests while clearing the old data and loading the snapshot).

3. The slave receives the backlog contents and executes them, then resumes serving read requests.

4. The slave continues to receive and execute the commands propagated by the master, keeping its data consistent with the master’s.

If several slaves send synchronization requests to the master concurrently, the second slave receives the same snapshot and backlog as the first, as long as its request arrives before the master finishes the BGSAVE. Otherwise, the second slave’s request triggers a second BGSAVE on the master.

The SYNC command (used before Redis 2.8) always performs a full resynchronization. The PSYNC command (used from Redis 2.8 onward) has two modes: full resynchronization and partial resynchronization.
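The master-side and slave-side steps above can be mimicked with a small in-memory simulation. It is a sketch under heavy simplifications: a dictionary copy stands in for BGSAVE and the RDB file, a plain list stands in for the backlog queue, and the `Master` and `Replica` classes and their method names are invented for this example.

```python
class Replica:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        # Clear the old data and load the master's snapshot.
        self.data = dict(snapshot)

    def apply(self, key, value):
        # Execute one propagated write command.
        self.data[key] = value


class Master:
    def __init__(self):
        self.data = {}
        self.replicas = []
        self.pending = None  # writes buffered while a snapshot is in flight

    def write(self, key, value):
        self.data[key] = value
        if self.pending is not None:
            self.pending.append((key, value))  # step 4: buffer during the sync
        for replica in self.replicas:
            replica.apply(key, value)          # step 6: propagate to replicas

    def begin_sync(self):
        self.pending = []
        return dict(self.data)                 # steps 2-3: take the snapshot

    def finish_sync(self, replica, snapshot):
        replica.load_snapshot(snapshot)
        for key, value in self.pending:        # step 5: replay buffered writes
            replica.apply(key, value)
        self.pending = None
        self.replicas.append(replica)


master = Master()
master.write("a", "1")
replica = Replica()
replica.data = {"stale": "x"}          # old data the replica had before syncing
snapshot = master.begin_sync()
master.write("b", "2")                 # a write that arrives during the snapshot
master.finish_sync(replica, snapshot)
master.write("c", "3")                 # a write after the sync, propagated live
```

After these calls the replica holds the same three keys as the master and the stale key is gone.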

Full resynchronization

It applies in the following scenarios:

Initial replication scenario:

  • The slave has never replicated any master before
  • The master that the slave replicated last time is different from the master it is about to replicate

Reconnection after disconnection scenario:

  • The slave loses its connection to the master because of a network problem and then reconnects automatically. The master checks the replication offset sent by the slave (the offset up to which the slave’s data was synchronized with the master). If the data after that offset is no longer in the replication backlog buffer, a full resynchronization is performed

The full resynchronization process is basically the same as the primary/secondary replication process described above.

Partial resynchronization

The application scenarios are as follows:

Reconnection after disconnection scenario:

  • If the data after the slave’s replication offset is still in the replication backlog buffer, the master performs a partial resynchronization and sends only that data to the slave.

Before Redis 2.8 there was no partial resynchronization mode. Whenever replication was needed, the slave sent the SYNC command to the master and a full resynchronization was performed: each time, the master dumped its full data set and sent it. When a slave that had long been in sync with the master was disconnected briefly and then reconnected, the difference between the two was very small, so having the master send the full data set again was inefficient and caused a lot of wasted overhead. The better solution is for the master to send only the small amount of data written during the disconnection.

Starting with version 2.8, Redis addresses this inefficiency in handling reconnection after a disconnection. The PSYNC command replaces SYNC for resynchronization and lets the master and the slave synchronize in a resumable way. During the disconnection, the master keeps the write commands it receives from clients in a buffer. After the slave reconnects, it tells the master the offset it had reached when it was disconnected, and the master sends only the data after that offset, which reduces the data transfer overhead.

Let’s look at the implementation details of partial resynchronization, starting with its three components:

  • Replication offsets of the master and the slave

    The master and the slave each maintain a replication offset. Each time the master propagates N bytes of data to the slave, it adds N to its own offset; each time the slave receives N bytes from the master, it adds N to its own offset as well. Normally the two offsets are equal; if they differ, the master and the slave are not in a consistent state.

    For example, if a slave is disconnected and then reconnects, how is the data it missed during the disconnection resynchronized to it? The slave knows its own offset and can pass it to the master. But how does the master find the missing data from that offset? This is where the master’s replication backlog buffer comes in.

  • Replication backlog buffer of the master

The replication backlog buffer is a fixed-length first-in, first-out queue maintained by the master, with a default size of 1 MB.

# Set the replication backlog size. The backlog is a buffer that accumulates
# replica data when replicas are disconnected for some time, so that when a replica
# wants to reconnect again, often a full resync is not needed, but a partial
# resync is enough, just passing the portion of data the replica missed while
# disconnected.
#
# The bigger the replication backlog, the longer the time the replica can be
# disconnected and later be able to perform a partial resynchronization.
#
# The backlog is only allocated once there is at least a replica connected.
#
# repl-backlog-size 1mb

When the master propagates a command, it sends it to all slaves and also appends it to the replication backlog buffer. The buffer therefore holds the most recently propagated write commands and records the replication offset of each byte.

Now the flow is easier to follow. When the master receives a PSYNC command from a slave, it takes the replication offset sent by the slave and checks whether the data after that offset still exists in the replication backlog buffer. If it does, the master replies +CONTINUE to tell the slave that a partial resynchronization will be performed. The master then sends the data after that offset from the replication backlog buffer to the slave, which receives and executes it.
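To make the relationship between offsets and the backlog concrete, here is a minimal sketch of a fixed-size backlog. It is not how Redis implements the buffer internally; the class `ReplicationBacklog` and its methods are illustrative names, and the offset convention (a slave’s offset is the number of bytes it has received so far) is a simplifying assumption.

```python
from typing import Optional


class ReplicationBacklog:
    """Fixed-size FIFO of the most recently propagated bytes."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total number of bytes ever propagated

    def feed(self, data: bytes) -> None:
        """Record propagated bytes, evicting the oldest ones when full."""
        self.buf += data
        self.master_offset += len(data)
        if len(self.buf) > self.size:
            del self.buf[: len(self.buf) - self.size]

    def first_offset(self) -> int:
        """Offset of the oldest byte still held in the buffer."""
        return self.master_offset - len(self.buf)

    def read_from(self, replica_offset: int) -> Optional[bytes]:
        """Bytes the replica is missing, or None if they were evicted."""
        if replica_offset < self.first_offset() or replica_offset > self.master_offset:
            return None
        start = replica_offset - self.first_offset()
        return bytes(self.buf[start:])


backlog = ReplicationBacklog(size=10)
for chunk in (b"AAAAA", b"BBBBB", b"CCCCC"):
    backlog.feed(chunk)

partial = backlog.read_from(10)  # replica missed only the last chunk
evicted = backlog.read_from(3)   # data after offset 3 is no longer buffered
```

A non-None result corresponds to the +CONTINUE case; None corresponds to the case in which only a full resynchronization can help.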

It is important to correctly estimate and set the replication backlog buffer size. To be safe, it is common to set the replication backlog buffer size to:

2 * second * write_size_per_second

Here, second is the average time, in seconds, that a slave needs to reconnect to the master after a disconnection, and write_size_per_second is the average amount of write command data the master generates per second.
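As a worked example of this rule of thumb, the snippet below plugs in illustrative numbers: a 5-second reconnect time and 1 MB of writes per second. Both values, and the function name `backlog_size_bytes`, are assumptions chosen for the example.

```python
def backlog_size_bytes(reconnect_seconds: float, write_bytes_per_second: float) -> int:
    """Apply the rule of thumb: 2 * second * write_size_per_second."""
    return int(2 * reconnect_seconds * write_bytes_per_second)


# 5 seconds to reconnect, 1 MB of write traffic per second.
size = backlog_size_bytes(5, 1024 * 1024)
size_mb = size // (1024 * 1024)
```

With these numbers the buffer should be about 10 MB, which would be set with repl-backlog-size 10mb.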

  • Server run ID

Why does partial resynchronization depend on the server’s run ID? Let’s first explain what a run ID is.

Each Redis server has its own run ID, which is generated automatically when the server starts and consists of 40 random hexadecimal characters. During the initial replication, the master sends its run ID to the slave, and the slave saves it.

When the slave reconnects after a disconnection, it sends the saved run ID to the master it is now connected to. The master compares the received run ID with its own. If they are the same, the master goes on to check whether the other conditions are met and, if so, performs a partial resynchronization. If they differ, a full resynchronization is performed. This is why partial resynchronization depends on the server run ID.
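Putting the three components together, the master’s decision can be sketched as a single function. This is a simplified model, not the Redis source: the function name and parameters are invented for illustration, and a slave that has never replicated is modelled as sending "?" with offset -1, which is my understanding of the initial PSYNC request. Note also that the Redis 5 logs shown earlier speak of replication IDs rather than run IDs; the check plays the same role.

```python
def handle_psync(
    master_run_id: str,
    backlog_first_offset: int,
    master_offset: int,
    replica_run_id: str,
    replica_offset: int,
) -> str:
    """Decide between partial and full resynchronization."""
    if replica_run_id != master_run_id:
        # Unknown or different master: the offset is meaningless here.
        return "FULLRESYNC"
    if replica_offset < backlog_first_offset or replica_offset > master_offset:
        # The data the replica is missing has left the backlog buffer.
        return "FULLRESYNC"
    return "CONTINUE"


RUN_ID = "adc7e6a0c9933e21844beefb24ccc7fd5f901e32"
OTHER_ID = "82d6eca4120c2c3308a12dc6f601a356c10d4d45"

first_sync = handle_psync(RUN_ID, 0, 11771, "?", -1)
wrong_master = handle_psync(RUN_ID, 0, 11771, OTHER_ID, 11772)
short_outage = handle_psync(RUN_ID, 5000, 11771, RUN_ID, 11000)
long_outage = handle_psync(RUN_ID, 5000, 11771, RUN_ID, 1200)
```

The `wrong_master` case mirrors the "Replication ID mismatch" message in the logs above, where the request carried an ID the master did not recognise.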

Implementation of the PSYNC command

Heartbeat detection

Let’s look at the command propagation phase, in which, after synchronization has finished, the slave receives and executes copies of the commands propagated by the master. During this phase the slave, by default, sends the following command to the master once per second:

REPLCONF ACK <replication_offset>

replication_offset is the slave’s current replication offset.

So what is heartbeat detection for?

  • Checking the network connection between the master and the slave

    If the master has not received a REPLCONF ACK command from a slave for more than one second, it knows that something is wrong with the connection between them. By sending the INFO replication command to the master, you can see a lag field in the list of slaves, which records how many seconds have passed since the last heartbeat from that slave. Normally lag fluctuates between 0 and 1 second; a value above 1 second indicates a problem with the connection.

  • Helping implement the min-slaves options (named min-replicas in Redis 5)

    These options prevent the master from executing write commands in an unsafe state. The master rejects write commands based on the following two options:

#
# The N replicas need to be in "online" state.
#
# The lag in seconds, that must be <= the specified value, is calculated from
# the last ping received from the replica, that is usually sent every second.
#
# This option does not GUARANTEE that N replicas will accept the write, but
# will limit the window of exposure for lost writes in case not enough replicas
# are available, to the specified number of seconds.
#
# For example to require at least 3 replicas with a lag <= 10 seconds use:
#
# min-replicas-to-write 3
# min-replicas-max-lag 10
#
# Setting one or the other to 0 disables the feature.
#
# By default min-replicas-to-write is set to 0 (feature disabled) and
# min-replicas-max-lag is set to 10.

The two options are min-replicas-to-write and min-replicas-max-lag. The default configuration is:

min-replicas-to-write 0
min-replicas-max-lag 10

If either min-replicas-to-write or min-replicas-max-lag is set to 0, the feature is disabled.

Let’s look at an example configuration:

min-replicas-to-write 3
min-replicas-max-lag 10

With the preceding configuration, the master rejects write commands whenever fewer than three slaves are connected with a lag of 10 seconds or less.
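The check can be expressed as a small predicate. This is a sketch of the rule as the configuration comments describe it, not the Redis source; the function name `can_accept_write` and the idea of passing the lags as a plain list are assumptions for illustration.

```python
from typing import Iterable


def can_accept_write(
    replica_lags: Iterable[int],
    min_replicas_to_write: int,
    min_replicas_max_lag: int,
) -> bool:
    """Return True if the master may execute a write command."""
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True  # setting either option to 0 disables the feature
    # Count replicas whose last heartbeat is recent enough.
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# min-replicas-to-write 3, min-replicas-max-lag 10
enough = can_accept_write([0, 1, 10], 3, 10)
one_lagging = can_accept_write([0, 1, 11], 3, 10)
too_few = can_accept_write([0, 1], 3, 10)
disabled = can_accept_write([], 0, 10)
```

As the configuration comments point out, passing this check does not guarantee that the replicas will actually receive the write; it only limits the window in which writes can be lost.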

  • Detecting command loss

If a write command propagated from the master to a slave is lost because of a network fault, the master notices it when the slave sends its next REPLCONF ACK <replication_offset> heartbeat, because the slave’s replication offset is smaller than the master’s. The master then resends the missing data to the slave, much as in a partial resynchronization. This keeps the data on the master and the slave consistent.
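The bookkeeping behind the heartbeat can be sketched as follows. The class `HeartbeatTracker`, its method names and the replica label are hypothetical; timestamps are passed in explicitly so that the example is deterministic, whereas a real server would read a clock.

```python
class HeartbeatTracker:
    """Toy model of what a master remembers about REPLCONF ACK heartbeats."""

    def __init__(self) -> None:
        self.last_ack = {}  # replica name -> (time of last ACK, acked offset)

    def on_replconf_ack(self, replica: str, offset: int, now: float) -> None:
        """Record a REPLCONF ACK <offset> received at time `now`."""
        self.last_ack[replica] = (now, offset)

    def lag(self, replica: str, now: float) -> int:
        """Whole seconds since the last heartbeat from this replica."""
        acked_at, _ = self.last_ack[replica]
        return int(now - acked_at)

    def bytes_behind(self, replica: str, master_offset: int) -> int:
        """How far the replica's acknowledged offset trails the master's."""
        _, offset = self.last_ack[replica]
        return master_offset - offset


tracker = HeartbeatTracker()
tracker.on_replconf_ack("replica-6380", offset=1000, now=50.0)

healthy_lag = tracker.lag("replica-6380", now=50.4)  # heartbeat is fresh
stale_lag = tracker.lag("replica-6380", now=55.0)    # no heartbeat for 5 s
missing = tracker.bytes_behind("replica-6380", master_offset=1033)
```

A lag above 1 second points to a connection problem, and a positive `bytes_behind` value is the signal that the replica has not yet received everything the master propagated.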

Conclusion

1. Before Redis 2.8, reconnection after a disconnection was handled inefficiently, with a full resynchronization every time. Redis 2.8 resolves this problem with partial resynchronization, so Redis 2.8 or later is recommended.

2. After synchronization, the slave executes the copies of commands sent by the master, which keeps the data on the master and the slave consistent. The slave also sends a heartbeat command to the master, which is used to detect command loss and to check the network connection and the health of the master/slave link.

Reference books: Redis Design and Implementation, Deep Into Distributed Caching