Preface

In distributed environments, data replicas and replication are an effective way to improve system availability and read/write performance. They are widely used in distributed systems, and Redis is no exception.

Although master/slave replication is not used directly as a high-availability solution for Redis, both Sentinel and Cluster build on it, so it is important to understand how master/slave replication works first.

Main text

Implementation principle of master/slave replication

In Redis 6.0, the complete process of master/slave replication is as follows:

1. Enable master/slave replication

There are usually three ways:

1) Run the slaveof command on the slave

2) Add slaveof to the slave configuration file

3) Use the startup option --slaveof

Note: Since Redis 5.0, the slaveof command and configuration directive have been renamed to replicaof, e.g. replicaof <masterip> <masterport>. For compatibility with older versions, slaveof is still accepted as a deprecated alias.

2. Establish a socket connection

The slave opens a socket connection to the master at the specified IP address and port. After the master accepts the connection, it creates a client state for that socket, and the connection is established.

3. Run the PING command

The slave sends a PING command to the master to check whether the read/write status of the socket is normal and whether the master can process command requests properly.

If the slave receives a "PONG" reply, the network connection between the master and slave is normal and the master can handle command requests properly.

If the slave receives any other reply, or no reply at all, the network connection between the master and slave is faulty or the master cannot process the slave's command requests. The slave then enters the error path: it closes the current connection and retries later.

4. Authentication

If neither master nor slave has a password, no authentication is required.

If the master and slave have the same password, the authentication succeeds.

Otherwise, for example if the master and slave have different passwords, or if only one of them has a password configured, an error is returned.

All error conditions cause the slave to enter the error process: The slave disconnects the current connection and tries again later.
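The authentication rules above can be written down as a small decision function. This is only an illustrative sketch: the function name auth_outcome and the three outcome labels are made up for this example and are not Redis identifiers.

```python
from typing import Optional


def auth_outcome(master_password: Optional[str], slave_password: Optional[str]) -> str:
    """Return the outcome of the authentication step.

    None means "no password configured" on that side.
    The labels "skip", "ok" and "error" are hypothetical names for this sketch.
    """
    if master_password is None and slave_password is None:
        return "skip"   # neither side has a password: nothing to authenticate
    if master_password == slave_password:
        return "ok"     # both sides configured the same password
    return "error"      # passwords differ, or only one side has a password


print(auth_outcome(None, None))
print(auth_outcome("s3cret", "s3cret"))
print(auth_outcome("s3cret", "other"))
print(auth_outcome("s3cret", None))
```

Only the "error" outcome sends the slave into the error path described above, where it disconnects and retries later.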

5. Send port information

After the authentication succeeds, the slave sends its listening port number to the master. The master receives the listening port number and records it in the slave_listening_port property of the slave's client state.

6. Send the IP address

If slave_announce_ip is configured, the slave sends the IP address specified by slave_announce_ip to the master. After receiving the address, the master records it in the slave_ip property of the slave's client state.

This setting addresses the case where the server would otherwise report an intranet IP address that other servers cannot reach. With it, you can specify a public IP address directly.

7. Send CAPA

CAPA stands for capabilities.

At this stage, the slave sends CAPA to inform the master of the (synchronization) replication capabilities it supports. After receiving CAPA, the master records it in the slave_capa property of the slave's client state.

In Redis 6.0, CAPA has two values: eof and psync2.

eof indicates that the slave can directly receive the RDB data stream sent over the socket, that is, diskless load.

psync2 indicates that the slave supports version 2 of partial resynchronization, introduced in Redis 4.0 and described in more detail below.

8. Data synchronization

The slave sends the PSYNC command to the master. After receiving this command, the master determines whether to perform partial or complete resynchronization and then synchronizes data based on the policy.

1) If the slave is replicating for the first time, it sends PSYNC ? -1 to the master, and the master returns +FULLRESYNC to perform a full resynchronization.

2) If this is not the first replication, the slave sends PSYNC <replid> <offset> to the master, where replid is the replication ID of the master and offset is the current replication offset of the slave. The master decides which synchronization to perform based on replid and offset.

For a full resynchronization, the master returns +FULLRESYNC. For a partial resynchronization, it returns +CONTINUE, and the slave simply waits for the master to send the data it is missing.
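To make steps 3 to 8 more concrete, here is a minimal sketch of the sequence of commands a slave sends during the handshake, together with a RESP encoder for them. The helper names encode_command and handshake_commands are invented for this example. The REPLCONF sub-command spellings (listening-port, ip-address, capa) reflect my understanding of the Redis wire protocol and are not taken from the text above, so treat them as assumptions.

```python
from typing import List, Optional


def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*" + str(len(args)).encode() + b"\r\n"]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def handshake_commands(listening_port: int,
                       password: Optional[str] = None,
                       announce_ip: Optional[str] = None,
                       replid: Optional[str] = None,
                       offset: int = -1) -> List[List[str]]:
    """Return the commands of steps 3 to 8 in the order the slave sends them."""
    steps = [["PING"]]                                            # step 3
    if password is not None:
        steps.append(["AUTH", password])                          # step 4
    steps.append(["REPLCONF", "listening-port", str(listening_port)])  # step 5
    if announce_ip is not None:
        steps.append(["REPLCONF", "ip-address", announce_ip])     # step 6
    steps.append(["REPLCONF", "capa", "eof", "capa", "psync2"])   # step 7
    if replid is None:
        steps.append(["PSYNC", "?", "-1"])                        # step 8, first replication
    else:
        steps.append(["PSYNC", replid, str(offset)])              # step 8, reconnection
    return steps


for command in handshake_commands(6380):
    print(" ".join(command), "->", encode_command(*command))
```

The sketch only builds the requests. A real slave also has to read and check the reply after each step and fall back to the error path when a reply is missing or unexpected.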

Command propagation

When the synchronization is complete, the master sends the write commands to the slave, and the slave receives and executes the write commands from the master. This ensures that the master and slave are consistent.

During command propagation, the slave sends the REPLCONF ACK <reploff> command to the master once per second, where reploff is the current replication offset of the slave.

Sending the REPLCONF ACK command serves three purposes for the master and slave:

1) Check the network connection status of the master and slave.

2) Report its own replication offset and detect command loss. The master compares the replication offset and sends unsynchronized data to the slave if the replication offset is smaller than its own.

3) Help implement the min-slaves options, which prevent the master from executing write commands under unsafe conditions.

For example, the following configuration means: if fewer than 3 slaves have a lag within 10 seconds, write commands are rejected. The lag is the difference between the current time and the time of the last ACK received from the slave.

min-slaves-to-write 3
min-slaves-max-lag 10
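The min-slaves check can be sketched as follows. This is an illustration rather than the Redis implementation: the function write_allowed and its parameters are hypothetical names, and the sketch assumes that a slave whose lag is exactly equal to the limit still counts as healthy.

```python
from typing import Iterable


def write_allowed(last_ack_times: Iterable[float],
                  now: float,
                  min_slaves_to_write: int = 3,
                  min_slaves_max_lag: float = 10) -> bool:
    """Decide whether the master may accept a write command.

    last_ack_times holds, for each slave, the time (in seconds) at which
    the master last received REPLCONF ACK from it.
    """
    # Lag = current time minus the time of the slave's latest ACK.
    # Assumption: lag equal to the limit is still acceptable.
    good_slaves = sum(1 for ack in last_ack_times if now - ack <= min_slaves_max_lag)
    return good_slaves >= min_slaves_to_write


now = 100.0
print(write_allowed([99.0, 95.0, 91.0], now))  # three slaves within 10 seconds
print(write_allowed([99.0, 95.0, 80.0], now))  # one slave lags by 20 seconds
```

Because the slaves send an ACK once per second, the master can keep this count reasonably fresh without any extra traffic.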

Take partial resynchronization as an example. The core steps of master/slave replication are as follows:

The source code is in replication.c. The core functions are replicationSetMaster, connectWithMaster, and syncWithMaster.

Old version: SYNC

Data synchronization before Redis 2.8 is completed by using the SYNC command. The complete process is as follows:

1. The slave sends the SYNC command to the master.

2. The master receives SYNC and executes the BGSAVE command, which forks a child process to generate the RDB file, and uses a buffer to record all write commands executed from that point on.

Redis uses copy-on-write (COW), which is briefly introduced here.

A naive way to fork a child process is to copy the entire address space of the parent, but this is time consuming and is generally not done.

Instead, after fork, the child process shares the address space of the parent process. Only when the parent or the child wants to write is the content to be modified copied first and then written. This is where the name copy-on-write comes from.

Back to this article, when the main process forks out its child process, due to COW, it can be considered that at the moment of fork, the snapshot has been generated, but the RDB file has not yet been written.
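The "snapshot at the moment of fork" effect can be observed from user code. The sketch below assumes a Unix-like system where os.fork is available. The function name snapshot_via_fork and the toy dataset are made up for the demonstration, and the snippet says nothing about how Redis itself writes the RDB file.

```python
import json
import os


def snapshot_via_fork():
    """Fork a child, modify data in the parent, and return what each side sees."""
    data = {"counter": 1}
    result_r, result_w = os.pipe()   # child -> parent: the child's view of data
    ready_r, ready_w = os.pipe()     # parent -> child: "I have modified data"

    pid = os.fork()
    if pid == 0:
        # Child process: wait until the parent has written, then report
        # the dataset as the child sees it.
        try:
            os.close(result_r)
            os.close(ready_w)
            os.read(ready_r, 1)
            os.write(result_w, json.dumps(data).encode())
        finally:
            os._exit(0)

    # Parent process: keep serving "writes" after the fork.
    os.close(result_w)
    os.close(ready_r)
    data["counter"] = 2
    os.write(ready_w, b"x")
    child_view = json.loads(os.read(result_r, 4096).decode())
    os.waitpid(pid, 0)
    os.close(result_r)
    os.close(ready_w)
    return child_view, data


child_view, parent_view = snapshot_via_fork()
print("child sees :", child_view)
print("parent sees:", parent_view)
```

The child still sees the value from the moment of the fork even though the parent changed it afterwards, which is why the forked process can write a consistent snapshot while the main process keeps handling commands.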

The RDB file is the data at fork. Between fork and the time the master sends the RDB file to the slave, the master continues to execute write commands. How does the slave get write commands in between?

This is the purpose of the buffer described above: it records every write command executed after the fork.

Later, after the master has sent the RDB file to the slave, it also sends the write commands recorded in the buffer to the slave, as in step 4 below, so that the slave's data is complete.

3. When BGSAVE finishes, the master sends the RDB file to the slave. The slave receives the RDB file, loads it into memory, and updates its database state to the state the master's database was in when BGSAVE was executed.

There are two ways to send RDB files: 1) Socket: The master sends RDB files through the socket directly to the slave. 2) Disk: The master persists the RDB file to the disk and then sends it to the slave.

The default mode is Disk. You can configure the socket mode as follows.

repl-diskless-sync yes

repl-diskless-sync-delay: This parameter makes the master wait for a period of time before starting replication, so that several slaves can connect in the meantime and be served together.

The Socket mode applies to the environment where the disk read/write speed is slow but the network bandwidth is high.

In addition, the main process checks whether the BGSAVE child process has finished by means of a timed event.

4. The master sends all write commands recorded in the buffer to the slave. The slave executes these commands to update the database state to the current state of the master database.
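The four steps can be simulated in a few lines. This is a toy model under simplifying assumptions: the dataset is a plain dict, a deep copy stands in for the RDB file, and only a SET-like operation is supported. The class and method names (Master, Slave, start_sync, finish_sync) are invented for this sketch.

```python
import copy


class Master:
    def __init__(self):
        self.data = {}
        self.buffer = None  # records writes while a sync is in progress

    def set(self, key, value):
        self.data[key] = value
        if self.buffer is not None:
            self.buffer.append(("SET", key, value))

    def start_sync(self):
        """Step 2: take a snapshot and start buffering later writes."""
        self.buffer = []
        return copy.deepcopy(self.data)  # stands in for the RDB file

    def finish_sync(self):
        """Step 4: hand over the buffered writes and stop buffering."""
        pending, self.buffer = self.buffer, None
        return pending


class Slave:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        """Step 3: replace the local dataset with the snapshot."""
        self.data = dict(snapshot)

    def apply(self, commands):
        for op, key, value in commands:
            if op == "SET":
                self.data[key] = value


def run_demo():
    master, slave = Master(), Slave()
    master.set("a", "1")
    snapshot = master.start_sync()     # the slave has sent SYNC
    master.set("b", "2")               # writes that arrive during BGSAVE
    master.set("a", "3")
    slave.load_snapshot(snapshot)      # the slave is at the snapshot state
    slave.apply(master.finish_sync())  # the slave catches up
    return master.data, slave.data


print(run_demo())
```

After the buffered commands are applied, both sides hold the same data even though the master kept accepting writes while the snapshot was being produced.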

Problems of SYNC: Complete resynchronization is required every time the slave is disconnected and reconnected, which is inefficient.

New version: PSYNC

In order to solve the problem of using full resynchronization every time the slave is disconnected and reconnected, Redis introduced PSYNC in version 2.8, which includes full resynchronization and partial resynchronization.

1. Full resynchronization: basically the same as the SYNC command.

2. Partial resynchronization: The slave only needs to receive and synchronize the write commands lost during the disconnection, but does not need to perform complete resynchronization.

To achieve partial resynchronization, Redis introduces the concepts of replication offset, replication backlog buffer, and run ID.

Replication offset

Both the master and slave parties maintain a replication offset. Each time the master propagates N bytes to the slave, its own replication offset increases by N. Similarly, when the slave receives N bytes, the replication offset increases by N. You can know the synchronization status by comparing the replication offsets between the master and slave.

Replication Backlog Buffer

The replication backlog buffer is a fixed-length FIFO queue maintained by the master, with a default size of 1MB.

When the master propagates commands, it not only sends write commands to the slave but also writes them to the replication backlog. Therefore, the replication backlog buffer of the master stores some of the recently propagated write commands.

When the slave reconnects to the master, it sends its replication offset to the master through the PSYNC command. The master checks its replication backlog buffer. If the commands the slave has not yet received are still in the replication backlog buffer, the master can use them to perform a partial resynchronization. If the connection was down for too long and those commands are no longer in the replication backlog buffer, there is no alternative but to perform a full resynchronization.
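The interplay between the replication offset and the backlog can be sketched with a small class. This is an illustration under stated assumptions: the name ReplicationBacklog and its methods are hypothetical, the buffer is trimmed from the front instead of being a true circular buffer, and an offset is taken to mean "number of bytes received so far".

```python
from typing import Optional


class ReplicationBacklog:
    """Fixed-size FIFO of the most recently propagated bytes."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray()
        self.master_offset = 0  # total number of bytes propagated so far

    def feed(self, data: bytes) -> None:
        """Record propagated bytes and advance the master's offset."""
        self.buf += data
        self.master_offset += len(data)
        overflow = len(self.buf) - self.size
        if overflow > 0:
            del self.buf[:overflow]  # the oldest bytes fall out of the queue

    @property
    def first_offset(self) -> int:
        """Offset of the oldest byte that is still in the backlog."""
        return self.master_offset - len(self.buf)

    def read_from(self, slave_offset: int) -> Optional[bytes]:
        """Return the bytes the slave is missing, or None if the backlog
        can no longer serve them (a full resynchronization is needed)."""
        if slave_offset < self.first_offset or slave_offset > self.master_offset:
            return None
        return bytes(self.buf[slave_offset - self.first_offset:])


backlog = ReplicationBacklog(size=10)
backlog.feed(b"abcdef")
print(backlog.read_from(4))   # the slave has 4 bytes and is missing the rest
backlog.feed(b"ghijkl")
print(backlog.read_from(4))   # still served from the backlog
print(backlog.read_from(1))   # too old: these bytes were already overwritten
```

The last call shows why the backlog size matters: the longer a slave stays disconnected and the more the master writes in the meantime, the more likely it is that the missing bytes have already been pushed out.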

Run ID (runid)

Each Redis server has its own run ID, which consists of 40 random hexadecimal characters. When the slave replicates the master for the first time, the master sends its run ID to the slave, which saves it. When the slave reconnects after a disconnection, it sends the saved run ID to the master. The master compares this ID with its own run ID to determine whether it is the same master the slave was replicating before.

After introducing these three concepts, the data synchronization process is as follows:

1) The slave sends the saved master runid and its own replication offset to the master using the PSYNC <runid> <offset> command.

2) If the master determines that the runid is the same as its own and that the offset is still in the replication backlog buffer, it performs a partial resynchronization: it sends the missing commands from the replication backlog buffer to the slave, and the slave executes them to bring its database to the master's state.

3) Otherwise, if the master determines that the runid is different, or the offset is no longer in the replication backlog buffer, it performs a full resynchronization.
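The master's decision in steps 2) and 3) boils down to two checks. The sketch below is a simplification with hypothetical names (psync_reply and its parameters). The extra fields appended to +FULLRESYNC follow my understanding of the real reply format and should be treated as an assumption.

```python
def psync_reply(master_runid: str,
                backlog_first_offset: int,
                master_offset: int,
                requested_runid: str,
                requested_offset: int) -> str:
    """Choose between partial and full resynchronization for one PSYNC request."""
    same_master = requested_runid == master_runid
    in_backlog = backlog_first_offset <= requested_offset <= master_offset
    if same_master and in_backlog:
        return "+CONTINUE"
    # Assumption: the master tells the slave its run ID and current offset
    # so that the slave can save them for a later PSYNC.
    return "+FULLRESYNC {} {}".format(master_runid, master_offset)


RUNID = "a" * 40
print(psync_reply(RUNID, 100, 500, RUNID, 300))    # same master, offset in backlog
print(psync_reply(RUNID, 100, 500, RUNID, 50))     # offset already dropped from backlog
print(psync_reply(RUNID, 100, 500, "b" * 40, 300))  # slave was following another master
print(psync_reply(RUNID, 100, 500, "?", -1))       # first replication
```

A first-time slave has no saved run ID, so it can never match the master's run ID and always ends up in the full resynchronization branch.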

The complete process of PSYNC is shown below:

Problems with PSYNC

From the process above, we can see that PSYNC can perform a partial resynchronization only if two conditions are met: 1) the master runid is unchanged; 2) the replication offset is still in the master's replication backlog buffer. If either condition is not met, a full resynchronization is still required, as in the following scenarios.

1. After the slave restarts, the cached master runid and offset will be lost, and the slave needs to perform complete resynchronization.

2. A failover occurs in Redis. After the failover, the master runid changes and the slave needs to perform a full resynchronization.

Slave maintenance restart and master failover are common scenarios in Redis operation and maintenance. Therefore, the probability of these two problems in PSYNC is very high.

The source code is in replication.c, and the core functions include syncCommand, readSyncBulkPayload, replicationFeedSlaves, backgroundSaveDoneHandler, and slaveTryPartialResynchronization.

PSYNC2

To solve the problem that slave restarts and failovers force a full resynchronization under PSYNC, Redis optimized PSYNC in version 4.0. The optimized version is called PSYNC2.

PSYNC2 has made the following two major changes:

1. Introduce two sets of replid and offset to replace the original runid and offset

Group 1: replid and master_repl_offset

For the master, they represent its own replication ID and replication offset;

For slave, it is the replication ID and replication offset of the master that it is synchronizing.

The two fields in this group can be considered to correspond to the original runid and offset.

Group 2: replid2 and second_repl_offset

For the master and slave, both indicate the replication ID and replication offset of the previous master. It is used to support partial resynchronization during failover.

Note that the runid did not disappear after replid was introduced. Before 4.0, Redis used the runid as the identity for master/slave replication; since 4.0 it uses the replid for that purpose. The runid is not used only for replication, though. It still has other functions, such as serving as the unique identifier of a Redis server.

2. Slave also enables replication backlogs

The replication backlog buffer is also enabled on the slave. After a failover, when a slave is promoted to master, it can still support partial resynchronization using its replication backlog buffer.

If the slave did not enable the replication backlog buffer, the backlog would be empty after the slave is promoted to master, and partial resynchronization could not be supported.

Next, let’s take a look at how Redis optimizes two problems with PSYNC.

Optimization scenario 1: a full resynchronization occurs after the slave restarts

The root cause of this problem is that the replication ID (running ID) and replication offset are lost after the slave restarts. The solution is simply to save these two variables before shutting down the server.

Here is what Redis does: before shutting down, the slave calls the rdbSaveInfoAuxFields function to save the current replication ID (replid) and replication offset (master_repl_offset) as auxiliary fields in the RDB file. When the slave restarts later, it reads the replication ID and replication offset from the RDB file and uses them for partial resynchronization.
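The idea of saving the replication state before shutdown and reading it back at startup can be sketched as follows. This is not the RDB format: a small JSON file stands in for the auxiliary fields, the key names repl-id and repl-offset are my assumption about how those fields are labelled, and all function names are hypothetical.

```python
import json
import os
import tempfile


def save_replication_state(path: str, replid: str, offset: int) -> None:
    """Before shutdown: persist the replication ID and offset."""
    with open(path, "w") as f:
        json.dump({"repl-id": replid, "repl-offset": offset}, f)


def load_replication_state(path: str):
    """At startup: return (replid, offset), or None if nothing was saved."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        aux = json.load(f)
    return aux["repl-id"], aux["repl-offset"]


def psync_args_after_restart(path: str):
    """Build the PSYNC request the restarted slave would send."""
    state = load_replication_state(path)
    if state is None:
        return ["PSYNC", "?", "-1"]           # nothing saved: full resync
    replid, offset = state
    return ["PSYNC", replid, str(offset)]     # try a partial resync


with tempfile.TemporaryDirectory() as tmp:
    state_file = os.path.join(tmp, "dump.json")
    print(psync_args_after_restart(state_file))   # before anything was saved
    save_replication_state(state_file, "c" * 40, 4096)
    print(psync_args_after_restart(state_file))   # after a clean shutdown
```

Whether the partial resynchronization then succeeds still depends on the offset being present in the master's replication backlog buffer.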

Optimization scenario 2: a master failover leads to a full resynchronization

The root cause of this problem is that a new master is created after failover, and the replication ID (running ID) of the new master is changed, causing partial resynchronization to fail.

Under normal synchronization, the new master's data is theoretically identical to the old master's, including the replication backlog buffer.

Therefore, partial resynchronization is theoretically possible for the slave, and it fails only because the replication ID has changed. So our goal is to find a way to link the new master with the other slaves.

What the new master and the other, unpromoted slaves have in common is that they had the same master before the failover, so it is natural to use that pre-failover master to link the new master with the remaining slaves.

Here is what Redis does: when a node is promoted from slave to master, it moves the first set of replication ID and offset that it saved (that is, the old master's) into the second set, and then generates a new replication ID for the first set, which becomes its own replication ID.

replid stores its own replication ID, and replid2 stores the replication ID of the old master.

This way, the new master can use replid2 to determine whether a slave was previously replicating from the same master as itself, and if so, attempt a partial resynchronization.
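The promotion step and the subsequent check can be sketched like this. It is a simplified model, not the Redis code: the class ReplicationState and its methods are hypothetical, offsets mean "bytes received so far", and the replication backlog check is deliberately left out so that only the ID logic is shown.

```python
import secrets


class ReplicationState:
    def __init__(self, replid: str, offset: int):
        self.replid = replid                # group 1: ID being replicated
        self.master_repl_offset = offset
        self.replid2 = None                 # group 2: previous master's ID
        self.second_repl_offset = -1

    def promote_to_master(self) -> None:
        """Move group 1 into group 2 and generate a fresh replication ID."""
        self.replid2 = self.replid
        self.second_repl_offset = self.master_repl_offset
        self.replid = secrets.token_hex(20)  # 40 random hexadecimal characters

    def accepts_partial_resync(self, requested_replid: str, requested_offset: int) -> bool:
        """ID check only; a real master must also find the offset in its backlog."""
        if requested_replid == self.replid:
            return True
        # A slave of the old master is accepted only for the part of the
        # history that this node also received from the old master.
        return (requested_replid == self.replid2
                and requested_offset <= self.second_repl_offset)


old_master_id = "b" * 40
node = ReplicationState(old_master_id, 1000)
node.promote_to_master()
print(node.accepts_partial_resync(old_master_id, 900))    # sibling slave, slightly behind
print(node.accepts_partial_resync(old_master_id, 1001))   # claims data the new master never saw
print(node.accepts_partial_resync("c" * 40, 900))         # unrelated replication history
```

The offset limit on the second group matters because, after promotion, the new master's history diverges from the old master's. A slave that got further than the promoted node did under the old ID cannot be continued safely.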

The complete flow of PSYNC2 is shown below, which is similar to PSYNC, except for the purple box.

The related source code is basically the same as for PSYNC.

Evolution of master/slave replication

From Redis 2.x to the present, the developers have gradually optimized the master/slave replication process. Here is the evolution:

1. Before version 2.8, Redis replication used the SYNC command, which performed a full resynchronization both for the first replication and for replication after a disconnection, making it expensive.

2. From 2.8 up to 4.0, replication used the PSYNC command, which uses the runid and offset to allow partial resynchronization when a slave disconnects and reconnects.

3. Since version 4.0, PSYNC has been further optimized, in what is usually called PSYNC2, mainly to avoid the full resynchronization that PSYNC required after slave restarts and failovers.

Finally

When your ability cannot yet support your ambition, calm down and keep learning. I hope you gained something here.
