Author: Wang Yu, Engineer of Baidu Infrastructure Department

Redis master-slave replication has evolved many times. This article starts from the most basic principles and implementation and gradually walks through that evolution. You will learn how master-slave replication works, what problems each version of Redis fixed, and finally get a complete picture of how master-slave replication works in Redis 7.0.

What is master-slave replication?

In the database context, replication means copying data from one database to another. Master-slave replication divides the database nodes into a master node and slave nodes. The master node continuously copies data to the slave nodes so that the master and the slaves hold the same data.

With master-slave replication, there can be multiple copies of data, which brings multiple benefits:

First, it increases the request processing capacity of the database system. The read traffic that a single node can support is limited. When multiple nodes are deployed in a master-slave relationship, replication keeps their data consistent, so the master and slave nodes can serve requests together.

Second, it improves the availability of the entire system. Because a slave node has a copy of the master node's data, when the master node goes down, one of the slave nodes can be promoted to master immediately and continue providing service.

Redis master-slave replication principle

To implement master-slave replication, the intuitive idea is to generate a snapshot of the data on the master node and send it to the slave node as a baseline. Then, the incremental data after the snapshot time is sent to the slave node to ensure the consistency of the data on the master and slave nodes. In general, master/slave replication consists of full data synchronization and incremental data synchronization.

In the master-slave replication implementation of Redis, there are two similar phases: full data synchronization and command propagation.

  • Full data synchronization: The master node generates a snapshot of its full data, that is, an RDB file, and sends the snapshot to the slave node. Write commands received after the snapshot point are recorded. Once the snapshot has been sent, the master sends the accumulated write commands to the slave node, and the slave node executes them. At this point the baseline has been established and the data on the master and slave nodes is essentially consistent.
  • Command propagation: After full data synchronization is complete, the master node continuously sends the write commands it executes to the slave node. The slave node executes these commands, so the same data changes happen on both nodes and their data stays consistent.

The following figure shows the entire process of Redis master-slave replication:

  1. After the master-slave relationship is established, the slave node sends a SYNC command to the master node to request synchronization.
  2. After receiving the SYNC command, the master node forks a child process that writes all data into a Redis Database (RDB) file using a specific encoding. This produces a snapshot of the database.
  3. The master node sends this snapshot to the slave node, which receives and loads it.
  4. The master node then sends the slave node the write commands that accumulated while the snapshot was being generated and sent. The slave node receives and executes these commands, so its data changes in the same way.
  5. From then on, newly executed write commands are streamed continuously from the master node to the slave node, which executes them. In this way, the master and slave data stay consistent. Note that command propagation has some delay, so the master and slave nodes are not guaranteed to be consistent at every moment.
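The steps above can be sketched as a small in-memory simulation. This is only an illustration, not Redis code: the `Master` and `Replica` classes, the tuple-based command format, and the use of a dictionary copy in place of fork plus an RDB file are all assumptions made for the example.

```python
def apply_command(data, command):
    """Execute one write command against a plain dict (hypothetical command format)."""
    op, key, *args = command
    if op == "SET":
        data[key] = args[0]
    elif op == "DEL":
        data.pop(key, None)
    else:
        raise ValueError(f"unsupported command: {op}")


class Replica:
    def __init__(self):
        self.data = {}


class Master:
    def __init__(self):
        self.data = {}
        self.replicas = []   # replicas already in the command-propagation phase
        self.syncing = {}    # id(replica) -> commands buffered during its full sync

    def write(self, command):
        apply_command(self.data, command)
        # Record the command for every replica whose snapshot is still in flight.
        for buffered in self.syncing.values():
            buffered.append(command)
        # Propagate the command to replicas that finished full synchronization.
        for replica in self.replicas:
            apply_command(replica.data, command)

    def begin_full_sync(self, replica):
        # Stand-in for fork + RDB: take a point-in-time copy of the data.
        self.syncing[id(replica)] = []
        return dict(self.data)

    def finish_full_sync(self, replica, snapshot):
        replica.data = dict(snapshot)                   # replica loads the snapshot
        for command in self.syncing.pop(id(replica)):   # then replays the backlog
            apply_command(replica.data, command)
        self.replicas.append(replica)                   # command propagation starts


master, replica = Master(), Replica()
master.write(("SET", "a", "1"))
snapshot = master.begin_full_sync(replica)
master.write(("SET", "b", "2"))      # arrives while the snapshot is being sent
master.finish_full_sync(replica, snapshot)
master.write(("DEL", "a"))           # propagated directly
```

In this sketch the write that arrives during the snapshot transfer reaches the replica through the buffered commands, and later writes reach it through propagation.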

The above is the basic principle of Redis master-slave replication. It is simple and easy to understand, and it is the scheme Redis adopted at first. However, this scheme has some problems.

fork takes too long

When fork is executed, the main process is blocked while a large number of memory page-table entries are copied. This is a time-consuming operation, especially when memory usage is large. Colleagues on our team ran a test: with 10 GB of memory in use, fork took more than 100 milliseconds. Blocking the main process for over 100 milliseconds is far too long for Redis. In addition, if the master receives many writes after the fork, the copy-on-write mechanism consumes a lot of extra memory and increases response time.

A brief network disconnection between master and slave triggers full synchronization

If the network between the master and slave nodes fails and the connection drops unexpectedly, the master node cannot continue propagating commands to the slave node. After the network is restored and the slave node reconnects, the master node cannot simply resume propagating newly received commands, because the slave node has missed some commands. The slave node has to start over and perform the entire synchronization process again, which is costly. Brief network outages are common, and only a small amount of data may be written to the master node during one, yet that small amount of data forces an expensive full synchronization. This is very inefficient. How can this problem be solved? The next section, on Redis partial resynchronization, gives the answer.

Redis partial resynchronization

When the network is briefly disconnected, the slave node has to resynchronize in full, which is wasteful and not environmentally friendly. Why is a full resynchronization needed? During the disconnection, some commands were not propagated to the slave node. If these commands were skipped and only subsequent commands were propagated, the data would become inconsistent, because the missed commands cannot simply be ignored.

Why not save those commands? Then, when the slave node reconnects, the master can send it the commands from the disconnection period, and no full resynchronization is needed. Redis has supported partial synchronization since version 2.8. The master node maintains a replication backlog buffer: each command is propagated to the slave nodes and is also recorded in this buffer. It is not necessary to keep all commands. Redis uses a circular buffer, so only the most recent commands are retained.
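A circular buffer of this kind can be sketched as follows. This is an illustrative model, not the Redis implementation: the class name `ReplicationBacklog`, its methods, and the idea of addressing the buffer by a running byte count are assumptions for the example (the field name `histlen` is borrowed from the `repl_backlog_histlen` statistic that Redis reports).

```python
class ReplicationBacklog:
    """Fixed-size ring buffer that keeps only the most recent bytes written to it."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.size = size
        self.idx = 0          # next write position inside the ring
        self.histlen = 0      # number of valid bytes currently held
        self.end_offset = 0   # total number of bytes ever written

    @property
    def start_offset(self):
        # Stream position of the oldest byte that is still available.
        return self.end_offset - self.histlen

    def feed(self, data):
        # Append bytes, overwriting the oldest ones once the ring is full.
        for byte in data:
            self.buf[self.idx] = byte
            self.idx = (self.idx + 1) % self.size
        self.end_offset += len(data)
        self.histlen = min(self.size, self.histlen + len(data))

    def read_from(self, offset):
        """Return everything from stream position `offset` onward, or None if it was overwritten."""
        if offset < self.start_offset or offset > self.end_offset:
            return None
        count = self.end_offset - offset
        start = (self.idx - count) % self.size
        return bytes(self.buf[(start + i) % self.size] for i in range(count))


backlog = ReplicationBacklog(size=8)
backlog.feed(b"abcdef")
backlog.feed(b"ghij")               # wraps around and overwrites "ab"
recent = backlog.read_from(6)       # bytes written after stream position 6
too_old = backlog.read_from(0)      # None: those bytes are gone
```

The trade-off is visible in `size`: a larger ring lets a reader fall further behind before `read_from` returns `None`, at the cost of more memory.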

The commands are saved, but when the slave node reconnects, where should the master node start sending from? If every command were numbered, the slave node would only need to tell the master the number of the last command it received, and the master would know where to resume. Redis numbers bytes rather than commands; in Redis this number is called the replication offset. With partial synchronization, the master-slave replication flow looks like this:

Partial synchronization uses the PSYNC command instead of SYNC. The PSYNC syntax is as follows:

`PSYNC <master id> <replication offset>`

The two arguments in the command are the ID of the master node and the replication offset. Each Redis node has a 40-byte ID, and the ID carried in the PSYNC command is that of the master node the slave wants to synchronize from. The replication offset indicates the position from which the slave node wants partial synchronization to start. When a slave node replicates from a master for the first time, it does not know the master's ID and the replication offset is meaningless. In this case it sends `PSYNC ? -1` to request a full synchronization. In addition, if the replication offset specified by the slave node falls outside the range held in the master's replication backlog buffer, partial synchronization fails and the master falls back to full synchronization.

With partial synchronization, a full synchronization can be avoided after a brief network outage. However, the master node keeps only the most recent commands, and how many it keeps depends on the size of the replication backlog buffer. If the slave node is disconnected for too long, or if the master node executes enough new write commands during the disconnection, some of the missed commands will no longer be in the replication backlog buffer. Enlarging the replication backlog makes full synchronization less likely, but it also costs additional memory.

Partial synchronization spends some memory on holding the most recently executed write commands in order to avoid full synchronization after a brief outage. It is an intuitive solution and a good one. Are there any other problems? Consider the following questions.

What if the slave node is restarted? Partial synchronization depends on the master node ID and the replication offset. The slave node obtains the master node ID during the initial synchronization and updates the replication offset as synchronization proceeds. This information is kept only in memory. After an unexpected restart of the slave node, a full synchronization is required even though RDB or AOF files exist locally, when the slave could in fact load its local data and perform a partial synchronization.

What if there is a master/slave switchover? If the master node goes down unexpectedly, an external monitoring component performs a master/slave switchover. The other slave nodes now have a different master, and the master ID they recorded does not match the new master node, so full synchronization is performed. In fact, the synchronization progress of all slave nodes before the switchover should be similar, and the newly promoted slave node should hold the most complete data. Forcing every slave node to perform a full synchronization after a switchover is unreasonable.
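The decision the master makes for a PSYNC request, and the two failure cases just described, can be sketched like this. It is a simplified model under stated assumptions: the `Master` and `Replica` classes and their fields are hypothetical, offsets are treated as "position of the next byte wanted", and the return values are shorthand modeled on the `+FULLRESYNC` and `+CONTINUE` replies rather than the full wire protocol.

```python
FULL, PARTIAL = "FULLRESYNC", "CONTINUE"


class Master:
    def __init__(self, node_id, backlog_start, backlog_end):
        self.node_id = node_id
        self.backlog_start = backlog_start   # offset of the oldest byte still in the backlog
        self.backlog_end = backlog_end       # offset of the next byte to be written

    def handle_psync(self, requested_id, requested_offset):
        # Unknown or mismatched master ID: the histories cannot be assumed to match.
        if requested_id != self.node_id:
            return FULL
        # The requested position has already been overwritten or does not exist yet.
        if not (self.backlog_start <= requested_offset <= self.backlog_end):
            return FULL
        return PARTIAL


class Replica:
    def __init__(self):
        # Kept only in memory in this model, as in the text above.
        self.master_id = "?"
        self.offset = -1

    def psync_args(self):
        return self.master_id, self.offset

    def synced(self, master_id, offset):
        self.master_id, self.offset = master_id, offset

    def restart(self):
        # A restart loses the in-memory replication state.
        self.master_id, self.offset = "?", -1


old_master = Master("a" * 40, backlog_start=1000, backlog_end=5000)
new_master = Master("b" * 40, backlog_start=1000, backlog_end=5000)
replica = Replica()

first_sync = old_master.handle_psync(*replica.psync_args())       # PSYNC ? -1
replica.synced(old_master.node_id, 4800)
after_blip = old_master.handle_psync(*replica.psync_args())       # brief outage
after_failover = new_master.handle_psync(*replica.psync_args())   # ID mismatch
replica.restart()
after_restart = old_master.handle_psync(*replica.psync_args())    # state lost
```

Only the brief-outage case yields a partial synchronization here; both the failover and the restart fall back to a full one, which is exactly the inefficiency described above.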

How can these problems be solved? Read on.

Same-origin incremental synchronization

After a slave node restarts, the original master node ID and the replication offset are lost, so a full synchronization is needed. This is easy to fix: persist this information. After a master/slave switchover, the identity of the master node changes, which also forces the slave nodes into full synchronization. This is easy to solve as well: if it can be confirmed that the data on the new master node was replicated from the original master node, a slave can continue replicating from the new master. Since Redis 4.0, PSYNC has been improved to support same-origin incremental replication, which addresses both problems.

A restarted slave node needs full synchronization essentially because it has lost the ID of its master node. Since Redis 4.0, the master node ID is written into the RDB file for persistent storage.

After a switchover, a slave node needs full synchronization with the new master because the new master does not recognize the ID of the original master. The slave node sends `PSYNC <original master ID> <replication offset>` to the new master node. If the new master recognizes `<original master ID>` and knows that its own data was replicated from that node, it knows that it shares the same origin as the slave node and should accept partial synchronization. How does it recognize the ID? A slave node simply records the ID of its previous master before it is promoted to master. Since Redis 4.0, a newly promoted master records the ID of its previous master. You can see both in the output of `info replication`: `master_replid` is the ID of the current master, and `master_replid2` is the ID of the previous master.
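The same-origin check can be sketched as below. This is a model under assumptions rather than the Redis source. The `ReplState` class and both helper functions are hypothetical. The field names follow `master_replid`, `master_replid2` and `second_repl_offset` from `info replication`. The rule that a request under the previous ID is honored only up to `second_repl_offset` is my reading of what that field is for. The sample numbers come from the `info replication` output that follows, with the backlog bounds derived from `master_repl_offset` and `repl_backlog_histlen` on the assumption that the backlog ends at the current offset.

```python
import secrets
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReplState:
    replid: str               # ID of the current replication history
    replid2: str              # ID of the previous history (all zeros if none)
    second_repl_offset: int   # requests under replid2 are valid up to this offset
    backlog_start: int        # offset of the oldest byte still in the backlog
    backlog_end: int          # offset of the next byte to be written


def promote(state):
    """Model a slave being promoted: remember the old ID, then switch to a fresh one."""
    return replace(
        state,
        replid=secrets.token_hex(20),            # new 40-character ID
        replid2=state.replid,                    # the former master's ID
        second_repl_offset=state.backlog_end,    # the old history ends here
    )


def accepts_partial_resync(state, requested_id, requested_offset):
    same_origin = requested_id == state.replid or (
        requested_id == state.replid2
        and requested_offset <= state.second_repl_offset
    )
    in_backlog = state.backlog_start <= requested_offset <= state.backlog_end
    return same_origin and in_backlog


old_master_id = "a3f2428d31e096a99d87affa6cc787cceb6128a2"
as_slave = ReplState(
    replid=old_master_id,
    replid2="0" * 40,
    second_repl_offset=-1,
    backlog_start=33420,   # assumed: 38599 - 5180 + 1
    backlog_end=38600,     # assumed: master_repl_offset + 1
)
as_master = promote(as_slave)

# Another slave of the old master reconnects to the promoted node.
ok = accepts_partial_resync(as_master, old_master_id, 38600)
unknown = accepts_partial_resync(as_master, "f" * 40, 38600)
```

Keeping only one previous ID is enough for this check. A request that is too old fails the backlog test regardless of how many past IDs were remembered.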

```
127.0.0.1:6379> slaveof no one
OK
127.0.0.1:6379> info replication
# Replication
role:master
...
master_replid:b34aff08d983991b3feb4567a2ac0308984a892a
master_replid2:a3f2428d31e096a99d87affa6cc787cceb6128a2
master_repl_offset:38599
second_repl_offset:38600
...
repl_backlog_histlen:5180
```

Redis currently retains two master IDs, but it would be possible to keep a linked list of past master IDs so that the history could be traced further back. Then, if a slave node were disconnected across several master/slave switchovers, its data could still be identified as same-origin when it reconnects. Redis does not do this because it is unnecessary: even if the data shares the same origin, the replication backlog holds only a limited amount of data. After several switchovers, the commands remaining in the replication backlog are no longer sufficient for partial synchronization.

With same-origin incremental replication, the other slave nodes can continue incremental synchronization from the new master after a switchover. At this point, master-slave replication no longer seems to have major problems. But the Redis developers are always looking for ways to optimize it further. Below I describe some optimization strategies for master-slave replication in Redis.

Diskless full synchronization and diskless loading

To perform full replication, Redis generates a snapshot of the current database by forking a child process that traverses all data and encodes it into an RDB file. After the RDB file is generated, the main process reads the file and sends it to the slave node. Reading and writing RDB files on disk is expensive, and doing the read in the main process inevitably lengthens Redis response time. A better solution is therefore to send the dumped data directly to the slave node without first writing it to an RDB file. Redis 6.0 implements this strategy: diskless full synchronization and diskless loading.
Diskless full synchronization avoids disk operations, but it also has a drawback. In general, sending data over the network from a child process is slower than generating an RDB file in a child process, so the child process has to live longer. The longer the child process exists, the greater the impact of copy-on-write, and the more memory is consumed.

In full replication, the slave node receives the RDB, saves it locally, and then loads it. Similarly, the slave node can load the data sent by the master node directly, without first storing it in a local RDB file and then loading it from disk.

Shared master-slave replication buffer

From the master's point of view, a slave is a client. After a slave sends the PSYNC command, the master completes full synchronization with it and then continuously propagates write commands to it. Redis keeps a send buffer for each client connection. After the master node executes a write command, it writes the command into the send buffer of each slave connection. These send buffers hold the commands to be propagated, so the contents of the different send buffers are essentially the same. Moreover, another copy of these commands is kept in the replication backlog. This wastes a lot of memory, especially when there are many slave nodes.

In Redis 7.0, our team solved this problem by proposing and implementing a shared master-slave replication buffer. In this scheme, the send buffers share their data with the replication backlog buffer, which avoids duplicating the data and saves memory effectively.

This article has reviewed and summarized the evolution of master-slave replication in Redis and explained the problems solved at each stage. It also described some strategies for optimizing master-slave replication in Redis.

  1. From a macro point of view, Redis master-slave replication is divided into two stages: full synchronization and command propagation. The master node sends a snapshot to the slave node and then continuously propagates commands to it to keep the data on the two nodes consistent.
  2. Before Redis 2.8, master-slave replication required a full synchronization after a brief network outage. Redis 2.8 introduced the replication backlog buffer to solve this problem.
  3. Redis 4.0 introduced same-origin incremental replication to solve the problem that slave nodes needed full synchronization after a master/slave switchover. At this point, Redis master-slave replication was complete on the whole.
  4. Redis 6.0 introduced diskless synchronization and diskless loading to further optimize performance, avoiding disk reads and writes during full synchronization and speeding up master-slave synchronization.
  5. Redis 7.0 RC1 adopted the shared master-slave replication buffer to reduce the memory overhead of master-slave replication.

Hopefully this article helps you review the principles of Redis master-slave replication and understand them better.