Remember the persistence mechanisms in Redis? We learned about AOF and RDB: if Redis goes down, data can be recovered by replaying the log or loading the RDB file, which improves reliability. Now let's think about today's problem: if only one Redis instance is deployed and that instance goes down, it cannot serve data requests during the recovery period, and the service is disrupted.

Redis avoids this by keeping copies of the data on multiple instances at the same time: even if one instance fails, the other instances can keep serving requests while it recovers, so the service is not affected. But with multiple instances, how do we keep the data copies consistent? Can reads and writes be sent to any instance? In fact, Redis provides a master-slave mode to keep the copies consistent, with reads and writes separated between the master and the slaves. Read operations can be served by both the master and the slaves, but write operations are performed on the master first, and the master then synchronizes them to the slaves.

So why separate reads and writes? If both the master and the slaves could accept client writes, an immediate problem arises: if a client makes two changes to the same key and each change lands on a different instance, the copies on the two instances become inconsistent. With read-write separation in master-slave mode, all data changes happen only on the master; once the master has the latest data, it synchronizes it to the slaves, so the data on the master and the slaves stays consistent.
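To make this concrete, here is a minimal sketch of what read-write separation looks like from the client side, assuming the master listens on 6379 and a slave on 6380 (the ports and key are just illustrative):

# writes go to the master
redis-cli -p 6379 set user:1 "alice"
# reads can be served by the slave once replication has caught up
redis-cli -p 6380 get user:1
# a slave is read-only by default, so this write is rejected with a READONLY error
redis-cli -p 6380 set user:1 "bob"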

Let’s look at how master-slave synchronization works, starting with the first synchronization between a master and a slave. When multiple Redis instances are started, the master-slave relationship between them can be established with the replicaof command (slaveof before Redis 5.0), and the first data synchronization then completes in three stages.

replicaof 127.0.0.1 6379

The first stage is the process of establishing a connection and negotiating synchronization between the master and the slave, in preparation for full replication. In this stage, the slave establishes a connection with the master and tells it that data synchronization is about to take place; once the master replies, synchronization between the two can begin.

Specifically, the slave sends the psync command to the master to request synchronization, and the master starts replication based on the command's parameters. The psync command carries two parameters, the master's runID and the replication offset (the very first request is shown right after the list below).

  • runID: a random ID generated when each Redis instance starts, used to uniquely identify that instance. When a slave replicates from a master for the first time, it does not yet know the master's runID, so it sets this parameter to "?".
  • offset: set to -1, indicating that this is the first replication.
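Putting the two parameters together, the very first request the slave sends over the connection is simply:

psync ? -1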

After receiving the psync command, the master responds with FULLRESYNC, carrying two parameters, the master's runID and the master's current replication offset, and the slave records both of them when it receives the response. One thing to note here is that the FULLRESYNC response means the first replication is a full replication, that is, the master copies all of its current data to the slave.
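The reply the slave sees looks roughly like this (the runID and offset below are made-up values, just for illustration):

+FULLRESYNC 8371f4a686b5f2b2c9e1e1c5b9a1d7e0c3f2a1b0 0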

The second stage is when the master synchronizes all of its data to the slave, and the slave loads the data it receives locally. This process relies on the RDB file generated from a memory snapshot.

Specifically, the master executes the bgsave command to generate an RDB file and sends it to the slave. After receiving the RDB file, the slave first clears its current database and then loads the file. This cleanup is needed because the slave may have stored other data before it executed replicaof, and that old data must not interfere with the synchronized data. While the master is synchronizing data to the slave, it is not blocked and can still serve requests; otherwise, the Redis service would be interrupted. However, the writes in those requests are not recorded in the RDB file that was just generated. To keep the master and slave consistent, the master uses a dedicated replication buffer to record all write operations received after the RDB file is generated.
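If you want to watch this stage happen, both sides expose it through INFO; a minimal sketch, again assuming the master on 6379 and the slave on 6380:

# on the master: is a background RDB save currently running?
redis-cli -p 6379 info persistence | grep rdb_bgsave_in_progress
# on the slave: is it still loading the initial data from the master?
redis-cli -p 6380 info replication | grep master_sync_in_progress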

Finally, in the third stage, the master sends the slave the write commands it received while the second stage was running. Specifically, once it has finished sending the RDB file, the master sends the operations recorded in the replication buffer to the slave, and the slave re-executes them. At this point, the master and the slave are in sync.
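One practical note: the replication buffer is bounded by the client-output-buffer-limit setting for replica clients on the master, and if the slave reads too slowly during a long full sync, the master drops the connection once the limit is exceeded. The values below are only an illustration, not a recommendation:

# redis.conf on the master: hard limit 512mb, or 128mb sustained for 60 seconds
client-output-buffer-limit replica 512mb 128mb 60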

At this point, we have seen how the master and the slave synchronize data through full replication. Once full replication is complete, they maintain a long-lived network connection over which the master propagates every subsequent write command to the slave. This process is known as command propagation based on a long-lived connection, and it avoids the overhead of repeatedly establishing connections.

However, during command propagation, if the network between the master and the slave is disconnected, commands can no longer be propagated and the slave falls out of sync with the master. Next, let's talk about what happens after the network connection is lost.

After a network disconnection, the master and the slave resynchronize using incremental replication. How is synchronization guaranteed in incremental replication? The secret lies in the repl_backlog_buffer, a ring buffer. Whenever the master receives a write command, it writes it not only to the replication buffer but also to the repl_backlog_buffer. In this ring buffer, the master tracks the position it has written up to, and the slave tracks the position it has read up to. At the beginning, the master's write position and the slave's read position are both at the start of the buffer. As the master receives new write operations, its write position gradually moves away from the start; we usually measure this displacement as an offset, which for the master is master_repl_offset, and the more writes the master receives, the larger it becomes. Similarly, as the slave replicates write commands, its read position moves forward as well, and its slave_repl_offset keeps growing. Normally, these two offsets are roughly equal.
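You can observe these offsets directly; a minimal sketch, again assuming the master on 6379 and the slave on 6380 (the field values will of course differ on your setup):

# on the master: its own write offset and each connected slave's replicated offset
redis-cli -p 6379 info replication | grep -E "master_repl_offset|slave0"
# on the slave: how far it has replicated
redis-cli -p 6380 info replication | grep slave_repl_offset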

After the connection is restored, the slave sends the psync command to the master again, this time carrying its current slave_repl_offset. The master compares its own master_repl_offset with the slave_repl_offset it received. Since the master may have received new write commands while the connection was down, master_repl_offset is generally larger than slave_repl_offset. The master then only needs to synchronize the command operations lying between slave_repl_offset and master_repl_offset to the slave. If, for example, the gap between the two offsets contains operations b, c, and d, incremental replication simply sends those three operations to the slave.
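On the wire, the resynchronization request and reply look roughly like this (the runID and offset are again made up; a master that can serve the request from its backlog answers +CONTINUE instead of +FULLRESYNC):

# slave -> master: resume replication stream 8371f4a6... from around offset 1000
psync 8371f4a686b5f2b2c9e1e1c5b9a1d7e0c3f2a1b0 1000
# master -> slave: partial resynchronization accepted; the missing commands follow
+CONTINUE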

It is important to note that because the repl_backlog_buffer is a ring buffer, the master keeps writing once the buffer is full, overwriting the operations written earliest. If the slave reads too slowly, operations it has not yet read may be overwritten by the master's new writes. As a result, the master and the slave become inconsistent, and the slave has to fall back to a full replication.

To avoid this, we can adjust the repl_backlog_size parameter, which controls the size of this buffer. If it is configured too small, the slave may not be able to keep up during incremental replication and will be forced into another full replication. Increasing this parameter therefore reduces the risk that a network disconnection ends in full replication.
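As a sketch, in redis.conf this is the repl-backlog-size directive (the value below is only an illustration; a common rule of thumb is to budget for the master's write throughput multiplied by the expected reconnection time, with some headroom):

# redis.conf on the master: keep roughly 64 MB of recent write commands for reconnecting slaves
repl-backlog-size 64mb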

That's all for Redis master-slave replication. For more hardcore content, stay tuned.