Any jerk who doesn’t want to play the guitar well is not a good programmer

Standalone Redis performs well and has a complete persistence mechanism, but what if your business volume is so large that it exceeds what a single Redis instance can handle? What if Redis goes down without warning? With those questions in mind, we begin today’s topic: Redis high availability. For space reasons, this chapter only covers master-slave replication.

We start with master-slave replication because it is the cornerstone of the entire Redis high-availability implementation. For now it is enough to understand the concept; why it is the cornerstone will become clear later, when we talk about Sentinel and Redis Cluster.

First, why do we developers need a master-slave architecture at all? Wouldn’t a single Redis instance do?

Besides the case mentioned above, where the load exceeds what a single Redis machine can handle, there is another problem: a single instance cannot guarantee its own availability. Even if Redis can absorb all the traffic, what if the machine running the Redis process dies? Requests will fall through to the database, the traffic will crush it, and you can collect your P0 incident and go home.

And if your needs really do exceed the capacity of a single machine, what then? Deploy several independent Redis instances? Then if a user’s cached data lives in instance 1 and the user’s next request lands on instance 2, does that request have to go to the DB again? Unless you maintain a mapping between users and Redis instances (which tends to be tricky logic), deploying multiple independent instances does not really scale horizontally.

Can a master-slave structure solve this problem?

We can see it more intuitively from a diagram.

In master-slave synchronization, nodes are divided into two roles, master and slave, forming one master with multiple slaves. The slaves serve read operations, while the master is responsible for write operations. This way, the slaves can take on more of the service’s requests.

In most business scenarios, there are more reads than writes to Redis, so when the read requests are particularly heavy, we can add slave nodes to make Redis carry more traffic.
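To make the read/write split concrete, here is a minimal sketch of the routing decision on the client side. It assumes nothing about a real Redis client: `FakeNode`, `ReadWriteRouter`, and the small set of write commands are all made up for illustration.

```python
import itertools


class FakeNode:
    """Stand-in for a Redis connection (hypothetical, no network involved)."""

    def __init__(self, name):
        self.name = name


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    # Only a tiny, illustrative subset of write commands is recognised here.
    WRITE_COMMANDS = {"SET", "DEL", "INCR", "EXPIRE"}

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._readers = itertools.cycle(slaves or [master])

    def node_for(self, command):
        if command.upper() in self.WRITE_COMMANDS:
            return self.master
        return next(self._readers)


master = FakeNode("master")
slaves = [FakeNode("slave-1"), FakeNode("slave-2")]
router = ReadWriteRouter(master, slaves)

print(router.node_for("SET").name)  # master
print(router.node_for("GET").name)  # slave-1
print(router.node_for("GET").name)  # slave-2
```

Adding a slave to the list is all it takes to spread reads over one more node, which is exactly why this layout suits read-heavy workloads.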

But writes go to the master. If I connect to a slave, won’t I miss the data that was written earlier?

Isn’t that what the title is about? In master-slave replication, the slave synchronizes data from the master according to a defined strategy. In Redis, the SLAVEOF command (renamed REPLICAOF in Redis 5.0, with SLAVEOF kept as an alias) makes one Redis instance replicate the state of another. The instance being replicated is the master node, and the instance that executes SLAVEOF is the slave node.

The master/slave replication of Redis is divided into two steps: synchronization and command propagation.

The synchronization step copies the master’s in-memory state to the slave. Command propagation covers what happens afterwards: when clients perform write operations that change the master’s state, the master no longer matches the state the slave received during synchronization, so the master propagates those write commands to the slave to bring the two back in line.

The general process of synchronization is as follows:

  • The slave sends the sync command to the master
  • On receiving sync, the master executes the bgsave command: Redis forks a child process to generate the RDB file in the background, and the master records the write commands it receives in the meantime in a buffer
  • Once the file has been generated, the master sends the RDB file to the slave. The slave receives the RDB file and loads it into memory
  • The master then sends all the write commands recorded in the buffer to the slave, which replays them to bring its database state in line with the master’s
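The four steps above can be sketched as a toy simulation. This is not how Redis is implemented: the `Master` and `Slave` classes are hypothetical, a deep copy of a dict stands in for the RDB file, and a plain list stands in for the buffer.

```python
import copy


class Master:
    def __init__(self):
        self.data = {}
        self.sync_buffer = None  # writes received while a snapshot is in flight

    def write(self, key, value):
        self.data[key] = value
        if self.sync_buffer is not None:
            self.sync_buffer.append((key, value))

    def start_sync(self):
        # Stands in for bgsave: take a point-in-time copy, then start buffering.
        self.sync_buffer = []
        return copy.deepcopy(self.data)

    def finish_sync(self):
        # Hand over the buffered writes and stop buffering.
        buffered, self.sync_buffer = self.sync_buffer, None
        return buffered


class Slave:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        self.data = dict(snapshot)

    def replay(self, commands):
        for key, value in commands:
            self.data[key] = value


master, slave = Master(), Slave()
master.write("a", 1)

snapshot = master.start_sync()      # steps 1-2: sync requested, master snapshots
master.write("b", 2)                # a write arriving while the snapshot is in flight
slave.load_snapshot(snapshot)       # step 3: slave loads the snapshot
slave.replay(master.finish_sync())  # step 4: slave replays the buffered writes

print(slave.data == master.data)  # True
```

Without the buffer, the write to "b" would be lost on the slave, because it arrived after the snapshot was taken.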

To give you a clearer idea of the process, let’s look at the diagram again.

🐂🍺, but what if the slave goes down after the synchronization has finished? Once it restarts, it will probably be inconsistent with the master again, right?

Indeed. This is where resuming after a disconnection comes in. The slave performs a full replication when it connects to the master for the first time; what happens when it reconnects after a disconnection differs between old and new versions of Redis.

Before Redis 2.8, if a slave disconnected after the initial synchronization had completed and then reconnected, it sent the sync command to the master again, and the master sent the full data set to the slave all over again.

The problem is that most of the data is already in sync, so another full synchronization is unnecessary. Starting with Redis 2.8, the psync command replaced sync to solve this.

In simple terms, with psync the master sends the slave only the write commands it received while the slave was disconnected. Once the slave has replayed them, its state matches the master’s again.

Heh, is that all? Do you know how psync works, or do you only know how to use it?

The implementation of psync depends on the offset that both master and slave maintain.

Every time the master propagates N bytes of data to the slave, it adds N to its own offset. Likewise, the slave adds N to its offset every time it receives N bytes.

With the offsets, a simple comparison tells us whether the master and slave are currently consistent. The master can then send the commands from the slave’s offset onward for the slave to replay. So even if the slave goes down during synchronization, the offset lets it resume from where it left off.
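Here is a small sketch of that bookkeeping. The `Link` class is hypothetical, and counting the bytes of a plain command string is a simplification; the point is only that both sides count bytes and the difference is the lag.

```python
class Link:
    """Tracks replication offsets on both ends of one master-slave link."""

    def __init__(self):
        self.master_offset = 0
        self.slave_offset = 0
        self.in_flight = []  # payloads sent by the master, not yet received

    def propagate(self, payload: bytes):
        # The master advances its offset by the number of bytes it sends.
        self.master_offset += len(payload)
        self.in_flight.append(payload)

    def deliver_one(self):
        # The slave advances its offset by the number of bytes it receives.
        payload = self.in_flight.pop(0)
        self.slave_offset += len(payload)

    def lag(self) -> int:
        return self.master_offset - self.slave_offset


link = Link()
link.propagate(b"SET k v")  # 7 bytes
link.propagate(b"DEL k")    # 5 bytes
link.deliver_one()          # the slave has only received the first payload

print(link.master_offset, link.slave_offset, link.lag())  # 12 7 5
```

A lag of zero means the two sides are consistent; anything else tells the master exactly how many bytes the slave still needs.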

Oh, no, no, no. What if it was the master that went down? By the time the slave reconnects, the master has restarted and its data has changed. By your logic, the two would never become consistent.

In fact, besides the offset, a reconnecting slave also sends the run ID of the master it was replicating. Every Redis server instance has its own unique run ID, and the run ID changes whenever the Redis server restarts.

When the master receives this run ID, it compares it with its own. If they match, the slave was replicating this same master before the disconnection. If they differ, the master also went down during the disconnection, and the data has to be fully synchronized to the slave.
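The run ID check boils down to a single comparison. The sketch below is an illustration only: `new_run_id` and `handle_psync` are hypothetical names, the return values are invented labels rather than Redis replies, and random hex stands in for a real 40-character run ID.

```python
import secrets


def new_run_id() -> str:
    # 20 random bytes rendered as 40 hex characters, as a stand-in for a run ID.
    return secrets.token_hex(20)


def handle_psync(master_run_id: str, requested_run_id: str) -> str:
    """Decide what the master should attempt for a reconnecting slave."""
    if requested_run_id != master_run_id:
        # The slave was replicating a different (or restarted) master.
        return "full"
    # Same master as before: a partial resync is possible if the offset allows it.
    return "partial-candidate"


run_id = new_run_id()
print(handle_psync(run_id, run_id))        # partial-candidate
print(handle_psync(new_run_id(), run_id))  # full
```

A matching run ID only makes a partial resync possible; whether it actually happens still depends on the offset.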

Fine, suppose that solves it. But all you maintain is an offset. Where do the commands for that offset come from? Out of thin air? How does the master know which commands they are?

Indeed, the offset alone is not enough; we need it to look up the data we really want, namely the commands. Redis does this with the replication backlog buffer.

It’s a fancy name, but it is really just a queue. Like recursion, polling, and pass-through, it sounds sophisticated but is actually simple. The default size of the replication backlog buffer is 1 MB. When Redis propagates commands, besides sending the write commands to the slave, it also writes them into the replication backlog buffer and associates them with the current offset. That way, the commands for a given offset can be looked up.

However, the buffer has a limited size. If the slave stays disconnected for too long, the older commands in the replication backlog buffer are overwritten by newer ones. Think of it as a fixed-size queue: the elements that were enqueued earliest have already been dequeued.

If the slave’s offset is no longer in the backlog, the commands cannot be retrieved, and Redis performs a full synchronization. If the offset is still in the replication backlog, Redis performs a partial synchronization starting from that offset.
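The backlog logic can be sketched as a fixed-size byte buffer that remembers which offsets it still holds. This is a simplified model under my own assumptions: `ReplicationBacklog` is a hypothetical class, and a 10-byte capacity replaces the 1 MB default so the overflow is easy to see.

```python
class ReplicationBacklog:
    """Fixed-size buffer of the most recently propagated bytes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = bytearray()
        self.end_offset = 0  # master offset just after the last stored byte

    def append(self, payload: bytes):
        self.buffer += payload
        self.end_offset += len(payload)
        # Drop the oldest bytes once the capacity is exceeded.
        overflow = len(self.buffer) - self.capacity
        if overflow > 0:
            del self.buffer[:overflow]

    @property
    def start_offset(self) -> int:
        # Offset of the oldest byte still held in the buffer.
        return self.end_offset - len(self.buffer)

    def read_from(self, slave_offset: int):
        """Return the bytes the slave is missing, or None if a full sync is needed."""
        if not (self.start_offset <= slave_offset <= self.end_offset):
            return None
        return bytes(self.buffer[slave_offset - self.start_offset:])


backlog = ReplicationBacklog(capacity=10)
backlog.append(b"AAAA")  # offsets 0..3
backlog.append(b"BBBB")  # offsets 4..7
backlog.append(b"CCCC")  # offsets 8..11; the two oldest bytes are dropped

print(backlog.read_from(8))  # b'CCCC'
print(backlog.read_from(0))  # None
```

A slave at offset 8 gets just the bytes it missed, while a slave at offset 0 has fallen out of the window and needs a full synchronization. This is also why a larger backlog tolerates longer disconnections.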

With full and incremental replication in place, a slave can be promoted when the master fails, so the service keeps running. Replication also reduces the risk of data loss in abnormal situations, and the read/write separation strategy increases the concurrency the Redis service as a whole can handle.

Don’t oversell it. Are you telling me master-slave replication has no drawbacks?

It does. Take the master-slave switchover just mentioned. Without a high-availability framework, a programmer has to perform it by hand and then tell the service callers that the Redis IP address has changed. This is a cumbersome process and may even require changes to code or configuration. On top of that, the other slaves were replicating the master that went down, so each of them has to be pointed at the new master, which complicates things further.

Also, although read/write separation lets the cluster’s read capacity grow by adding slaves, write concurrency is still capped: it is limited to what the single master can handle.

Well, that’s the end of this episode and I’ll see you next time.

You can search for the official account [SH full stack notes] on WeChat and follow it to read other articles early.