Redis High Availability: How Master/Slave Replication Keeps Data Consistent

In "Redis Core: The Secret Behind Its Unbeatable Speed", Code Brother explained the data structures, I/O model, threading model, and progressive rehash underlying Redis's five data types, and showed the essential reasons why Redis is fast.

Then, in "Redis Logs: The Ultimate Weapon for Fast Recovery After Downtime", we showed how Redis stays highly available by recovering quickly from an outage: it reloads the RDB snapshot and replays the AOF log.

High availability has two meanings: losing as little data as possible, and keeping the service available as much as possible. AOF and RDB keep data loss to a minimum, while master/slave replication adds replicas: the same data is stored on multiple instances, so even if one instance goes down, the others can still serve requests.

This article walks you through master/slave replication, one of Redis's high-availability solutions.


Core knowledge

Opening remarks

Problems are opportunities. When you run into a problem, you should actually be glad: the bigger the problem, the bigger the opportunity.

Everything has a price: what you gain, you pay for, and what you give up may bring something back. So don't fret over too many things. Just think about what you want to do and what you are willing to pay for it, and then go for it!

1. Master/Slave Replication Overview

With RDB and AOF, we no longer have to fear data loss. But how do we stay available when a Redis instance goes down?

If a single instance can go down and stop serving, why not run several? That may solve the problem. Redis provides a master/slave mode that uses master/slave replication to copy data redundantly to other Redis servers.

The server whose data is copied is called the master node, and the servers receiving the copies are called slave nodes. Replication is one-way: data can only flow from the master to the slaves.

By default, every Redis server is a master node. A master can have multiple slaves (or none), but a slave can have only one master.

Brother 65: How is data kept consistent between master and slave?

To keep replicated data consistent, the master/slave architecture uses read/write separation:

Read operations: can be executed on both the master and the slaves;
Write operations: are executed on the master first and then synchronized to the slaves.
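To make the rule concrete, here is a minimal Python sketch of read/write separation. It is a toy model, not Redis code: each instance is a plain dict, and the class name ReplicatedStore and its methods are hypothetical. Writes always land on the master and are forwarded in the same order to every slave, while reads may be served by any instance.

```python
from __future__ import annotations


class ReplicatedStore:
    """Toy read/write separation: one master dict plus N slave dicts (hypothetical)."""

    def __init__(self, num_slaves: int) -> None:
        self.master: dict[str, str] = {}
        self.slaves: list[dict[str, str]] = [{} for _ in range(num_slaves)]
        self._next_read = 0  # round-robin cursor over all instances

    def write(self, key: str, value: str) -> None:
        # Writes execute on the master first ...
        self.master[key] = value
        # ... and are then propagated, in the same order, to every slave.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key: str) -> str | None:
        # Reads may be served by any instance; pick one round-robin.
        instances = [self.master, *self.slaves]
        instance = instances[self._next_read % len(instances)]
        self._next_read += 1
        return instance.get(key)


store = ReplicatedStore(num_slaves=2)
store.write("counter", "1")
store.write("counter", "2")
print([store.read("counter") for _ in range(3)])  # ['2', '2', '2']
```

Because only one instance accepts writes, every replica applies the same sequence of changes in the same order, and no cross-instance locking is needed.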

Brother 65: Why use read/write separation?

Suppose both the master and the slaves could execute writes. If the same piece of data were modified several times and each modification were sent to a different instance, the copies on the instances would become inconsistent.

To keep the data consistent, Redis would then have to lock and coordinate the changes across multiple instances, and Redis won't do that!

Brother 65: What else is master/slave replication good for?

Fault recovery: when the master goes down, the other nodes can still provide service.
Load balancing: the master serves writes and the slaves serve reads, sharing the load.
Cornerstone of high availability: it is the foundation on which Sentinel and Cluster are built.

2. Setting Up Master/Slave Replication

Master/slave replication is initiated entirely from the slave; nothing needs to be done on the master.

Brother 65: How do I set up a master/slave replication architecture?

The replicaof command (slaveof before Redis 5.0) establishes the relationship between a master and its slaves.

You can enable replication on a slave in any of the following three ways:

Configuration file

Add replicaof <masterip> <masterport> to the slave's configuration file.

Startup command

Start redis-server with --replicaof <masterip> <masterport>.

Client command

After starting multiple Redis instances, connect a client to an instance and run replicaof <masterip> <masterport> to make that instance a slave.

For example, suppose we have instance 1 (172.16.88.1), instance 2 (172.16.88.2), and instance 3 (172.16.88.3). Running the following command on instance 2 and on instance 3 makes both of them slaves of instance 1, which becomes the master:

replicaof 172.16.88.1 6379

3. How Master/Slave Replication Works

Once the master/slave mode implements read/write separation, all data is written only to the master, so there is no need to coordinate writes across the three instances.

Whenever the master has the latest data, it synchronizes it to the slaves, keeping master and slave data consistent.

Brother 65: How is master/slave synchronization actually done? Is the master's data passed to a slave all at once or in batches? How do they stay in sync during normal operation? And if the network between master and slave is cut, will the data still be consistent after reconnecting?

Brother 65, you have so many questions! Synchronization falls into three cases:

The first full replication between master and slave;
Synchronization between master and slave during normal operation;
Resynchronization after the network between master and slave is disconnected and reconnected.

First full replication between master and slave

Brother 65: I'm getting dizzy. Let's start with the very first synchronization between master and slave.

The first replication between master and slave can be divided into three stages: establishing the connection (the preparation stage), synchronizing data from master to slave, and sending the write commands received during synchronization to the slave.

Let's first look at a diagram of the whole process to get the big picture; each stage is described in detail below.

Stage 1: Establishing a connection

This stage establishes a connection between master and slave in preparation for full data synchronization. After the slave executes replicaof, it connects to the master and sends the psync command to tell the master that synchronization is about to start. Once the master confirms and replies, synchronization between master and slave begins.

Brother 65: How does the slave know the master's details so it can connect?

The replicaof item in the slave's configuration file (or the replicaof command) specifies the master's IP address and port, so the slave knows which master to connect to.

Internally, the slave maintains two fields, masterHost and masterPort, which store the master's IP and port.

When the slave executes replicaof, it sends the psync command to signal that it wants to synchronize. The command carries the master's runID and the replication progress offset, and the master starts replication according to these parameters.

runID: a unique ID generated automatically every time a Redis instance starts. On the first replication the slave does not know the master's runID yet, so it sends ?.
offset: set to -1 on the first replication, meaning there is no replication progress yet.

On receiving psync, the master replies to the slave with FULLRESYNC and two parameters: the master's runID and its current replication offset. The slave records both values when it receives the reply.

The FULLRESYNC reply means the first replication is a full replication: the master copies all of its current data to the slave.

Stage 2: The master synchronizes data to the slave

The master runs BGSAVE to generate an RDB file and sends the file to the slave. Meanwhile, the master creates a replication buffer for each slave to record every write command received since it began generating the RDB file.

After receiving the RDB file, the slave saves it to disk, clears its current data, and then loads the RDB data into memory.

Stage 3: Sending new write commands to the slave

After the slave finishes loading the RDB file, the master sends the contents of the replication buffer to the slave. The slave receives and executes these commands, bringing its data to the same state as the master.

Brother 65: While the master is synchronizing data to a slave, can it still handle requests normally?

Yes, the master is not blocked: BGSAVE runs in a forked child process while the main thread keeps serving requests. Speed is the one thing Redis never gives up.

Writes that arrive after the RDB snapshot is taken are not included in the RDB file. To keep master and slave consistent, the master uses the replication buffer to record all of these write operations.
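The three stages and the role of the replication buffer can be simulated in a few lines of Python. This is a simplified sketch under stated assumptions: the classes ToyMaster and ToySlave are hypothetical, a deep copy of a dict stands in for the RDB snapshot that BGSAVE produces, and a per-slave list stands in for the replication buffer. Writes that arrive after the snapshot are missing from it, yet the slave still ends up identical to the master once it replays the buffer.

```python
import copy


class ToyMaster:
    def __init__(self) -> None:
        self.data: dict = {}
        self.replication_buffers: dict = {}  # slave id -> writes since snapshot

    def write(self, key, value) -> None:
        self.data[key] = value
        # Every write made while a slave is syncing is also recorded
        # in that slave's replication buffer.
        for buf in self.replication_buffers.values():
            buf.append((key, value))

    def start_full_sync(self, slave_id):
        # Stage 2: take a point-in-time snapshot (stand-in for the RDB file)
        # and start buffering subsequent writes for this slave.
        self.replication_buffers[slave_id] = []
        return copy.deepcopy(self.data)

    def flush_buffer(self, slave_id):
        # Stage 3: hand over everything written since the snapshot,
        # keeping the slave registered so later writes keep flowing.
        pending = self.replication_buffers[slave_id]
        self.replication_buffers[slave_id] = []
        return pending


class ToySlave:
    def __init__(self) -> None:
        self.data = {"old-key": "local data from before replicaof"}

    def load_rdb(self, snapshot) -> None:
        self.data.clear()  # drop pre-existing data first
        self.data.update(snapshot)

    def apply(self, commands) -> None:
        for key, value in commands:
            self.data[key] = value


master, slave = ToyMaster(), ToySlave()
master.write("a", "1")
rdb = master.start_full_sync("slave-1")
master.write("b", "2")  # arrives while the RDB is generated, sent, or loaded
slave.load_rdb(rdb)
master.write("a", "3")
slave.apply(master.flush_buffer("slave-1"))
print(slave.data == master.data)  # True
```

In real Redis the same connection then keeps carrying new writes (command propagation); flush_buffer leaving the slave's buffer registered is only a hint at that.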

Why does the slave need to clear its current data after receiving the RDB file?

Because before it started replicating from the master via replicaof, the slave may already have held other data. Clearing it prevents that data from mixing with the master's.

What exactly is the replication buffer?

It is a buffer created on the master that stores all of the master's writes during the following three periods:

1) Writes executed by the master while BGSAVE is generating the RDB file;

2) Writes executed while the master is sending the RDB file to the slave over the network;

3) Writes executed while the slave is loading the RDB file into memory.

Whenever Redis exchanges data with a peer, whether an ordinary client or a slave, it allocates a buffer for that connection: from Redis's point of view, a slave is just another client. Once a client connects, Redis allocates a dedicated client buffer, and all data exchange with that client goes through it.

The master writes data into this buffer and then sends it over the network, completing the exchange.

For both incremental and full synchronization, the master allocates such a buffer for each slave, dedicated to propagating write commands and keeping master and slave consistent. This buffer is usually called the replication buffer.

Problems caused by a replication buffer that is too small:

The size of the replication buffer is limited by client-output-buffer-limit slave. If the limit is too small, the master/slave replication connection will be dropped.

1) When the replication connection is dropped, the master releases all data associated with that connection, including the contents of the replication buffer, and replication between master and slave starts over.

2) A more serious problem is that the dropped connection can trap master and slave in an endless loop of BGSAVE and RDB retransmission.

If the master holds a large data volume, or network latency between master and slave is high, the buffer may exceed its limit, and the master then disconnects from the slave.

This leads to the loop: full replication -> replication buffer overflow breaks the connection -> reconnection -> full replication -> replication buffer overflow breaks the connection -> ...

[top redis must give devops – replication buffer] It is therefore recommended to set both the hard and soft limits of the replication buffer to 512 MB:

config set client-output-buffer-limit "slave 536870912 536870912 0"
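The three numbers after slave in that command are the hard limit, the soft limit, and the soft-limit duration in seconds. Per the Redis configuration documentation, a connection is closed immediately once its output buffer reaches the hard limit, or once it stays at or above the soft limit continuously for the specified number of seconds; setting a limit to 0 disables it. The Python function below is a simplified sketch of that rule, not Redis's actual code, and its name and parameters are hypothetical.

```python
from __future__ import annotations

MB = 1024 * 1024
LIMIT = 512 * MB  # 536870912, the value used in the config set command


def check_output_buffer(
    used: int,
    hard: int,
    soft: int,
    soft_seconds: int,
    over_soft_since: float | None,
    now: float,
) -> tuple[bool, float | None]:
    """Return (disconnect?, updated over_soft_since). A limit of 0 is disabled."""
    if hard and used >= hard:
        return True, over_soft_since  # hard limit: close at once
    if soft and used >= soft:
        if over_soft_since is None:
            return False, now  # first time over the soft limit: start timing
        # Close only after staying over the soft limit long enough.
        return now - over_soft_since > soft_seconds, over_soft_since
    return False, None  # back under the soft limit: reset the timer


print(check_output_buffer(600 * MB, LIMIT, LIMIT, 0, None, now=0.0))  # (True, None)
print(check_output_buffer(100 * MB, LIMIT, LIMIT, 0, None, now=0.0))  # (False, None)
```

With hard and soft both set to 512 MB, it is effectively the hard limit that triggers the disconnect, which is why the buffer must be large enough to absorb all writes made during a full sync.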

Why not use AOF for master/slave replication? It loses less data than RDB.

Good question. There are several reasons:

An RDB file is a compact binary file, so transferring it over the network and writing it to disk is more I/O-efficient than doing the same with AOF.
Loading an RDB file is also faster than replaying an AOF file when the slave restores its data.

Incremental replication

Brother 65: What if the network between master and slave is disconnected? Does a full copy have to be made again after reconnecting?

Before Redis 2.8, if the network was interrupted during command propagation, master and slave had to perform a full copy again, which was very expensive.

Starting with Redis 2.8, once the network recovers, master and slave continue synchronizing through incremental replication.

Incremental replication: used for replication after a network interruption. Only the write commands the master executed during the interruption are sent to the slave, which makes it far more efficient than full replication.

repl_backlog_buffer

repl_backlog_buffer is a buffer in which the master continuously records its recent write operations. Because memory is limited, it is a fixed-length circular array: once it is full, new writes wrap around and overwrite the oldest content from the beginning.

The master uses master_repl_offset to record how far it has written, and the slave uses slave_repl_offset to record how far it has read.

master_repl_offset increases as the master receives writes. On the slave side, slave_repl_offset increases as the slave keeps executing the replicated write commands.

Normally the two offsets are roughly equal. While the network is down, the master may keep receiving new write commands, so master_repl_offset becomes larger than slave_repl_offset.

When master and slave reconnect after the disconnection, the slave first sends the psync command to the master, carrying the runID of the master it was replicating from and its own slave_repl_offset.

The master then only needs to send the commands between slave_repl_offset and master_repl_offset to the slave.
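The circular behavior and the offset arithmetic can be sketched in Python. This is a minimal model, assuming the backlog stores the raw bytes of the replication stream and that slave_repl_offset counts the bytes the slave has already applied, so the missing data starts exactly at that offset; the class name ReplBacklog is hypothetical, and Redis's real implementation differs in detail.

```python
from __future__ import annotations


class ReplBacklog:
    """Fixed-size ring buffer addressed by an ever-growing global offset."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.buf = bytearray(size)
        self.master_repl_offset = 0  # total bytes ever written by the master

    def feed(self, data: bytes) -> None:
        # New writes wrap around and overwrite the oldest bytes when full.
        for byte in data:
            self.buf[self.master_repl_offset % self.size] = byte
            self.master_repl_offset += 1

    def oldest_offset(self) -> int:
        # Offset of the oldest byte still held in the ring.
        return max(0, self.master_repl_offset - self.size)

    def missing_since(self, slave_repl_offset: int) -> bytes | None:
        """Bytes between slave_repl_offset and master_repl_offset,
        or None if part of that range has already been overwritten."""
        if not self.oldest_offset() <= slave_repl_offset <= self.master_repl_offset:
            return None
        return bytes(
            self.buf[i % self.size]
            for i in range(slave_repl_offset, self.master_repl_offset)
        )


backlog = ReplBacklog(size=16)
backlog.feed(b"SET a 1\r\n")  # 9 bytes
slave_repl_offset = backlog.master_repl_offset  # the slave has applied these
backlog.feed(b"SET b 2\r\n")  # written while the link is down
print(backlog.missing_since(slave_repl_offset))  # b'SET b 2\r\n'
backlog.feed(b"X" * 20)  # enough new writes to lap the ring
print(backlog.missing_since(slave_repl_offset))  # None -> full resync needed
```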

The following figure shows the incremental replication process:

If repl_backlog_buffer is too small, write operations the slave has not yet read may be overwritten by the master's newer writes.

Once that happens, a full copy is required, so we want to avoid it by adjusting the repl-backlog-size parameter, which controls the buffer size. The formula is:

repl_backlog_size = second * write_size_per_second

second: the average time it takes the slave to reconnect to the master after a disconnection.
write_size_per_second: the average volume of command data the master generates per second (the total size of write commands and their data).

For example, if the primary server generates an average of 1 MB of write data per second and the secondary server takes an average of 5 seconds to reconnect to the primary server after being disconnected, then the size of the replication backlog buffer cannot be less than 5 MB.

To be safe, you can set the size of the replication backlog buffer to 2 * second * write_size_per_second to ensure that most disconnection cases can be handled by partial resynchronization.
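The sizing rule is simple arithmetic. The helper below (a hypothetical name, not a Redis API) reproduces the 1 MB/s and 5 s example along with the 2x safety margin, and prints the result in bytes so it can be used as a repl-backlog-size value.

```python
import math

MB = 1024 * 1024


def backlog_size(second: float, write_size_per_second: float, safety_factor: float = 2.0) -> int:
    """repl_backlog_size >= second * write_size_per_second, times a safety factor."""
    return math.ceil(safety_factor * second * write_size_per_second)


minimum = backlog_size(5, 1 * MB, safety_factor=1)
recommended = backlog_size(5, 1 * MB)
print(minimum, minimum // MB)          # 5242880 5
print(recommended, recommended // MB)  # 10485760 10
```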

Command propagation over a long-lived connection

Brother 65: After the full synchronization is done, how are master and slave kept in sync during normal operation?

Once master and slave complete full replication, they keep a network connection open between them, and the master uses it to forward all subsequent write commands to the slave. This process is known as command propagation over a long-lived connection.

In the command propagation phase, besides sending write commands, master and slave also maintain a heartbeat mechanism: PING and REPLCONF ACK.

Master -> slave: PING

At a fixed interval (set by repl-ping-replica-period, called repl-ping-slave-period before Redis 5.0; 10 seconds by default), the master sends PING to the slave so that the slave can detect a timeout.

Slave -> master: REPLCONF ACK

During command propagation, the slave sends the following command to the master, once per second by default:

REPLCONF ACK <replication_offset>

replication_offset is the slave's current replication offset. Sending REPLCONF ACK serves three purposes for master and slave:

Detecting the state of the network connection between master and slave.
Helping implement the min-slaves options (min-slaves-to-write and min-slaves-max-lag, named min-replicas-* since Redis 5.0).
Detecting lost commands: the slave reports its own slave_repl_offset, and the master compares it with its own master_repl_offset. If the slave is missing data, the master looks up the missing part in repl_backlog_buffer and pushes it to the slave. Note that the offsets and repl_backlog_buffer are used not only for partial replication but also for this command-loss case; the difference is that the former happens after a disconnect and reconnect, while the latter happens while master and slave remain connected.
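Both master-side uses of REPLCONF ACK can be sketched in Python. The model below is a simplification with hypothetical names (SlaveInfo, writes_allowed, missing_range) and assumes the master remembers, per slave, the time of the last ACK and the offset it reported. writes_allowed follows the min-slaves rule as the Redis configuration docs describe it (writes are refused unless at least min-slaves-to-write slaves have a lag of no more than min-slaves-max-lag seconds, and 0 disables the check); missing_range shows the offset comparison used to spot lost commands.

```python
from dataclasses import dataclass


@dataclass
class SlaveInfo:
    last_ack_time: float  # when the last REPLCONF ACK arrived (seconds)
    ack_offset: int       # replication_offset reported in that ACK


def writes_allowed(slaves, now, min_slaves_to_write, min_slaves_max_lag):
    if min_slaves_to_write == 0:
        return True  # the check is disabled
    good = sum(1 for s in slaves if now - s.last_ack_time <= min_slaves_max_lag)
    return good >= min_slaves_to_write


def missing_range(master_repl_offset, slave):
    """(start, end) of the data the slave lacks, or None if it is caught up."""
    if slave.ack_offset < master_repl_offset:
        return slave.ack_offset, master_repl_offset
    return None


slaves = [SlaveInfo(last_ack_time=99.5, ack_offset=1000),
          SlaveInfo(last_ack_time=80.0, ack_offset=900)]
print(writes_allowed(slaves, now=100.0, min_slaves_to_write=2, min_slaves_max_lag=10))  # False
print(writes_allowed(slaves, now=100.0, min_slaves_to_write=1, min_slaves_max_lag=10))  # True
print(missing_range(1000, slaves[1]))  # (900, 1000)
```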

How is it decided whether to perform full or partial synchronization?

In Redis 2.8 and later, the slave sends the psync command to request synchronization, which may be full or partial depending on the current state of master and slave. This article uses Redis 2.8 and later as the example.

The key is how psync is executed:

The slave sends a psync command to the master according to its current state:
- If the slave has never executed replicaof, it sends psync ? -1 to request full replication from the master.
- If the slave has executed replicaof before, it sends psync <runID> <offset>, where runID is the master's runID saved from the last replication and offset is the replication offset the slave saved when that replication ended.
The master decides between full and partial replication based on the psync command it receives and its current state:
- If the runID sent by the slave matches the master's own runID, and the data after the slave's slave_repl_offset is still in repl_backlog_buffer, the master replies CONTINUE, meaning partial replication will be performed; the slave then waits for the master to send the data it is missing.
- If the runID sent by the slave differs from the master's runID, or the data after the slave's slave_repl_offset is no longer in repl_backlog_buffer (it has been pushed out of the ring), the master replies FULLRESYNC <runid> <offset>, where runid is the master's current runID and offset is its current replication offset. The slave saves both values for later use.
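Both sides of that decision fit in two short Python functions. This is a simplified sketch of the logic described above, not Redis source: the function names are hypothetical, replies are returned as plain strings, and the backlog is summarized by the oldest offset it still holds. (Since Redis 4.0, PSYNC carries a replication ID instead of the run ID, but the shape of the decision is the same.)

```python
from __future__ import annotations


def build_psync(saved_runid: str | None, saved_offset: int) -> str:
    """Slave side: which psync command to send."""
    if saved_runid is None:
        return "psync ? -1"  # never replicated: ask for a full copy
    return f"psync {saved_runid} {saved_offset}"


def handle_psync(master_runid: str, master_repl_offset: int,
                 backlog_oldest_offset: int, command: str) -> str:
    """Master side: reply CONTINUE (partial) or FULLRESYNC (full)."""
    _, runid, offset_text = command.split()
    offset = int(offset_text)
    same_master = runid == master_runid
    in_backlog = backlog_oldest_offset <= offset <= master_repl_offset
    if same_master and in_backlog:
        return "CONTINUE"
    return f"FULLRESYNC {master_runid} {master_repl_offset}"


print(handle_psync("abc123", 5000, 4000, build_psync(None, -1)))        # FULLRESYNC abc123 5000
print(handle_psync("abc123", 5000, 4000, build_psync("abc123", 4500)))  # CONTINUE
print(handle_psync("abc123", 5000, 4000, build_psync("abc123", 3000)))  # FULLRESYNC abc123 5000
```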

If a slave stays disconnected from the master for too long, the position in the master's repl_backlog_buffer corresponding to its slave_repl_offset is overwritten, and full replication is performed between slave and master.

A quick summary

Each slave records its own slave_repl_offset, and different slaves may be at different replication progress.

When reconnecting to the master to recover, the slave sends its recorded slave_repl_offset to the master through the psync command, and the master decides, based on that progress, whether the slave can use incremental replication or needs full replication.

The replication buffer and repl_backlog

The replication buffer: one per slave, configured via config set client-output-buffer-limit slave.
repl_backlog_buffer: a ring buffer. There is only one in the master process, shared by all slaves. Its size is set by the repl-backlog-size parameter, which defaults to 1 MB. To size it, estimate the volume of commands generated per second and the total time of (master executes BGSAVE for the RDB) + (master sends the RDB to the slave) + (slave loads the RDB file); repl-backlog-size should be no smaller than the product of the two.

In short, the replication buffer is the buffer the master keeps for each connected slave's client connection, used during full replication, while repl_backlog_buffer is a dedicated buffer on the master that continuously records write operations to support incremental replication for the slaves.

repl_backlog_buffer is created on the master once the first slave connects, and from then on it records every write command; it is shared by all slaves. Master and slaves each record their own replication progress, so during recovery each slave sends its own progress (slave_repl_offset) to the master, which can then synchronize with each slave independently.

As shown in the figure:

4. Master/Slave Application Problems

4.1 Read/write separation problems

Data expiration

Brother 65: In a master/slave setup, are expired keys deleted on the slaves?

Good question. To keep master and slave data consistent, slaves do not actively delete expired data; they wait for the master to delete it and propagate a DEL command. Recall that Redis has two deletion strategies:

Lazy deletion: when a client accesses a key, Redis checks whether it has expired and deletes it if so.
Periodic deletion: Redis deletes expired keys through a scheduled task.

Brother 65: Then can a client read expired data from a slave?

Starting with Redis 3.2, when a slave serves a read, it checks whether the key has expired. If it has, the slave does not return it to the client; the key itself, however, stays in the slave's memory until the master's DEL arrives.
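The slave-side behavior can be sketched in Python. This is a simplified model with hypothetical names, assuming each key's expiry is stored as an absolute timestamp: a read hides a logically expired key, but the key is removed only when the master's DEL is replicated.

```python
from __future__ import annotations


class SlaveKeyspace:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.expires: dict[str, float] = {}  # key -> absolute expiry time

    # Changes only ever arrive from the master's replication stream.
    def replicate_set(self, key: str, value: str, expire_at: float | None = None) -> None:
        self.data[key] = value
        if expire_at is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = expire_at

    def replicate_del(self, key: str) -> None:
        self.data.pop(key, None)
        self.expires.pop(key, None)

    def get(self, key: str, now: float) -> str | None:
        expire_at = self.expires.get(key)
        if expire_at is not None and now > expire_at:
            return None  # logically expired: hide it, but do not delete it
        return self.data.get(key)


ks = SlaveKeyspace()
ks.replicate_set("session", "token", expire_at=100.0)
print(ks.get("session", now=50.0))   # token
print(ks.get("session", now=150.0))  # None
print("session" in ks.data)          # True: still in memory
ks.replicate_del("session")          # the master's DEL finally arrives
print("session" in ks.data)          # False
```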

4.2 Limiting the memory size of a single instance

If a single Redis instance uses 10 GB of memory, synchronizing one slave takes on the order of minutes, and with many slaves recovery is even slower. If the read load is high and the slaves cannot serve requests during this period, the system comes under great pressure.

If the data volume is too large, the master's fork and RDB save during full replication can take so long that the slave receives nothing for a long time and hits a timeout. Master/slave synchronization can then also fall into a loop: full replication -> replication interrupted by timeout -> reconnection -> full replication -> replication interrupted by timeout -> ...

In addition, not only should the master's absolute memory size be limited, but so should the share of the host's memory it uses: ideally use only 50-65% of it, leaving 30-45% for bgsave, replication buffers, and so on.

Conclusion

The role of master/slave replication: AOF logs and RDB snapshots allow fast data recovery after downtime and minimize data loss, but the service is still unavailable while the instance is down, which is why the master/slave architecture with read/write separation evolved.
The master/slave replication process: connection establishment, data synchronization, and command propagation. Data synchronization is divided into full replication and partial replication, and master and slave use the PING and REPLCONF ACK commands as a heartbeat.
Although master/slave replication solves or mitigates problems such as data redundancy, fault recovery, and read load balancing, its drawbacks are still obvious: fault recovery cannot be automated, write operations cannot be load-balanced, and storage capacity is limited by a single machine. Solving these requires Sentinel and Cluster, which I will cover in a later article.
