Redis replication consists of two steps, synchronization and command propagation:

  • Synchronization is used to update the database state of the slave server to the current database state of the master server.
  • Command propagation is used to bring the primary and secondary servers back to a consistent state after the database state of the primary server is modified and the two become inconsistent.

Synchronization

Redis uses the psync command to synchronize primary and secondary data. The synchronization process includes full replication and partial replication.

Full replication: Used in the initial replication scenario. All data on the primary node is sent to the secondary node in one pass. When the data volume is large, this puts heavy overhead on the primary node, the secondary node, and the network.

Partial replication: Handles the data loss caused by a brief network interruption during primary-secondary replication. When the secondary node reconnects to the primary node, the primary node resends the lost data to the secondary node if conditions permit. Because the resent data is much smaller than the full data set, the high cost of full replication is effectively avoided.

The following components are required to run the psync command:

  • The replication offsets of the primary and secondary nodes
  • The replication backlog buffer of the primary node
  • The run ID of the primary node

The master node and each slave node that participates in replication maintain their own replication offsets. After processing a write command, the master node adds the byte length of the command to its offset, which is reported as master_repl_offset in the output of info replication. After receiving a command from the master node, the slave node likewise adds the byte length to its own offset, and it reports that offset to the master node every second. By comparing the replication offsets of the master and slave nodes, you can determine whether their data is consistent.
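
As a rough illustration of the offset comparison, here is a minimal Python sketch that parses the text of info replication and computes how many bytes each slave is behind. The sample text is illustrative, not captured from a real server, and the helper names (parse_info_replication, replica_lag_bytes) are made up for this example.

```python
def parse_info_replication(text):
    """Parse the text output of `info replication` into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replica_lag_bytes(fields):
    """Return {slaveN: master offset - slave offset} for every slaveN entry."""
    master_offset = int(fields["master_repl_offset"])
    lags = {}
    for key, value in fields.items():
        if key.startswith("slave") and key[5:].isdigit():
            attrs = dict(item.split("=", 1) for item in value.split(","))
            lags[key] = master_offset - int(attrs["offset"])
    return lags


# Illustrative sample only; the numbers are invented for this sketch.
SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=1055130,lag=1
master_repl_offset:1055214
"""

lags = replica_lag_bytes(parse_info_replication(SAMPLE))
print(lags)  # {'slave0': 84}
```

A lag of 0 bytes means the slave has received everything the master has propagated so far. A small non-zero value is normal because the slave reports its offset only once per second.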

The replication backlog buffer is a fixed-length queue stored on the primary node, with a default size of 1MB. It is created when the primary node has connected secondary nodes. When the primary node responds to a write command, it not only sends the command to the secondary nodes but also writes it to the replication backlog buffer.

Because the replication backlog buffer has a limited size, it holds only the most recently replicated data, which is used to recover data during partial replication and when replicated commands are lost.
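
To make the "fixed-length queue" idea concrete, here is a simplified Python model of a backlog that keeps only the most recent bytes. It is a sketch, not how Redis implements it internally: the class name ReplicationBacklog and its methods are hypothetical, and a plain bytearray stands in for a real circular buffer.

```python
class ReplicationBacklog:
    """Toy fixed-size backlog that keeps only the most recent `size` bytes."""

    def __init__(self, size):
        self.size = size
        self.data = bytearray()  # most recent bytes, at most `size` long
        self.end_offset = 0      # replication offset of the last byte written

    def feed(self, payload):
        """Append propagated command bytes, discarding the oldest on overflow."""
        self.data.extend(payload)
        self.end_offset += len(payload)
        if len(self.data) > self.size:
            del self.data[: len(self.data) - self.size]

    def read_after(self, replica_offset):
        """Return the bytes after `replica_offset`, or None if they were evicted."""
        oldest = self.end_offset - len(self.data)  # bytes up to here are gone
        if replica_offset < oldest or replica_offset > self.end_offset:
            return None
        return bytes(self.data[replica_offset - oldest:])


backlog = ReplicationBacklog(size=10)
backlog.feed(b"abcdef")
backlog.feed(b"ghijkl")       # total 12 bytes written, only the last 10 are kept
print(backlog.read_after(6))  # b'ghijkl'
print(backlog.read_after(1))  # None
```

The None case is the interesting one: once the bytes a secondary node needs have been pushed out of the buffer, they cannot be resent from it.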

Each Redis node is dynamically assigned a 40-character hexadecimal string as its run ID when it starts. The main purpose of the run ID is to uniquely identify the Redis node. For example, the slave node saves the run ID of the master node to identify which master node it is replicating from.

Full synchronization

After the slaveof command is executed:

    1. The secondary node sends the psync command to synchronize data. Because this is the first replication, the secondary node has neither a replication offset nor the run ID of the primary node, so it sends psync ? -1.
    1. From psync ? -1, the primary node determines that this is a full replication and replies with +FULLRESYNC.
    1. The secondary node receives the response from the primary node and saves the run ID and offset it contains.
    1. The primary node runs bgsave to save an RDB file locally.
    1. The primary node sends the RDB file to the secondary node, which saves it as its own data file. After receiving the RDB file, the secondary node prints a log entry that shows the amount of data sent by the primary node.

Be careful with large data volumes on the primary node, such as RDB files that exceed 6GB. If the RDB transfer time exceeds the value set by repl-timeout, the secondary node gives up receiving the RDB file and clears the temporary file it has downloaded, causing the full replication to fail.

    1. From the time the master node starts saving the RDB snapshot until the slave node finishes receiving it, the master node still responds to read and write commands. The master node therefore saves the write commands received during this period in the replication client buffer. After the slave node loads the RDB file, the master node sends the buffered data to the slave node, which guarantees data consistency between master and slave.

If the primary node takes too long to create and transfer the RDB file, the replication client buffer on the primary node may overflow. The default configuration is client-output-buffer-limit slave 256MB 64MB 60: if the buffer usage stays above 64MB for 60 seconds, or exceeds 256MB at any point, the primary node immediately closes the replication client connection, causing the full synchronization to fail.

    1. After receiving all data from the master node, the slave node clears its own old data. This step is recorded in the slave node's log.
    1. After clearing its data, the slave node loads the RDB file. For a large RDB file, this step is still time-consuming. You can calculate the time difference between log entries to determine the total RDB loading time.
    1. The primary server that receives the SYNC command executes the BGSAVE command to generate an RDB file in the background, and uses a buffer to record all write commands executed from that point on.
    1. When the BGSAVE command finishes, the primary server sends the generated RDB file to the secondary server. The secondary server receives and loads the RDB file, updating its database state to that of the primary server at the time the BGSAVE command was executed.
    1. The master server sends all the write commands recorded in the buffer to the slave server, and the slave server executes them to update its database state to the current state of the master database.
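
The buffer limit rule mentioned in the steps above (a hard limit, plus a soft limit that must be exceeded for a number of seconds) is easy to misread, so here is a small Python sketch of it as this article describes it. The names BufferLimit, ReplicaClientState, and should_disconnect are hypothetical, and the exact comparisons Redis uses internally may differ from this model.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024


@dataclass
class BufferLimit:
    hard_bytes: int    # exceeding this closes the connection at once
    soft_bytes: int    # exceeding this starts a timer
    soft_seconds: int  # how long the soft limit may stay exceeded


@dataclass
class ReplicaClientState:
    soft_exceeded_since: Optional[float] = None  # when the soft limit was first exceeded


def should_disconnect(limit, state, used_bytes, now):
    """Return True when the replication client connection should be closed."""
    if used_bytes > limit.hard_bytes:
        return True
    if used_bytes > limit.soft_bytes:
        if state.soft_exceeded_since is None:
            state.soft_exceeded_since = now  # start the soft-limit timer
            return False
        return now - state.soft_exceeded_since > limit.soft_seconds
    state.soft_exceeded_since = None  # back under the soft limit: reset the timer
    return False


# Mirrors "slave 256MB 64MB 60" from the text above.
limit = BufferLimit(hard_bytes=256 * MB, soft_bytes=64 * MB, soft_seconds=60)
state = ReplicaClientState()
print(should_disconnect(limit, state, 100 * MB, now=0.0))   # False
print(should_disconnect(limit, state, 100 * MB, now=61.0))  # True
```

The point of the soft limit is that a short burst above 64MB is tolerated, while a sustained backlog is treated as a sign that the secondary node cannot keep up.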

Looking at the whole process, you can see that full replication is a very time-consuming operation. Its time cost mainly consists of:

  • Time for bgsave on the primary node
  • Network transfer time of the RDB file
  • Time for the secondary node to clear its old data
  • Time for the secondary node to load the RDB file
  • Possible AOF rewrite time

Full synchronization is time-consuming and involves multiple persistence operations and network data transfers, consuming CPU, memory, and network resources on the servers that host the primary and secondary nodes. Full synchronization is unavoidable for the first replication, but in all other scenarios partial synchronization should be used to avoid it.
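
To get a feel for these costs, here is a back-of-the-envelope Python sketch that adds up the main components for the 6GB RDB file mentioned earlier. Every number except the 6GB file size is an assumption made up for illustration (100MB/s effective bandwidth, the bgsave, clear, and load times, and a repl-timeout of 60 seconds), and FullSyncEstimate is a hypothetical helper, not part of Redis.

```python
from dataclasses import dataclass

GB = 1024 ** 3
MB = 1024 ** 2


@dataclass
class FullSyncEstimate:
    bgsave_seconds: float              # assumed time for bgsave on the primary
    rdb_bytes: int                     # size of the RDB file
    bandwidth_bytes_per_second: float  # assumed effective network throughput
    flush_seconds: float               # assumed time to clear old data
    load_seconds: float                # assumed time to load the RDB file

    @property
    def transfer_seconds(self):
        return self.rdb_bytes / self.bandwidth_bytes_per_second

    @property
    def total_seconds(self):
        return (self.bgsave_seconds + self.transfer_seconds
                + self.flush_seconds + self.load_seconds)


REPL_TIMEOUT_SECONDS = 60  # assumed repl-timeout setting

estimate = FullSyncEstimate(
    bgsave_seconds=40,
    rdb_bytes=6 * GB,
    bandwidth_bytes_per_second=100 * MB,
    flush_seconds=5,
    load_seconds=50,
)
print(round(estimate.transfer_seconds, 2))               # 61.44
print(round(estimate.total_seconds, 2))                  # 156.44
print(estimate.transfer_seconds > REPL_TIMEOUT_SECONDS)  # True
```

Under these assumed numbers the transfer alone takes longer than the timeout, which is the situation the warning about large RDB files describes.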

Partial synchronization

Partial replication is an optimization that Redis provides to address the high cost of full replication. It is implemented with the psync {runId} {offset} command. While a slave node is replicating from the master node, if an anomaly such as a network interruption or command loss occurs, the slave node asks the master node to resend the lost command data. If that data still exists in the master node's replication backlog buffer, it is sent directly to the slave node, which keeps replication between the master and slave nodes consistent. The resent data is generally much smaller than the full data set, so the cost is very small.

    1. When the network between the primary and secondary nodes is interrupted, if the repl-timeout period is exceeded, the primary node considers that the secondary node is faulty and interrupts the replication connection.
    1. While the primary node is disconnected from the secondary node, the primary node still responds to commands, but the commands cannot be sent to the secondary node because the replication connection is interrupted. However, the primary node has an internal replication backlog buffer (repl-backlog-buffer) that still stores the write command data of the recent period. Its default maximum size is 1MB.
    1. After the network between the primary and secondary nodes is restored, the secondary node reconnects to the primary node.
    1. When the master/slave connection is restored, the slave node still holds its own replication offset and the run ID of the master node, so it sends them to the master node as psync parameters to request partial replication.
    1. After receiving the psync command, the primary node first checks whether the runId parameter matches its own run ID. If it does, the secondary node was previously replicating from this primary node. The primary node then searches its own replication backlog buffer based on the offset parameter. If the data after that offset exists in the buffer, it sends a +CONTINUE response to the secondary node, indicating that partial replication can proceed.
    1. The master node sends the data in the replication backlog buffer to the slave node based on the offset, which returns master/slave replication to a normal state.
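
The decision the primary node makes in the steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: handle_psync and its parameters are hypothetical names, the offset is treated as "the last byte the secondary node has received", and real Redis applies additional checks that are not modeled here.

```python
def handle_psync(primary_run_id, primary_offset, backlog_first_offset,
                 requested_run_id, requested_offset):
    """Return the reply a primary node would give to psync {runId} {offset}.

    backlog_first_offset is the offset of the oldest byte still held in the
    replication backlog buffer; primary_offset is the newest.
    """
    full = f"+FULLRESYNC {primary_run_id} {primary_offset}"
    if requested_run_id != primary_run_id:
        return full  # unknown primary, or first replication with "?"
    next_needed = requested_offset + 1
    if next_needed < backlog_first_offset or next_needed > primary_offset + 1:
        return full  # the missing data is no longer in the backlog
    return "+CONTINUE"


run_id = "a" * 40  # placeholder 40-character run ID

# First replication: psync ? -1
print(handle_psync(run_id, 1000, 500, "?", -1))
# Reconnect after a short outage, missing data still in the backlog
print(handle_psync(run_id, 1000, 500, run_id, 800))  # +CONTINUE
# Reconnect after a long outage, missing data already evicted
print(handle_psync(run_id, 1000, 500, run_id, 100))
```

Only the second call ends in partial replication. The other two fall back to full replication, which is why the size of the backlog buffer matters for long network interruptions.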

Heartbeat detection

After replication is established, the primary and secondary nodes maintain long connections and send heartbeat commands to each other, as shown in the following figure.

The primary and secondary heartbeat detection mechanism is as follows:

    1. The primary and secondary nodes each run heartbeat detection against the other, and each acts as a client of the other. Run the client list command to view information about the replication clients: the connection of the primary node is shown with flags=M, and that of the secondary node with flags=S.
    1. By default, the primary node sends the ping command to the secondary node every 10 seconds to check whether the secondary node is alive and connected. The repl-ping-slave-period parameter controls the sending frequency.
    1. The slave node sends the replconf ack {offset} command every second in the main thread to report its current replication offset to the master node.

The replconf command not only lets the nodes monitor the network status between primary and secondary in real time, but also reports the replication offset of the secondary node. The primary node uses the reported offset to check whether replicated data has been lost. If the secondary node is missing data, the primary node pulls it from the replication backlog buffer and sends it to the secondary node.
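
For readers curious what the heartbeat looks like on the wire, here is a small Python sketch that encodes replconf ack {offset} in the Redis serialization protocol (RESP), where a command is an array of bulk strings. The helper name encode_command is made up for this example, and the offset value is arbitrary.

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]  # array header: number of arguments
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))  # length-prefixed argument
    return b"".join(parts)


ack = encode_command("REPLCONF", "ACK", 1055214)
print(ack)
# b'*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$7\r\n1055214\r\n'
```

The message is only a few dozen bytes, so sending it once per second costs very little while giving the primary node a fresh offset for each secondary node.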

Asynchronous replication and command propagation

The master node not only handles reads and writes, but also synchronizes write commands to the slave nodes. Write commands are sent asynchronously: after executing a write command, the master node returns the result to the client immediately, without waiting for the slave nodes to complete replication.

This asynchronous process is handled by command propagation, which not only sends write commands to all slave servers, but also queues them into the replication backlog buffer.
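
The effect of asynchronous propagation can be shown with a toy in-memory model in Python. ToyPrimary and ToyReplica are hypothetical classes for illustration only: a deque stands in for the network connection and a list stands in for the replication backlog buffer.

```python
from collections import deque


class ToyReplica:
    def __init__(self):
        self.data = {}
        self.inbox = deque()  # commands received but not yet applied

    def apply_pending(self):
        """Apply every propagated write that has arrived so far."""
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


class ToyPrimary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.backlog = []  # stand-in for the replication backlog buffer

    def set(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))      # queue into the backlog
        for replica in self.replicas:
            replica.inbox.append((key, value))  # propagate, but do not wait
        return "OK"  # reply is returned before any replica has applied the write


replica = ToyReplica()
primary = ToyPrimary([replica])

reply = primary.set("k", "v")
print(reply, replica.data)  # OK {}
replica.apply_pending()
print(replica.data)         # {'k': 'v'}
```

Between the two print calls the master and slave are briefly inconsistent. That window is the price of not making the client wait for replication.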
