Master/slave replication of Redis

A single Redis instance cannot meet all requirements:

  1. Machine failure. If the machine fails, data may be lost and Redis can no longer provide service
  2. Performance bottleneck. A single machine's hardware is limited; vertical upgrades yield smaller and smaller gains at higher and higher cost

Therefore, besides scaling a server vertically, we can scale horizontally by deploying multiple Redis servers. This also mitigates single-machine failure: as long as the data on the servers stays synchronized, the failure of one server does not affect the service provided by the others.

Both the master node and the slave nodes can serve requests. A slave copies data from the master, and when the master goes down, a slave can still provide service.

  • A master can have multiple slaves
  • A slave can have only one master
  • Data flows only from master to slave

This one-master, multiple-slave architecture provides high availability and read/write splitting.

Establishing a master/slave connection

There are two ways to establish the connection: running a command, or editing the configuration file.

  • slaveof ip port — makes the current node a slave of the given master (asynchronous)
  • slaveof no one — disconnects the slave from its master without clearing the data already replicated

With the command form, the slave sends a request to the master at the specified IP address and port; once the connection is established, the slave clears its own data and replicates the master's. The advantage of the command is that it takes effect immediately, without restarting the server.

Configuration file settings

The configuration file contains settings for master/slave replication. The advantage of the configuration file is that it suits centralized management, but it requires restarting Redis. In practice, the configuration file is the usual approach.

  • slaveof host port — sets the IP address and port of the master node
  • slave-read-only yes — makes the slave node read-only

The slave is set to read-only to prevent writes that would make it inconsistent with the master; it also enables read/write splitting.
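Put together, the replica-related lines of redis.conf might look like this (the master address below is a placeholder):

```conf
# make this instance a replica of the master at 192.168.1.10:6379 (example address)
slaveof 192.168.1.10 6379
# reject writes on this replica so it cannot diverge from the master
slave-read-only yes
```

In Redis 5 and later these directives are named replicaof and replica-read-only, but the old names are still accepted.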

Data synchronization

After the master/slave relationship is set up, the slave synchronizes data from the master. Synchronization happens either by full replication or by partial replication.

When the master/slave connection is established for the first time, or the slave restarts, full replication is used. Full replication sends all of the master's data to the slave, which makes it a heavyweight operation.

During data transmission, there are several key attributes:

  • runid: a string of 40 random hexadecimal characters that uniquely identifies a running server, generated automatically at startup. After a slave connects, the master sends its runid to the slave for identification.
  • Replication buffer: also known as the replication backlog buffer, a FIFO queue with a default size of 1MB that stores the write commands executed by the server. When the stored commands exceed the buffer's capacity, the oldest ones are evicted. The server creates the buffer when AOF persistence is enabled or when it acts as a master.
  • Offset: the command offset within the replication buffer. Commands are stored in the buffer as a byte stream (AOF format), and the offset records a position in that stream. The master records the offset last acknowledged by each slave; by comparing a slave's offset against the buffer, it can locate the right position in the queue and send all subsequent commands.
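The backlog-and-offset mechanism can be sketched as a toy Python model. This is illustrative only, not Redis internals; the class and method names are invented for the sketch:

```python
class ReplicationBacklog:
    """Toy model of the replication backlog: a fixed-size FIFO byte buffer.

    master_offset counts every byte ever written; the oldest byte still
    retained sits at master_offset - len(buffer).
    """

    def __init__(self, capacity=1 * 1024 * 1024):  # 1MB default, as in Redis
        self.capacity = capacity
        self.buffer = bytearray()
        self.master_offset = 0

    def feed(self, command_bytes):
        """Append a command's bytes, evicting the oldest bytes FIFO-style."""
        self.buffer.extend(command_bytes)
        self.master_offset += len(command_bytes)
        excess = len(self.buffer) - self.capacity
        if excess > 0:
            del self.buffer[:excess]

    def commands_since(self, slave_offset):
        """Bytes the slave is missing, or None if they fell out of the backlog."""
        oldest = self.master_offset - len(self.buffer)
        if slave_offset < oldest or slave_offset > self.master_offset:
            return None  # slave too far behind: full resync required
        return bytes(self.buffer[slave_offset - oldest:])


backlog = ReplicationBacklog(capacity=8)   # tiny capacity to show eviction
backlog.feed(b"SET a 1\n")                  # 8 bytes
backlog.feed(b"SET b 2\n")                  # first command evicted
print(backlog.commands_since(8))            # b'SET b 2\n'
print(backlog.commands_since(0))            # None: offset 0 already evicted
```

A slave whose acknowledged offset still falls inside the buffer gets only the missing tail; one that has fallen behind the oldest retained byte gets None, i.e. it must do a full resync.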

If the master runs the shutdown save command, RDB persistence records the master's current runid and offset; after the data is restored, the slave still recognizes it as the original master.

Full replication

The following figure shows the procedure for full replication

As mentioned above, full replication is a heavyweight operation. Its overhead includes:

  1. bgsave on the master
  2. Transferring the RDB file
  3. The slave clearing its data and loading the RDB
  4. Possibly rewriting the AOF

Partial replication

Partial replication has been available since Redis 2.8. In some cases it replaces full replication with a lighter operation to reduce overhead; when its conditions are not met, however, full replication is still used.

For example, suppose the master/slave connection drops because of network problems. After the connection is re-established, the nodes determine whether partial replication is possible:

  1. The slave sends the master's runid and offset that it stored earlier, to determine whether this is still the same server and whether the recorded position is still within the buffer queue
  2. If the runid differs, or the offset has fallen out of the buffer, full replication is performed
  3. If the runid matches and the offsets are identical, nothing needs to be sent
  4. If the runid matches, the offset differs, and the missing commands are still in the buffer, partial replication is performed

Heartbeat

When synchronizing commands, the master and slave use a heartbeat mechanism to exchange data.

  • **Slave heartbeat:** sends REPLCONF ACK {offset} every second to fetch the latest commands and to confirm that the master is online
  • **Master heartbeat:** pings the slaves at a fixed interval, repl-ping-slave-period (default: 10 seconds), to check whether they are still online. From the slaves' acknowledgements, the master determines how many slaves are working and how far they lag. If the count or the lag does not meet the configured requirements, the master forcibly disables writes and stops synchronizing data, to protect data consistency
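The forced write-disabling described above corresponds to two redis.conf options (renamed min-replicas-to-write / min-replicas-max-lag in Redis 5+); the values here are only an example:

```conf
# refuse writes unless at least 2 slaves are connected...
min-slaves-to-write 2
# ...and each has sent a heartbeat within the last 10 seconds
min-slaves-max-lag 10
```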

Failover

If a server fails, we need to fail over so that the remaining nodes can keep working.

For example, if a slave server fails, the clients connected to it can be moved to other slaves. Note, however, that piling many clients onto another server increases its load, so they must be distributed sensibly.

If the master server fails, one slave is promoted to master, the other slaves are reattached to the new master, and the clients mounted on the old master are migrated.

Redis Sentinel

With plain master/slave replication we know what to do when a failure occurs, but the failover is not automatic: we must monitor the nodes and switch them by hand, or with a script.

Redis instead provides Redis Sentinel to handle failover for master/slave replication. A Sentinel is a special Redis instance that stores no data. Its job is to monitor nodes for failures, perform failover automatically, and notify the other sentinels and the clients.

A Sentinel can monitor multiple master-slave architectures

When several Sentinels detect that the master has failed, they elect a Sentinel leader among themselves, which acts as a client to handle the failure. A suitable slave is selected as new_master, and the other slaves and the clients are told which node new_master is. If old_master comes back, it reconnects to new_master as a slave.

Start Sentinel with: redis-sentinel /path/to/sentinel.conf

Key attributes of the configuration file:

  • port — the port number, default 26379
  • sentinel monitor — which master this sentinel monitors; when at least quorum sentinels judge the master to be down, it is taken offline
  • sentinel down-after-milliseconds — how long a node may fail to respond before it is considered down
  • sentinel parallel-syncs — how many slaves may synchronize with the new master at the same time; the more in parallel, the faster the overall sync but the higher the load on the server
  • sentinel failover-timeout — the failover timeout
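Assembled, a minimal sentinel.conf using these options might look like this (the master name, address, and values are placeholders):

```conf
port 26379
# monitor the master named "mymaster" at 192.168.1.10:6379;
# 2 sentinels must agree before it is marked objectively down
sentinel monitor mymaster 192.168.1.10 6379 2
# mark the master subjectively down after 30s without a valid reply
sentinel down-after-milliseconds mymaster 30000
# at most 1 slave resyncs with the new master at a time during failover
sentinel parallel-syncs mymaster 1
# abort the failover if it takes longer than 3 minutes
sentinel failover-timeout mymaster 180000
```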

Sentinel does not need to be configured with the slave nodes; it discovers them by analyzing the master's replication information.

The number of Sentinel nodes should be >= 3, and preferably odd, to ensure fair voting and the quick election of a single leader.

Monitoring nodes

On startup, Sentinel sends info to the master and then, using the master's replication information, sends info to its slaves. After that, Sentinel sends info to the master and slaves every 10 seconds.

  1. The info command on the master returns the master's information and status, plus the addresses of its slaves
  2. The info command on each slave then returns that slave's status and properties

When a new sentinel finds that the master is already monitored by other Sentinel nodes, it discovers them by subscribing to a channel on the master node.

Every two seconds, each Sentinel node exchanges its information with the other Sentinel nodes through the master's __sentinel__:hello channel, so the sentinels can "communicate" with each other quickly and stay in sync.

At the same time, every second, each Sentinel node pings the other Sentinel nodes and the Redis nodes as a heartbeat, to determine whether they are online.

Detecting failures

Based on the configuration above, when a Sentinel detects that a master's response has timed out, it marks the master subjectively down. It then sends the sentinel is-master-down-by-addr command to the other sentinels, asking them whether the master should be marked objectively down, in preparation for an election.

  • Subjective down: S_DOWN, one sentinel's "subjective" judgment of a Redis node. The connection may have timed out because of that sentinel's own network problems, so only this sentinel personally believes the node is offline.
  • Objective down: O_DOWN, when the other Sentinel nodes agree (at least quorum sentinels consider the Redis node offline)

Failover

As mentioned earlier, a leader is elected from the Sentinel nodes to carry out the failover. The leader selects a suitable slave, promotes it to master, and has the other slaves synchronize with the new master.

Sentinel leader election

Each Sentinel node that has marked the master subjectively down sends the sentinel is-master-down-by-addr command, asking to become the leader.

  • A Sentinel node that receives the request grants its vote if it has not yet voted for another Sentinel (each node has exactly one vote, first come, first served).
  • When a Sentinel node collects more than half of the votes, and at least quorum, it becomes the leader and performs the failover.
  • If no node wins the vote, a new election is held with the voting epoch incremented by 1, guaranteeing that eventually there is exactly one leader.
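The first-come-first-served voting above can be sketched in a few lines of Python. This is a simplified illustration of the counting rule, not Sentinel's actual protocol; the function and parameter names are invented:

```python
def tally(vote_requests, num_sentinels, quorum):
    """Count a Sentinel leader election round.

    vote_requests: list of (voter_id, candidate_id) pairs in arrival order.
    A voter grants its single vote to the first candidate that asked it.
    A candidate wins with a strict majority of sentinels and at least quorum.
    """
    voted = {}
    for voter, candidate in vote_requests:
        voted.setdefault(voter, candidate)  # first come, first served
    counts = {}
    for candidate in voted.values():
        counts[candidate] = counts.get(candidate, 0) + 1
    for candidate, n in counts.items():
        if n > num_sentinels // 2 and n >= quorum:
            return candidate
    return None  # no majority: a new round starts with epoch + 1


# 5 sentinels: A is asked first by voters 0, 1 and 3, B by voters 2 and 4
print(tally([(0, "A"), (1, "A"), (2, "B"), (3, "A"), (4, "B")], 5, 2))  # A
# 4 sentinels split 2-2: nobody reaches a strict majority
print(tally([(0, "A"), (1, "B"), (2, "A"), (3, "B")], 4, 2))            # None
```

With an even number of sentinels a tie is possible, which is one reason the text recommends an odd cluster size.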

Select the appropriate slave node

The Sentinel leader first selects a suitable node from the online slaves and promotes it to master. The filtering proceeds as follows:

  1. First filter out unhealthy slaves: those that respond slowly or have been disconnected from the master for too long.
  2. Then compare priorities and select the slave with the best slave-priority (in Redis, a lower number means higher priority; 0 means never promote). By default all slaves share the same value; you can give a better priority to slaves with stronger hardware. If a single winner exists, return it; otherwise continue filtering.
  3. Compare replication offsets and select the slave with the largest offset (the most complete data). If a single winner exists, return it; otherwise continue filtering.
  4. Finally, break ties deterministically; Redis selects the slave with the lexicographically smallest runid.
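The four filtering steps can be expressed as one function. This is an illustrative sketch, not Sentinel source; the dict keys are invented for the example:

```python
def pick_new_master(slaves):
    """Pick a replacement master following the selection steps above.

    Each slave is a dict with illustrative keys:
    {'id', 'healthy', 'priority', 'offset', 'runid'}.
    """
    # 1. filter out unhealthy / long-disconnected slaves
    candidates = [s for s in slaves if s["healthy"]]
    # 2. keep promotable slaves (priority 0 means "never promote"),
    #    then keep only the best (numerically lowest) priority
    eligible = [s for s in candidates if s["priority"] > 0]
    if not eligible:
        return None
    best_prio = min(s["priority"] for s in eligible)
    candidates = [s for s in eligible if s["priority"] == best_prio]
    # 3. among priority ties, the largest replication offset wins
    best_off = max(s["offset"] for s in candidates)
    candidates = [s for s in candidates if s["offset"] == best_off]
    # 4. final deterministic tie-break: smallest runid
    return min(candidates, key=lambda s: s["runid"])


slaves = [
    {"id": "s1", "healthy": True,  "priority": 100, "offset": 200, "runid": "b"},
    {"id": "s2", "healthy": True,  "priority": 100, "offset": 300, "runid": "a"},
    {"id": "s3", "healthy": False, "priority": 1,   "offset": 999, "runid": "c"},
]
print(pick_new_master(slaves)["id"])  # s2: healthy, tied priority, largest offset
```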

Failover steps

  1. The leader picks the most suitable slave and has it run slaveof no one to become the master
  2. The remaining slaves run slaveof new_master_ip port to become slaves of new_master
  3. The slaves synchronize data from new_master (how many may sync at once is governed by parallel-syncs)
  4. old_master is recorded as a slave of new_master and kept under monitoring; when it comes back online, it is told to synchronize data from new_master