Review of Previous Articles

If you have not read the earlier articles in this series, it is recommended that you read them first:

“The Old Driver Will Teach You How to Interview (1): Redis Basics and Data Persistence”

“Redis Expiration Strategy and Cache Avalanche, Breakdown, and Penetration”

“Redis High Availability: Master-Slave Mode”

Sentinel Mode

Redis's master-slave mode only achieves high availability for reads; its fatal weakness is that it cannot provide high availability for writes. Once the master node goes down, the whole cluster can no longer accept writes, which falls short of what we expect from a highly available Redis cluster.

So, is there a way to get high availability not only for reads but also for writes? Of course there is: the Sentinel mode that we are going to introduce today.

Sentinel mode can be understood as an upgraded version of master/slave mode. In master/slave mode, the master and slave roles are fixed from the start. In Sentinel mode, the master role can move: through an election, a slave node can be promoted to master, ensuring that in any situation there is a master node available to handle writes, which indirectly gives us high availability for writes.

Introduction to Sentinels

Sentinel mode can be seen as an updated version of the master-slave mode described earlier. Master-slave mode has no failover: once the master node dies, it stays dead. Sentinel mode is designed to solve exactly this problem. A sentinel is responsible for the following:

  • Cluster monitoring: monitors whether the Redis master and slave processes are working properly.
  • Message notification: if a Redis instance fails, the sentinel is responsible for sending an alert message to notify the administrator.
  • Failover: if the master node fails, the master role is automatically transferred to a slave node.
  • Configuration center: if a failover occurs, notifies clients of the new master address (see the sentinel.conf sketch below).
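
As a rough illustration of how these functions map onto configuration, here is a minimal sentinel.conf sketch. The master name mymaster, the address 127.0.0.1:6379, and the script paths are assumptions made up for this example; only the directives themselves are standard Redis Sentinel options.

# Tell this sentinel to monitor a master named "mymaster" at 127.0.0.1:6379.
# The trailing 2 is the quorum: how many sentinels must agree that the master is down.
sentinel monitor mymaster 127.0.0.1 6379 2

# Message notification: run this script on warning-level events such as sdown/odown
# (the script path is hypothetical).
sentinel notification-script mymaster /var/redis/notify.sh

# Configuration center: run this script when a failover changes the master,
# so clients and configuration systems can learn the new address (path is hypothetical).
sentinel client-reconfig-script mymaster /var/redis/reconfig.sh

A sentinel process itself is started with redis-sentinel sentinel.conf (or, equivalently, redis-server sentinel.conf --sentinel).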

Core Concepts

Suppose we have a two-node deployment, where each node runs one Redis instance and one sentinel (M1 is the master with sentinel S1, and R1 is a replica with sentinel S2):

+----+         +----+
| M1 |---------| R1 |
| S1 |         | S2 |
+----+         +----+

Quorum and Majority

  • Quorum: the number of sentinels that must agree that a master is down before it is considered objectively down.
  • Majority: the minimum number of sentinels that must authorize a master/slave switchover before it can actually be performed; this must be more than half of all sentinels.

With only two nodes and a quorum of 1, if the master goes down, a switchover can be triggered as long as either S1 or S2 thinks the master is down, and S1 and S2 will then elect one sentinel to perform the failover.

At the same time, the failover must be authorized by a majority of sentinels, which means a majority of them need to be up and running; with 2 sentinels, the majority is 2.

So at this point, if only the M1 process goes down while sentinel S1 is still running, the failover can proceed. However, if the entire machine hosting M1 and S1 goes down, only 1 sentinel is left, which is not a majority, so no failover can be performed, even though R1 is still alive on the other machine.

So here’s a tip:

  • Sentinels need at least three instances to be robust, and ideally an odd number of instances.

To authorize a failover, more than half of the sentinels need to agree, i.e. a majority is required:

2 sentinels, majority = 2
3 sentinels, majority = 2
4 sentinels, majority = 3
5 sentinels, majority = 3
6 sentinels, majority = 4
7 sentinels, majority = 4
...

As you can see, going from an odd number of sentinels to the next even number raises the required majority by one without increasing the number of failures the cluster can tolerate, which is why an odd number of sentinels is recommended.

The classic three-sentry model looks like this:

       +----+
       | M1 |
       | S1 |
       +----+
          |
+----+    |    +----+
| R2 |----+----| R3 |
| S2 |         | S3 |
+----+         +----+

If the machine hosting M1 goes down, there are still 2 sentinels left. S2 and S3 can agree that the master is down and then elect one of themselves to perform the failover. Meanwhile, the majority of 3 sentinels is 2, so with 2 sentinels still running the failover is allowed to proceed.
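
After such a failover, you can ask any sentinel which node it currently considers the master. A small sketch, assuming a sentinel listening on the default port 26379 and a master named mymaster:

# Ask a sentinel for the address of the master it is currently tracking
redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster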

Data Loss in Redis Sentinel Mode

First, a conclusion:

  • The sentinel + Redis master-slave deployment architecture does not guarantee zero data loss; it only guarantees the high availability of the Redis cluster.

Then let’s talk about two situations that can lead to data loss:

First: data loss caused by asynchronous replication

Because replication from the master to the slave is asynchronous, some data may not yet have been copied to the slave when the master goes down, and that unreplicated data is lost.

Second: data loss caused by split-brain

Split-brain happens when, due to network fluctuations or other factors, the machine the master runs on suddenly becomes unreachable from the other sentinels, even though the master process itself is still working.

At this point the sentinels think the master node is down, start an election for a new master, and promote one of the slave nodes to master. Now there are two master nodes in the cluster.

A new master has been created, but clients may still be writing data to the old master. When the old master becomes reachable again, it is attached to the new master as a slave, its data is wiped, and it re-replicates everything from the new master. All data written to the old master during the split-brain period is therefore lost.

Is there a way to completely eliminate this data loss? No. The problem is inherent to the architecture, so we cannot solve it outright; we can only try to reduce the damage it causes. For that, we can use the following two configuration options:

min-slaves-to-write 1
min-slaves-max-lag 10

These two parameters mean the following:

  • There must be at least 1 slave whose replication lag does not exceed 10 seconds.
  • If the replication lag of every slave exceeds 10 seconds, the master stops accepting write requests.
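
Note that since Redis 5.0 these options use the "replica" terminology (the old min-slaves-* names are kept as aliases). If you run a newer version, the equivalent configuration, using the same example values as above, looks like this:

# Redis 5.0+ naming for the same two options
min-replicas-to-write 1
min-replicas-max-lag 10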

(1) Reducing data loss from asynchronous replication:

With min-slaves-max-lag configured, once a slave's replication and ACK lag grows too large, the master assumes that too much data would be lost if it went down at that moment, and it starts rejecting write requests. This keeps the data loss caused by the master failing before it has synchronized some data to the slave within a controllable range.

(2) Reducing data loss from split-brain:

If a master ends up in a split-brain and loses contact with its slaves, the two configurations above ensure that once the master can no longer push data to at least the specified number of slaves, and has not received an ACK from them for more than 10 seconds, it rejects write requests.

In this way, the old master stops accepting new data from clients, which limits the data loss.

In other words, once the master loses its connection to every slave and has seen no ACK from any of them for 10 seconds, it refuses all new write requests.

So in the split-brain scenario, at most 10 seconds of data is lost.

Subjective Down (sdown) and Objective Down (odown)

  • Sdown: subjective down, when a single sentinel thinks a master is down.
  • Odown: objective down, when a quorum of sentinels think a master is down.

The condition for sdown is simple: if a sentinel's pings to a master go unanswered for longer than the number of milliseconds configured with down-after-milliseconds, that sentinel subjectively considers the master down. If, within the configured time window, a sentinel hears from a quorum of other sentinels that they also consider the master sdown, the master is considered odown.
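
Both the timeout and the quorum are set in sentinel.conf. A small sketch, again assuming a master named mymaster at 127.0.0.1:6379:

# The trailing 2 is the quorum: 2 sentinels must report the master as sdown before it becomes odown
sentinel monitor mymaster 127.0.0.1 6379 2

# Consider the master subjectively down after 30 seconds without a valid reply
sentinel down-after-milliseconds mymaster 30000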
