1. SDOWN and ODOWN conversion mechanisms

1.1 SDOWN – Subjective downtime

SDOWN is a subjective outage: if a single sentinel thinks a master is down, that judgment is only subjective.

  • Subjective outage principle:

SDOWN works very simply: if a sentinel pings a master and receives no valid reply for longer than the number of milliseconds specified in down-after-milliseconds, that sentinel considers the master down.
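The subjective check above can be sketched as a simple timestamp comparison. This is only an illustration, not Sentinel's actual code: the function name and its parameters are hypothetical, and times are plain millisecond integers.

```python
def is_sdown(last_valid_reply_ms: int, now_ms: int, down_after_ms: int) -> bool:
    """Return True when the master has been silent for longer than the timeout.

    last_valid_reply_ms: time of the last valid PING reply (hypothetical input)
    down_after_ms: the configured down-after-milliseconds value
    """
    return now_ms - last_valid_reply_ms > down_after_ms


# Silent for 31 s with a 30 s timeout: subjectively down.
print(is_sdown(last_valid_reply_ms=0, now_ms=31_000, down_after_ms=30_000))
# Silent for only 10 s: still considered up.
print(is_sdown(last_valid_reply_ms=0, now_ms=10_000, down_after_ms=30_000))
```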

1.2 ODOWN – Objective downtime

ODOWN is an objective outage: if a quorum of sentinels believe that a master is down, the master is objectively down.

  • Objective outage principle:

If, within a specified period of time, a sentinel learns that a quorum of sentinels consider the master SDOWN, it marks the master as ODOWN, and the master is treated as objectively down.
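A minimal sketch of the quorum count, assuming the local sentinel counts its own SDOWN judgment as one vote and adds the reports it has received from its peers. The function and parameter names are hypothetical.

```python
from typing import Sequence


def is_odown(local_sdown: bool, peer_sdown_reports: Sequence[bool], quorum: int) -> bool:
    """Return True when enough sentinels agree that the master is down."""
    if not local_sdown:
        # A sentinel only escalates a master it already considers SDOWN.
        return False
    agreeing = 1 + sum(1 for report in peer_sdown_reports if report)
    return agreeing >= quorum


# Three sentinels, quorum 2: the local one plus one peer is enough.
print(is_odown(True, [True, False], quorum=2))
# No peer agrees, so the outage stays subjective.
print(is_odown(True, [False, False], quorum=2))
```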

2. Automatic discovery mechanism of the sentinel cluster

Sentinels discover each other through the Redis pub/sub system. Each sentinel publishes messages to the __sentinel__:hello channel; the other sentinels consume these messages and so learn of each other's presence.

Every two seconds, each sentinel publishes a message to the __sentinel__:hello channel (a topic-style broadcast mechanism) of each master and slave it monitors. The message contains the sentinel's own host, IP, and run ID, together with its current configuration for the master.

Every sentinel also subscribes to the __sentinel__:hello channel of every master and slave it monitors, and in this way detects the other sentinels that are monitoring the same master and slaves.

Each sentinel also exchanges its monitoring configuration for the master with the other sentinels, so that the configurations stay synchronized.
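To make the hello message concrete, here is a sketch of parsing one payload. The field layout is an assumption: eight comma-separated values in the order sentinel IP, sentinel port, sentinel run ID, current epoch, master name, master IP, master port, master configuration epoch. The Hello class and parse_hello function are hypothetical helpers, not part of any Redis client library.

```python
from dataclasses import dataclass


@dataclass
class Hello:
    sentinel_ip: str
    sentinel_port: int
    sentinel_run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Parse a hello payload under the assumed 8-field layout."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    s_ip, s_port, s_run_id, epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(s_ip, int(s_port), s_run_id, int(epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


# Made-up example payload, for illustration only.
hello = parse_hello("127.0.0.1,26379,abc123,7,mymaster,10.0.0.5,6379,7")
print(hello.sentinel_run_id, hello.master_name, hello.master_config_epoch)
```

A sentinel that receives such a message learns both that another sentinel exists (IP, port, run ID) and which master configuration that sentinel currently holds.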

3. Automatically correct the slave configuration

The sentinel is responsible for automatically correcting some of the slave's configuration. If a slave is a candidate to become the master, the sentinel ensures that it is replicating data from the current master.

If a slave is connected to the wrong master, for example after a failover, the sentinel ensures that it is reconnected to the correct master.

4. Slave -> Master Election algorithm

If a master is considered ODOWN and a majority of the sentinels authorize a master/slave switchover, one sentinel performs the switchover and must first elect a slave to promote. The following slave information is taken into account during the election:

  1. Duration of disconnection from the master
  2. Slave priority
  3. Replication offset
  4. Run ID
  • When does a sentinel decide that a slave is unfit to be elected master?

If a slave has been disconnected from the master for longer than 10 times down-after-milliseconds plus the time the master has already been down, the slave is considered unfit to be elected master. The limit is:

(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state
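As a worked example of this limit, the sketch below evaluates the formula for concrete numbers. The function and parameter names are hypothetical; only the formula itself comes from the text.

```python
def is_unfit(disconnected_ms: int, down_after_ms: int, ms_since_master_sdown: int) -> bool:
    """Return True when the slave has been disconnected longer than the limit."""
    limit = down_after_ms * 10 + ms_since_master_sdown
    return disconnected_ms > limit


# With down-after-milliseconds = 30000 and a master that has been SDOWN
# for 5000 ms, the limit is 30000 * 10 + 5000 = 305000 ms.
print(is_unfit(310_000, 30_000, 5_000))  # disconnected longer than the limit
print(is_unfit(300_000, 30_000, 5_000))  # still within the limit
```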

  • Factors considered when electing a slave as master, i.e., how are slaves ranked?
  1. Slaves are first sorted by slave priority. The lower the slave priority value, the higher the slave ranks.
  2. If the slave priority is the same, compare the replication offset, i.e., which slave has replicated more data. The larger the offset, the higher the slave ranks.
  3. If both of the above are the same, the slave with the smaller run ID is selected.
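The three ranking rules above map naturally onto a single sort key. This is a sketch under the assumption that run IDs are compared as plain strings; the Slave class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slave:
    run_id: str
    priority: int      # lower value ranks higher
    repl_offset: int   # larger value means more data replicated


def rank_slaves(slaves: List[Slave]) -> List[Slave]:
    """Order candidates best-first: priority, then offset, then run ID."""
    return sorted(slaves, key=lambda s: (s.priority, -s.repl_offset, s.run_id))


candidates = [
    Slave("c", priority=100, repl_offset=500),
    Slave("b", priority=100, repl_offset=900),
    Slave("a", priority=100, repl_offset=900),
    Slave("d", priority=50, repl_offset=10),
]
# "d" wins on priority; "a" and "b" tie on offset, so the smaller run ID goes first.
print([s.run_id for s in rank_slaves(candidates)])
```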

5. The quorum and majority parameters

Quorum: affects the objective-down decision for a Redis master. The master is considered ODOWN only when a quorum of sentinels believe it is down; after that, one sentinel is elected to perform the master/slave switchover.

Majority: affects the authorization vote. The sentinel elected to perform the switchover must be authorized by a majority of the sentinels.

If quorum < majority, for example five sentinels with a majority of 3 and quorum set to 2, then authorization from three sentinels is enough to perform a master/slave switchover.

If quorum >= majority, then as many sentinels as the quorum value must authorize the switchover. For example, with five sentinels and quorum set to 5, all five sentinels must authorize the master/slave switchover.
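Both cases reduce to taking the larger of the two values. A small sketch, with hypothetical function names:

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of sentinels that is more than half."""
    return total_sentinels // 2 + 1


def authorizations_needed(total_sentinels: int, quorum: int) -> int:
    """Sentinels that must authorize a switchover: max(quorum, majority)."""
    return max(quorum, majority(total_sentinels))


# Five sentinels, quorum 2: the majority (3) decides.
print(authorizations_needed(5, 2))
# Five sentinels, quorum 5: all five must authorize.
print(authorizations_needed(5, 5))
```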

6. Configuration epoch

Each sentinel monitors a set of Redis masters and slaves and keeps the corresponding monitoring configuration.

The sentinel that performs the switchover obtains a configuration epoch for the new master (slave -> master); this epoch serves as a version number. If the first elected sentinel fails to complete the switchover, the other sentinels wait for failover-timeout and then take over the switchover. In that case a new configuration epoch is obtained and used as the new version number.

Summary: the sentinel performing the switchover obtains a version number for the new master. If the switchover fails, it is retried and a new version number is obtained. Note: every version number is unique.

7. Configuration propagation

After the switchover is complete, the sentinel that performed it updates the master configuration locally and propagates it to the other sentinels through the pub/sub messaging mechanism.

The version number matters here because all messages are published and consumed through the same channel. When a sentinel completes a new switchover, the new master configuration carries the new version number, and the other sentinels update their own master configuration whenever the announced version number is larger than the one they hold.
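The update rule can be sketched as a comparison of version numbers. The dictionary layout and the function name are hypothetical; the only rule taken from the text is that a larger version number wins.

```python
def apply_announcement(local: dict, announced: dict) -> dict:
    """Adopt the announced master configuration only if its version is newer."""
    if announced["config_epoch"] > local["config_epoch"]:
        return dict(announced)
    return local


local = {"ip": "10.0.0.5", "port": 6379, "config_epoch": 7}
newer = {"ip": "10.0.0.6", "port": 6379, "config_epoch": 8}
stale = {"ip": "10.0.0.4", "port": 6379, "config_epoch": 6}

print(apply_announcement(local, newer)["ip"])  # the newer configuration is adopted
print(apply_announcement(local, stale)["ip"])  # the stale announcement is ignored
```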

Summary: after a sentinel performs the master/slave switchover, the master's IP address, port number, and run ID all change. The sentinel publishes a message through the pub/sub system to notify the other sentinels to update to the latest master configuration.

