My name is Memo. I have recently been reading the book Redis Design and Implementation and have just worked through the Sentinel mechanism, so I wrote these short notes to consolidate what I learned.

What is the Sentinel mechanism

Sentinel is the high-availability solution for Redis: a Sentinel system consists of one or more Sentinel instances. As shown in the figure:

The figure above shows the Sentinel system monitoring the master node and the slave nodes, with data replicated from the master to the slaves. This raises several questions:

  • How does the Sentinel system monitor master and slave nodes?
  • How do the master node and slave nodes replicate data?
  • If the master node fails, how does the Sentinel system fail over?
  • How do Sentinel nodes communicate within a Sentinel system?
  • How does the Sentinel system notify the client about the master and slave nodes?
  • How does the Sentinel system determine whether the master is really down?
  • Which slave does the Sentinel system elect as the new master?

These questions are answered one by one in the rest of this article. I hope you read it with these questions in mind and look for the answers through your own thinking, and I hope these notes are of some help.

Basic Flow of Sentinel

Sentinel is essentially a Redis server running in a special mode.

The sentinel server does not need to load the RDB file or AOF file to restore the database state during initialization.

The sentinel mechanism provides three functions: monitoring, failover, and notification.

  • Monitoring: The Sentinel system can monitor any number of master servers, together with all the slave servers that belong to each master, and checks their health status.

  • Failover: When a master server fails, the Sentinel system selects one of its slave servers as the new master server to keep the service highly available.

  • Notification: The Sentinel system notifies the client about the master and slave nodes so that the client always has up-to-date connection information.

Let’s start with monitoring. While a Sentinel process is running, it periodically performs a heartbeat check to see whether all the master and slave servers are working properly. The check consists of periodically sending the PING command to each master and slave server. If a server replies to the Sentinel process within the specified time, the Sentinel judges it to be alive; if it does not reply within that time, the Sentinel judges it to be offline.
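The timeout rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how Redis implements it: the MonitoredInstance class and the is_subjectively_down function are hypothetical names, and the 30 000 ms limit is just an example value for the down-after-milliseconds setting.

```python
from dataclasses import dataclass


@dataclass
class MonitoredInstance:
    """Hypothetical record a monitor keeps for each server it watches."""
    name: str
    last_valid_reply: float  # timestamp (in seconds) of the last valid PING reply


def is_subjectively_down(instance: MonitoredInstance, now: float,
                         down_after_ms: int) -> bool:
    """Return True when no valid PING reply arrived within down_after_ms."""
    elapsed_ms = (now - instance.last_valid_reply) * 1000
    return elapsed_ms > down_after_ms


# Example: the server last answered at t=100 s and the limit is 30 000 ms.
server = MonitoredInstance(name="server2", last_valid_reply=100.0)
print(is_subjectively_down(server, now=135.0, down_after_ms=30_000))  # True
print(is_subjectively_down(server, now=110.0, down_after_ms=30_000))  # False
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check easy to test.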

As shown in the figure below, if primary server Server2 does not respond to the Sentinel process within the specified time, the Sentinel process marks Server2 as subjectively offline; once the offline state is confirmed, it carries out the election.

If the master server is offline, the Sentinel process performs a failover, that is, it elects a new master. Master election picks one of the slave servers of the offline master to become the new master and take over serving requests. After the election, the Sentinel process sends the SLAVEOF command to the other slave servers of the offline master, telling them to replicate the new master.

The following figure shows that during the failover, the Sentinel node sends the SLAVEOF command to the two remaining slave servers of the offline primary server server1, including server3, so that they replicate the data of the new primary server.

If the original primary server is restarted, it becomes the secondary server of the new primary server.

As shown in the following figure, after the old primary server server1 is restarted, it will run as the secondary server of the new primary server server2 by default.
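To make the sequence concrete, here is a small sketch that lists the commands a failover would issue. It is an assumption-laden illustration: plan_failover is a hypothetical helper, the addresses are made up, and the SLAVEOF NO ONE step (the standard Redis command for turning a slave into a master) is not described in the text above, so treat it as my addition.

```python
from typing import List, Tuple

Address = Tuple[str, int]  # (ip, port)


def plan_failover(old_master: Address, slaves: List[Address],
                  new_master: Address) -> List[Tuple[Address, str]]:
    """Return (target, command) pairs for one failover (hypothetical helper)."""
    if new_master not in slaves:
        raise ValueError("the new master must be one of the old master's slaves")
    ip, port = new_master
    # Assumption: the chosen slave is promoted with SLAVEOF NO ONE.
    plan = [(new_master, "SLAVEOF NO ONE")]
    # Every other slave is told to replicate the new master.
    for slave in slaves:
        if slave != new_master:
            plan.append((slave, f"SLAVEOF {ip} {port}"))
    # The old master receives the same command once it is back online.
    plan.append((old_master, f"SLAVEOF {ip} {port}"))
    return plan


# Made-up addresses: server1 is the failed master, server2 is promoted.
server1, server2, server3 = ("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)
for target, command in plan_failover(server1, [server2, server3], server2):
    print(target, command)
```

The plan is returned as data instead of being sent over a connection, so the ordering of the steps can be inspected without a running Redis server.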

After the Sentinel elects a new master server, it sends the new master’s information to the client so that the client can connect to it; no decision logic is involved in this step. During monitoring and election, however, the Sentinel has to make two decisions: first, whether the master is really offline; second, which slave server should become the new master.

Sentinel obtains primary and secondary server information

By default, the Sentinel process sends the INFO command to each monitored primary server through the command connection every 10 seconds. It analyzes the reply to obtain the current information about the primary server and its slave servers.

As shown in the figure below, primary server server2 has three secondary servers: server1, server3, and server4. The Sentinel process sends the INFO command to server2, which returns the corresponding primary and secondary server information.

Similarly, the Sentinel process sends the INFO command to each slave server to obtain that slave’s node information, also every 10 seconds by default.
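As a rough illustration of what "analyzing the INFO reply" can look like, the sketch below pulls the slave entries out of a reply. The parse_slaves function is a hypothetical helper, and the sample text is hand-written in the usual slaveN:ip=...,port=... layout of the replication section; real replies contain many more fields.

```python
from typing import Dict, List


def parse_slaves(info_text: str) -> List[Dict[str, str]]:
    """Extract the slaveN lines of an INFO reply into dictionaries."""
    slaves = []
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines, section headers such as "# Replication", and noise.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        # Slave entries are keyed slave0, slave1, ...; other keys are ignored.
        if key.startswith("slave") and key[5:].isdigit():
            fields = dict(pair.split("=", 1) for pair in value.split(","))
            slaves.append(fields)
    return slaves


# Hand-written sample reply (assumption: typical replication section layout).
sample = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=11111,state=online,offset=43,lag=0
slave1:ip=127.0.0.1,port=22222,state=online,offset=43,lag=0
"""
for slave in parse_slaves(sample):
    print(slave["ip"], slave["port"], slave["state"])
```

From the ip and port of each entry, a monitor has what it needs to open its own connections to the slaves.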

Multiple Sentinels communicate

In a Sentinel cluster, Sentinel instances communicate through the pub/sub mechanism provided by Redis, that is, the publish/subscribe model.

In a master-slave cluster, a Sentinel node does not need to know the other Sentinel nodes in advance. It first connects to the master and then publishes its own information (IP and port) to a channel named “__sentinel__:hello”. The other Sentinel nodes subscribed to that channel receive this information, and that is how the Sentinel nodes learn about each other.

Put simply, in Redis Sentinel mode, Sentinel nodes discover each other by subscribing to a designated channel rather than by being told about each other directly.

For example, if sentinel1, sentinel2, and sentinel3 are monitoring the same server, then when sentinel1 sends a message to the __sentinel__:hello channel on the master server, every Sentinel subscribed to that channel (including sentinel1 itself) receives the message. As shown below:

When a Sentinel receives a message from the __sentinel__:hello channel, it analyzes the message and extracts eight parameters, including the sending Sentinel’s IP address, port number, and run ID.

  • If the Sentinel run ID recorded in the message is the same as the run ID of the Sentinel receiving it, the message was sent by that Sentinel itself, and the Sentinel discards it without further processing.

  • Otherwise, if the run ID recorded in the message differs from that of the receiving Sentinel, the message was sent by another Sentinel monitoring the same server. The receiving Sentinel then updates the instance structure of the corresponding master server according to the parameters in the message.
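The two cases above can be sketched as follows. The text only names three of the eight parameters, so the remaining field names (s_epoch, m_name, m_ip, m_port, m_epoch), their order, and the comma-separated layout follow my reading of the book and should be treated as assumptions; Hello, parse_hello, and handle_hello are hypothetical names.

```python
from typing import NamedTuple, Optional


class Hello(NamedTuple):
    """Eight fields of a hello message (names and order are assumptions)."""
    s_ip: str
    s_port: int
    s_runid: str
    s_epoch: int
    m_name: str
    m_ip: str
    m_port: int
    m_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a comma-separated hello payload into its eight fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    s_ip, s_port, s_runid, s_epoch, m_name, m_ip, m_port, m_epoch = parts
    return Hello(s_ip, int(s_port), s_runid, int(s_epoch),
                 m_name, m_ip, int(m_port), int(m_epoch))


def handle_hello(payload: str, my_runid: str) -> Optional[Hello]:
    """Return None for our own message, otherwise the parsed message."""
    hello = parse_hello(payload)
    if hello.s_runid == my_runid:
        return None  # sent by this Sentinel itself: discard
    return hello     # sent by another Sentinel: caller updates its records


message = "127.0.0.1,26379,abc123,0,mymaster,127.0.0.1,6379,0"
print(handle_hello(message, my_runid="abc123"))  # None
print(handle_hello(message, my_runid="def456"))
```

Comparing run IDs rather than IP and port is what lets a Sentinel recognize its own message even when several Sentinels run on the same host.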

Subjective and objective offline

Let’s first explain what subjective offline means.

The Sentinel process uses the PING command to check the network connection to each master and slave and thereby determine the state of each instance. If the Sentinel finds that a master or slave has timed out in responding, it considers that instance “subjectively offline.”

If the instance being checked is a slave that has not responded within the specified time, the Sentinel simply marks it as “subjectively offline”, because a slave going offline generally has little impact and the cluster keeps serving requests. If the instance is the master, however, the Sentinel cannot simply mark it as “subjectively offline” and start the master/slave switchover.

The reason is that the Sentinel may have misjudged: the master itself may be healthy, and the Sentinel only believes it is offline. Once a master/slave switchover starts, the subsequent election and notification steps incur extra computing and communication overhead. To avoid this unnecessary cost, misjudgments must be guarded against carefully.

In a Sentinel cluster, a single Sentinel does not decide on its own whether the master is offline. The master is marked “objectively offline” only when a majority of the Sentinels consider it “subjectively offline.” The rule is that the minority yields to the majority, and only then is the master/slave switchover triggered.

For example, suppose there are three Sentinels (sentinel1, sentinel2, and sentinel3), one master (master1), and three slaves (slave1, slave2, and slave3). If sentinel1 and sentinel2 judge master1 to be online and only sentinel3 judges it to be “subjectively offline”, then master1 is still considered online.

Put simply, the usual criterion for “objectively offline” is that, out of N Sentinel instances, at least N/2+1 consider the master “subjectively offline.” This reduces the probability of misjudgment and avoids unnecessary cost. (The number of “subjectively offline” judgments required can also be set by the Redis administrator.)
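The N/2+1 rule and the three-Sentinel example can be checked with a short sketch. The function names majority and is_objectively_down are hypothetical, and the quorum is passed in as a parameter because, as noted above, the administrator can choose a different threshold.

```python
from typing import Dict


def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that forms a majority: N/2+1."""
    return total_sentinels // 2 + 1


def is_objectively_down(votes: Dict[str, bool], quorum: int) -> bool:
    """votes maps a Sentinel name to True if it sees the master as down."""
    down_votes = sum(1 for is_down in votes.values() if is_down)
    return down_votes >= quorum


# The example from the text: only sentinel3 thinks master1 is down.
votes = {"sentinel1": False, "sentinel2": False, "sentinel3": True}
quorum = majority(len(votes))
print(quorum)                              # 2
print(is_objectively_down(votes, quorum))  # False
```

With three Sentinels the majority is two, so a single dissenting Sentinel cannot trigger a switchover on its own.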

Electing a new master

How does the Sentinel elect a new master when it starts a master/slave switchover? What mechanism does it follow?

In general, I summarize the Sentinel’s process of electing a new master as “filter + sort”. The Sentinel first filters out the unqualified slaves according to a set of rules and then sorts the qualified ones to produce the new master.

Let’s start with the filtering rules.

  • Filter out all slave servers that are offline or disconnected, so that the remaining slave servers are all online.

  • Filter out all slave servers that have not responded to the Sentinel’s INFO command within the specified time, so that the remaining slave servers have all communicated successfully recently.

  • Filter out all slave servers that have been disconnected from the offline primary server for more than down-after-milliseconds*10 milliseconds, so that none of the remaining slave servers lost contact with the primary too early. In other words, the data held by the remaining slave servers is relatively fresh.

That covers the filtering rules; next come the sorting rules.

  • The Sentinel sorts the remaining slave servers by priority and selects the one with the highest priority.

  • If several slave servers share the highest priority, the Sentinel sorts them by replication offset and selects the one with the largest offset.

  • If several slave servers share both the highest priority and the largest replication offset, the Sentinel sorts them by run ID and selects the one with the smallest run ID.

At this point, the new master has been elected and the election process is complete. To review: first, the Sentinel filters out the slave servers that are offline, disconnected, or in a poor network state. Then it sorts the remaining slave servers by priority, replication offset, and run ID, and the slave server that comes first becomes the new master server.
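The “filter + sort” process can be sketched like this. Everything here is illustrative: the Slave fields and elect_new_master are hypothetical names, info_timeout_ms stands in for the unspecified “specified time”, and the priority field is modeled so that a larger number means higher priority. As far as I know, the actual Redis priority setting works the other way round (a smaller number is preferred), so the sort key would need adjusting for real data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    run_id: str
    online: bool
    last_info_reply_ms: int   # time since the last INFO reply
    master_link_down_ms: int  # how long the link to the old master has been down
    priority: int             # assumption: larger number = higher priority
    repl_offset: int


def elect_new_master(slaves: List[Slave], info_timeout_ms: int,
                     down_after_ms: int) -> Optional[Slave]:
    """Filter out unqualified slaves, then sort the rest and pick the first."""
    candidates = [
        s for s in slaves
        if s.online
        and s.last_info_reply_ms <= info_timeout_ms
        and s.master_link_down_ms <= down_after_ms * 10
    ]
    if not candidates:
        return None
    # Highest priority first, then largest offset, then smallest run ID.
    candidates.sort(key=lambda s: (-s.priority, -s.repl_offset, s.run_id))
    return candidates[0]


slaves = [
    Slave("c3", True, 500, 0, 100, 900),
    Slave("a1", True, 500, 0, 100, 1000),
    Slave("b2", True, 500, 0, 100, 1000),
    Slave("d4", False, 500, 0, 200, 2000),  # offline, so filtered out
]
winner = elect_new_master(slaves, info_timeout_ms=5_000, down_after_ms=30_000)
print(winner.run_id if winner else None)  # a1
```

Here d4 is dropped for being offline, c3 loses on replication offset, and a1 beats b2 only on the run ID tie-break.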

Client event notification based on pub/sub mechanism

In essence, a Sentinel is a Redis server running in a special mode: it does not serve data requests and only performs monitoring, failover, and notification. Each Sentinel provides a pub/sub mechanism through which clients can subscribe to messages.

A client can subscribe to all events from the Sentinel. This lets the client not only obtain the connection information of the new master after a master/slave switchover but also follow the important events that occur during the switchover.
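As one example of such an event, a client might parse the message announcing the new master. The sketch below assumes the event is published on a channel named +switch-master with a payload of the form "<master name> <old ip> <old port> <new ip> <new port>", which is how I understand the Redis Sentinel documentation; none of this is spelled out in the text above, and SwitchMaster and parse_switch_master are hypothetical names.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse an assumed '+switch-master' payload into its five fields."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected +switch-master payload: {payload!r}")
    name, old_ip, old_port, new_ip, new_port = parts
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


# Made-up payload: mymaster moves from port 6379 to port 6380.
event = parse_switch_master("mymaster 127.0.0.1 6379 127.0.0.1 6380")
print(f"{event.master_name} is now at {event.new_ip}:{event.new_port}")
```

A client that receives this event would close its connection to the old address and reconnect to the new one.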

The pub/sub mechanism connects Sentinel to Sentinel, Sentinel to slave, and Sentinel to client. Combined with the rules described above for judging whether the master is offline and for electing a new master, it allows the Sentinel cluster to carry out its three tasks of monitoring, election, and notification.

Conclusion

  • Sentinel is just a Redis server running in a special mode and does not provide data storage.

  • Sentinel sends the INFO command to the master server to obtain the addresses of the slave servers that belong to it, creates the corresponding instance structure for each of these slave servers, and establishes a command connection and a subscription connection to each of them.

  • By default, Sentinel sends the INFO command to the monitored master and slave servers every 10 seconds to obtain information about them. When the master server is offline, or while Sentinel is failing over the master server, Sentinel sends the INFO command to the slave servers once per second instead.

  • Sentinels monitoring the same master server announce their presence to each other by sending messages to the __sentinel__:hello channel on the master server. The other Sentinels subscribed to the channel receive these messages, and this is how the Sentinels get to know each other.

  • Sentinel establishes both command connections and subscription connections with the master and slave servers, whereas Sentinels establish only command connections with one another.

  • Sentinel sends a PING command to an instance (slave server, master server, other Sentinels) once per second and determines whether the instance is online based on its response to the PING command. If an instance does not respond to the PING command within a specified period of time, it is considered as subjective offline.

  • In a Sentinel cluster, when a Sentinel has received enough subjective offline votes, it determines that the primary server is objectively offline and initiates a failover of that primary server.

That’s the end of these notes on Redis Sentinel. See you next time!

My name is Memo, a beginner in the IT field.