Abstract: Based on a reading of the Redis Sentinel source code, this article explains how Sentinel is implemented.

Redis Sentinel is the high-availability solution provided by Redis. Sentinel automatically monitors one or more Redis master/replica groups and performs an automatic failover when a master fails.

Sentinel uses the same event-driven framework as the Redis server, but it has its own initialization steps. This article covers Sentinel initialization, Sentinel's main time event function, Sentinel's network connections, and Tilt mode.

Sentinel initialization

We can start a Sentinel instance with either redis-sentinel or redis-server --sentinel; the two are equivalent. In the main function in server.c, Redis calls the checkForSentinelMode function to decide whether the user asked for Sentinel mode.

The checkForSentinelMode function checks the following two conditions:

1. The program was started through the redis-sentinel executable.

2. The --sentinel flag appears in the program's argument list.

If either condition is true, Redis runs in Sentinel mode.
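The two conditions above can be sketched in a few lines of C. This is a simplified, self-contained illustration rather than the code from server.c; the function name looks_like_sentinel_mode is hypothetical.

```c
#include <string.h>

/* Returns 1 if the process should run in Sentinel mode, 0 otherwise.
 * Hypothetical helper that mirrors the two conditions listed in the text. */
int looks_like_sentinel_mode(int argc, char **argv) {
    /* Condition 1: the executable name contains "redis-sentinel". */
    if (argc > 0 && strstr(argv[0], "redis-sentinel") != NULL) return 1;

    /* Condition 2: one of the arguments is the --sentinel flag. */
    for (int j = 1; j < argc; j++) {
        if (strcmp(argv[j], "--sentinel") == 0) return 1;
    }
    return 0;
}
```

Calling this helper with the argc and argv received by main covers both ways of starting Sentinel described above.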

Once Redis has determined that it is running in Sentinel mode, the main function calls initSentinelConfig and initSentinel.

In the initSentinelConfig function, the Sentinel-specific port (26379 by default) replaces the Redis default port (6379). In addition, protected mode is disabled in Sentinel mode.

The initSentinel function then does the following:

1. Replaces the Redis server's native command table with Sentinel's built-in command table, so that only the commands Sentinel supports are available.

2. Initializes Sentinel's main state structure.

Each value in the masters dictionary of this state structure points to a master instance that Sentinel monitors.

After reading the configuration, the Redis server main function calls the sentinelIsRunning function, which does the following:

1. Checks that a configuration file is set and that the process has write permission to it, because Sentinel persists its current state to the configuration file whenever that state changes.

2. If a run ID is specified in the configuration file, Sentinel uses it as its run ID. Otherwise, Sentinel generates a new run ID.

3. Generates initial monitoring events for all instances that Sentinel monitors.

Sentinel's main time event function

Sentinel uses the same event handling mechanisms as the Redis server: file events and time events. File events use I/O multiplexing to handle the server's network I/O, such as client connections, reads, and writes. Time events call a time function periodically from the main loop to handle scheduled work, such as server maintenance, scheduled updates, and deletions. The Redis server's main time function is serverCron, defined in server.c. By default, serverCron is called every 100 ms.

When the server is running in Sentinel mode, serverCron calls the sentinelTimer function, defined in sentinel.c, to run Sentinel's main logic.

The sentinelTimer function does the following:

1. Checks whether Sentinel should currently be in Tilt mode (Tilt mode is described in a later section).

2. Checks the connections between Sentinel and the masters and replicas it monitors, as well as the other Sentinels, updates their current state, and automatically performs a failover when a master goes offline.

3. Checks the state of the callback scripts and performs the related operations.

4. Updates the server frequency (the frequency at which serverCron is called) and adds a random factor to it. This keeps Sentinels that monitor the same master from running in lockstep: if they all started a leader election at the same moment, the vote could split so that no candidate obtains a majority.
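Step 4 can be illustrated with a small sketch. We assume a base frequency of 10 calls per second, which matches the 100 ms default mentioned earlier; DEFAULT_HZ and randomized_hz are names made up for this example.

```c
#include <stdlib.h>

#define DEFAULT_HZ 10 /* assumed base frequency: 10 calls per second (100 ms) */

/* Pick a timer frequency in the range [DEFAULT_HZ, 2 * DEFAULT_HZ - 1].
 * Each Sentinel draws its own value, so their timers drift apart over time. */
int randomized_hz(void) {
    return DEFAULT_HZ + rand() % DEFAULT_HZ;
}
```

With these assumptions the interval between two timer calls varies between roughly 52 ms and 100 ms, which is enough to desynchronize Sentinels that started at the same time.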

Step 2 is carried out by the sentinelHandleDictOfRedisInstances function, whose main work is the following:

1. For each instance, calls the sentinelHandleRedisInstance function to handle the connection between Sentinel and that instance, update its state, and, for a master, perform the failover work.

2. If the instance being handled is a master, recursively calls sentinelHandleDictOfRedisInstances to handle the master's replicas and the other Sentinels that monitor the same master.

3. If a failover has completed successfully, switches the monitored master over to the promoted replica.

The sentinelHandleRedisInstance function does two things:

1. It checks the connections between Sentinel and the other instances (masters, replicas, and other Sentinels). If a connection has not been set up or has been dropped, Sentinel tries to reconnect, and it periodically sends commands over the connection. Note that Sentinel keeps two connections to each master and replica: a command connection and a publish-subscribe connection. Between Sentinels that monitor the same master and replicas, however, only a command connection is kept. This is covered separately in the network connection section.

2. It checks whether the monitored masters and replicas and the other Sentinels are subjectively down. For a master, it also checks whether the master is objectively down and, if so, carries out the corresponding failover.

Assuming Sentinel is not in Tilt mode, let's look at the implementation details of the second part of this function.

The sentinelCheckSubjectivelyDown function checks whether a given Redis instance (a master, a replica, or another Sentinel) is subjectively down.

An instance is considered subjectively down when it meets one of the following conditions:

1. No valid reply has been received from the instance within the down_after_milliseconds period configured for it.

2. Sentinel believes the instance is a master, but the instance reports itself as a replica, and the time since that role was reported exceeds the down_after_milliseconds configured for the instance plus twice the INFO interval.

If either condition is met, Sentinel sets the SRI_S_DOWN flag on the instance and considers it subjectively down.

Subjectively down means that this Sentinel alone considers the instance offline. At this point it has not yet asked the other Sentinels monitoring the instance for their view of its state.
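The two conditions for the subjectively down state can be expressed as a short predicate. The sketch below is an illustration under our own assumptions, not the code from sentinel.c: the instance_view structure, its field names, and is_subjectively_down are hypothetical, and the INFO interval is taken as the 10 s mentioned in this article.

```c
#include <stdbool.h>

typedef long long mstime_t; /* timestamps and durations in milliseconds */

#define INFO_PERIOD_MS 10000 /* assumed INFO interval: 10 s */

/* Hypothetical, reduced view of what Sentinel tracks for one instance. */
typedef struct {
    bool believed_master;        /* Sentinel thinks the instance is a master */
    bool reports_replica;        /* the instance last reported itself as a replica */
    mstime_t last_valid_reply;   /* time of the last valid reply */
    mstime_t role_reported_time; /* time the reported role was recorded */
    mstime_t down_after_ms;      /* down_after_milliseconds for this instance */
} instance_view;

bool is_subjectively_down(const instance_view *ri, mstime_t now) {
    /* Condition 1: no valid reply for longer than down_after_milliseconds. */
    if (now - ri->last_valid_reply > ri->down_after_ms) return true;

    /* Condition 2: a supposed master has been reporting itself as a replica
     * for longer than down_after_milliseconds plus two INFO intervals. */
    if (ri->believed_master && ri->reports_replica &&
        now - ri->role_reported_time > ri->down_after_ms + 2 * INFO_PERIOD_MS)
        return true;

    return false;
}
```

The extra two INFO intervals in the second condition give the instance time to report its role again before Sentinel draws a conclusion from a stale report.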

The sentinelCheckObjectivelyDown function checks whether an instance is objectively down. This check is performed for masters only.

The function loops over the other Sentinels that monitor the master and checks whether each one's SRI_MASTER_DOWN flag is set. A set flag means that the Sentinel in question considers the master to be offline, and it counts as one vote. If the number of votes is greater than or equal to the quorum configured for the master, Sentinel sets the SRI_O_DOWN flag on the master and considers it objectively down.
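The vote counting can be sketched as follows. This is a minimal illustration with hypothetical names. It assumes that the checking Sentinel counts its own subjectively down verdict as one vote and only counts the others when it already considers the master subjectively down; that assumption is ours and is not stated in the text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* self_sdown:         this Sentinel considers the master subjectively down
 * others_report_down: one entry per other Sentinel, true if it reported the
 *                     master as down (the SRI_MASTER_DOWN flag in the text)
 * quorum:             quorum configured for the master */
bool is_objectively_down(bool self_sdown, const bool *others_report_down,
                         size_t n_others, unsigned quorum) {
    if (!self_sdown) return false;

    unsigned votes = 1; /* assumed: our own verdict counts as one vote */
    for (size_t i = 0; i < n_others; i++) {
        if (others_report_down[i]) votes++;
    }
    return votes >= quorum;
}
```

For example, with a quorum of 2, one agreeing Sentinel in addition to the checking Sentinel is enough to mark the master objectively down.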

The sentinelStartFailoverIfNeeded function first checks that the master is objectively down (the SRI_O_DOWN flag is set), that no failover is already in progress, and that no failover has been attempted within the last two failover timeouts configured for the master. If all of these hold, Sentinel sets the SRI_FAILOVER_IN_PROGRESS flag, sets the failover state to SENTINEL_FAILOVER_STATE_WAIT_START, and starts the failover. The details of the failover are described in the failover section.
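The preconditions for starting a failover can be sketched as a small predicate. The master_view structure and should_start_failover are hypothetical names for this illustration. The check that no failover is already running is our reading of the logic and should be treated as an assumption.

```c
#include <stdbool.h>

typedef long long mstime_t; /* timestamps and durations in milliseconds */

/* Hypothetical, reduced view of the failover-related state of a master. */
typedef struct {
    bool objectively_down;        /* SRI_O_DOWN is set */
    bool failover_in_progress;    /* SRI_FAILOVER_IN_PROGRESS is set */
    mstime_t failover_start_time; /* when the last failover attempt started */
    mstime_t failover_timeout;    /* failover timeout configured for the master */
} master_view;

bool should_start_failover(const master_view *m, mstime_t now) {
    if (!m->objectively_down) return false;    /* master must be objectively down */
    if (m->failover_in_progress) return false; /* assumed: only one failover at a time */
    /* The last attempt must be at least two failover timeouts in the past. */
    if (now - m->failover_start_time < 2 * m->failover_timeout) return false;
    return true;
}
```

The waiting period of two failover timeouts prevents Sentinel from restarting a failover for the same master in quick succession.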

Sentinel’s network connection

As mentioned above, each Sentinel maintains both a command connection and a Pub/Sub connection to every master and replica it monitors. Between Sentinels, however, there is only a command connection. The roles of the two kinds of connection are described below.

Command connection

Sentinel maintains command connections in order to communicate with masters, replicas, and other Sentinels by sending commands and receiving replies. For example:

1. By default, Sentinel sends the PING command to the other instances once per second to determine whether they are offline.

2. Sentinel sends the INFO command to masters and replicas every 10 s over its command connections to obtain the latest information about them.

3. When a master is offline, Sentinel sends the SLAVEOF NO ONE command to the selected replica over its command connection to that replica, promoting the replica to be the new master.

4. By default, Sentinel sends is-master-down-by-addr once per second to ask the other Sentinels whether the monitored master is offline.

The command connection is initialized in the sentinelReconnectInstance function in sentinel.c.

Publish-subscribe connection

Sentinel maintains publish-subscribe connections to masters and replicas in order to discover other Sentinels that monitor the same instances, and to update its own view of the monitored masters, replicas, and Sentinels from the messages those other Sentinels publish. For example, after a failover completes, the other Sentinels learn the new master's details (address, port number, and so on) by reading the messages that the leader Sentinel publishes on the channel.

By default, Sentinel publishes a Hello message to the __sentinel__:hello channel of each master and replica it monitors every two seconds. The Hello message format is as follows:

__sentinel__:hello <sentinel address> <sentinel port number> <sentinel run ID> <sentinel configuration epoch>

When Sentinel receives a Hello message from another Sentinel over the subscription connection, it updates what it knows about the monitored master and replicas and about the sending Sentinel. If it receives its own Hello message, it simply discards it without further processing. This logic is defined in the sentinelProcessHelloMessage function in sentinel.c; for reasons of space it is not covered in detail here.

The publish-subscribe connection is also initialized in the sentinelReconnectInstance function in sentinel.c.

The is-master-down-by-addr command

By default, Sentinel sends is-master-down-by-addr once per second over the command connection to ask the other Sentinels whether the monitored master is offline. In addition, when the master is offline, the Sentinels also use is-master-down-by-addr to vote and elect the leader Sentinel. The command has the following format:

SENTINEL is-master-down-by-addr <master address> <master port number> <current epoch> <run ID>

The <run ID> argument is always * when Sentinel is not taking part in a leader election. When Sentinel is asking the other Sentinels to vote for it, the argument is its own run ID.
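The choice of the last argument can be shown with a small formatting helper. This is a sketch only: format_is_master_down_request and its parameters are hypothetical, and the command is built as a plain string here for readability rather than sent over a connection.

```c
#include <stdbool.h>
#include <stdio.h>

/* Writes the request into buf and returns the snprintf result.
 * requesting_vote: true when this Sentinel asks the others to vote for it
 * my_run_id:       this Sentinel's own run ID */
int format_is_master_down_request(char *buf, size_t len,
                                  const char *master_ip, int master_port,
                                  unsigned long long current_epoch,
                                  bool requesting_vote, const char *my_run_id) {
    /* "*" means "just tell me whether the master is down";
     * a run ID means "and please vote for this Sentinel as leader". */
    const char *run_id_arg = requesting_vote ? my_run_id : "*";
    return snprintf(buf, len, "SENTINEL is-master-down-by-addr %s %d %llu %s",
                    master_ip, master_port, current_epoch, run_id_arg);
}
```

The same command therefore serves two purposes: with * it is a pure state query, and with a run ID it doubles as a vote request.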

The reply to the is-master-down-by-addr command has the following format:

  • <master down state>
  • <leader Sentinel run ID>
  • <leader Sentinel configuration epoch>

After receiving these replies, Sentinel records each replying Sentinel's view of whether the master is down, as well as the votes cast in the leader election. This logic is defined in the sentinelReceiveIsMasterDownReply function in sentinel.c.

Tilt mode

Sentinel enters Tilt mode in either of the following situations:

1. The Sentinel process is blocked for longer than SENTINEL_TILT_TRIGGER (2 s by default), for example because the process or the system is overloaded with I/O (memory, network, or storage).

2. The system clock is set back to an earlier time.

Tilt mode is a protection mechanism. In Tilt mode, Sentinel keeps sending the necessary PING and INFO commands but performs no other operations, such as failover or subjectively and objectively down detection. Tilt mode can be thought of as a passive mode for Sentinel that lasts until SENTINEL_TILT_PERIOD (30 s by default) has elapsed: Sentinel still uses INFO replies and the Hello messages on its subscription connections to gather information and update its own structures.

The logic that detects the Tilt condition is defined in sentinel.c.
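A minimal sketch of the Tilt check, based on the two situations described above, might look like the following. The tilt_state structure and the function names are hypothetical; the two constants take the default values quoted in this article.

```c
#include <stdbool.h>

typedef long long mstime_t; /* timestamps and durations in milliseconds */

#define TILT_TRIGGER_MS 2000  /* SENTINEL_TILT_TRIGGER: 2 s */
#define TILT_PERIOD_MS  30000 /* SENTINEL_TILT_PERIOD: 30 s */

typedef struct {
    bool tilt;                /* currently in Tilt mode */
    mstime_t tilt_start_time; /* when Tilt mode was last entered */
    mstime_t previous_time;   /* time of the previous timer call */
} tilt_state;

/* Call once per timer tick with the current time. */
void check_tilt(tilt_state *s, mstime_t now) {
    mstime_t delta = now - s->previous_time;

    /* A negative delta means the clock went backwards; a large delta
     * means the process was blocked between two timer calls. */
    if (delta < 0 || delta > TILT_TRIGGER_MS) {
        s->tilt = true;
        s->tilt_start_time = now;
    }
    s->previous_time = now;
}

/* Tilt mode can be left once the Tilt period has fully elapsed. */
bool tilt_expired(const tilt_state *s, mstime_t now) {
    return s->tilt && now - s->tilt_start_time >= TILT_PERIOD_MS;
}
```

Because the check compares consecutive timer calls, it needs no external time source: an unexpected gap or a backward jump between two calls is itself the signal that timing can no longer be trusted.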


This article is shared from "Redis Sentinel source code analysis" in the Huawei Cloud community.
