I. Overview

This article describes how Redis implements high availability with Sentinel. Understanding this implementation helps with system design, and its ideas can be borrowed for other systems.

II. Overall structure

  • Monitoring: each Sentinel node periodically checks whether the Redis data nodes (master and slaves) and the other Sentinel nodes are reachable.
  • Automatic failover: if the master becomes unavailable, a slave is promoted to master and the remaining master/slave relationships are updated accordingly.
  • Notification: Sentinel notifies applications of the failover result.
  • Configuration provider: in the Redis Sentinel architecture, a client connects to the set of Sentinel nodes at initialization to obtain the current master's address.

III. Implementation principle

Node communication (how the nodes exchange information to implement these functions)

1. Every 10 s, each Sentinel sends INFO to the master and slave nodes.

  • INFO on the master returns the list of its slaves.
  • Newly added slaves are therefore discovered and added automatically.

2. Every 2 s, each Sentinel publishes a message containing its own information and its view of the master to the __sentinel__:hello channel of the data nodes.

  • This is how other Sentinel nodes are discovered.
  • Sentinels exchange their views of the master and act on them (for example, if another Sentinel reports a newer master after a failover, the local configuration is updated).

The message has the following format:

<Sentinel IP>,<Sentinel port>,<Sentinel runid>,<Sentinel current epoch>,<master name>,<master IP>,<master port>,<master config epoch>

3. Every 1 s, each Sentinel sends PING to the other nodes (master, slaves, and other Sentinels).

  • This is the liveness check.

Thanks to mechanisms 1 and 2, a Sentinel only needs to be configured with the master to monitor; it discovers the slaves and the other Sentinels automatically, through INFO requests and the pub/sub hello channel on the Redis data nodes.

Subjective offline (SDOWN)

A single Sentinel considers a node offline when it gets no valid reply to PING within the timeout (down-after-milliseconds).

Objective offline (ODOWN)

If a Sentinel considers the master subjectively offline, it uses the sentinel is-master-down-by-addr command to ask the other Sentinels for their judgment of the master. If the number of Sentinels that agree reaches the quorum, the master is marked objectively offline.

The is-master-down-by-addr command

Apart from the periodic hello messages, Sentinels communicate with each other through this command. It serves two purposes:

  • Asking other Sentinels for their judgment of the master
  • Voting for a leader
sentinel is-master-down-by-addr <ip> <port> <current_epoch> <runid>

If runid is *, the command only asks the receiver whether it also considers the master offline. If runid is a real Sentinel run ID, the sender is also requesting a vote.

Asking whether the master is offline:

sentinel is-master-down-by-addr 127.0.0.1 6379 0 *

Leader election also goes through this command: a request that carries a runid asks for a vote, following the leader election in Raft.

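Command return values:

  • down_state: 1 if the receiver considers the master offline, 0 otherwise
  • leader_runid: * means the reply carries no vote (only the master's state was asked); a specific runid means the receiver agrees that this runid becomes the leader
  • leader_epoch: the leader's epoch (term)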

Failover

When the master fails, failover proceeds as follows:

1. Select a slave to become the new master:

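(1) Filter out unhealthy slaves (for example, disconnected or subjectively offline ones).

(2) Among the rest, keep the slaves with the highest priority, that is, the lowest slave-priority value.

(3) Among those, choose the one with the largest replication offset.

(4) If there is still a tie, choose the one with the smallest runid.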

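2. Send SLAVEOF NO ONE to the selected slave so that it becomes the master.

3. Send SLAVEOF to the remaining slaves so that they replicate the new master.

4. Record the original master as a slave of the new master and keep monitoring it; when it comes back online, it is reconfigured as a slave.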
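The selection rules in step 1 can be sketched in C as a filter followed by a sort. The `slave_info` struct, its `healthy` flag (standing in for all of the health checks), and `select_slave` are hypothetical simplifications, not Redis code. Excluding slaves with priority 0 reflects the documented meaning of slave-priority 0 (never promote).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical view of a slave as seen by Sentinel. */
typedef struct {
    const char *runid;
    int healthy;              /* 0 if disconnected, SDOWN, stale INFO, ... */
    int priority;             /* slave-priority; 0 means "never promote" */
    long long repl_offset;    /* replication offset reported by INFO */
} slave_info;

/* Order candidates: lower priority value first, then larger offset,
 * then lexicographically smaller runid. */
static int compare_slaves(const void *a, const void *b) {
    const slave_info *sa = *(const slave_info *const *)a;
    const slave_info *sb = *(const slave_info *const *)b;

    if (sa->priority != sb->priority)
        return sa->priority - sb->priority;
    if (sa->repl_offset != sb->repl_offset)
        return sa->repl_offset > sb->repl_offset ? -1 : 1;
    return strcmp(sa->runid, sb->runid);
}

/* Return the best promotion candidate, or NULL if none qualifies. */
const slave_info *select_slave(const slave_info *slaves, size_t n) {
    const slave_info **cands = malloc(n * sizeof(*cands));
    size_t count = 0;
    const slave_info *best = NULL;

    if (cands == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        /* Step (1): drop unhealthy slaves and those with priority 0. */
        if (slaves[i].healthy && slaves[i].priority != 0)
            cands[count++] = &slaves[i];
    }
    if (count > 0) {
        /* Steps (2)-(4): the comparator encodes the tie-break order. */
        qsort(cands, count, sizeof(*cands), compare_slaves);
        best = cands[0];
    }
    free(cands);
    return best;
}
```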

IV. Source code analysis

Sentinel storage structure

Each pink box in the figure represents a sentinelRedisInstance, the unified data structure in Sentinel. It is simply a chunk of memory that stores a node's information. Every Redis data node and every other Sentinel node has its own sentinelRedisInstance (the current Sentinel itself does not), much as every network connection has a socket.

The master instance holds the information about a monitored master. A Sentinel can hold multiple masters (only one is drawn above), which means it can monitor several masters at the same time.

The master's sentinelRedisInstance contains two hash tables (dicts) whose keys are names and whose values are sentinelRedisInstances.

One stores the slaves of this master; the other stores the other Sentinels that also monitor this master (as shown in the structure above). Hash tables allow the corresponding structure to be found quickly by key.

// All sentinels that also monitor this master server
dict *sentinels; /* Other sentinels monitoring the same master. */
// If this instance represents a primary server
// Then this dictionary holds the slave servers under the master server
// The key of the dictionary is the name of the server, and the value of the dictionary is the sentinelRedisInstance structure corresponding to the server
dict *slaves; /* Slaves for this master instance. */

These dicts are used only when the sentinelRedisInstance represents a master, so all of Sentinel's operations revolve around the master structure. Sentinel discovers slaves through the INFO command and other Sentinels through hello messages, and adds each new node to the corresponding dict.

flags

One field worth emphasizing is flags in sentinelRedisInstance, which records the current state of the node. It is a bit field: each state occupies one bit, set with |= and cleared with &= ~.

  • For a master, flags record the master's state (for example SRI_S_DOWN or SRI_FAILOVER_IN_PROGRESS).
  • For a Sentinel, flags record that Sentinel's judgments (for example SRI_MASTER_DOWN, set when it reports the master as down).
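The following sketch shows the bit-flag pattern on a plain `int`. The flag names are taken from this article, but the bit positions are illustrative assumptions, not copied from sentinel.c, and the helper functions are hypothetical.

```c
/* Illustrative bit assignments; the real values live in sentinel.c. */
#define SRI_MASTER               (1 << 0)
#define SRI_SLAVE                (1 << 1)
#define SRI_SENTINEL             (1 << 2)
#define SRI_S_DOWN               (1 << 3)
#define SRI_O_DOWN               (1 << 4)
#define SRI_MASTER_DOWN          (1 << 5)
#define SRI_FAILOVER_IN_PROGRESS (1 << 6)

/* Set a state bit: flags |= bit. */
static inline int flag_set(int flags, int bit)   { return flags | bit; }

/* Clear a state bit: flags &= ~bit. */
static inline int flag_clear(int flags, int bit) { return flags & ~bit; }

/* Test a state bit, as in (ri->flags & SRI_MASTER). */
static inline int flag_has(int flags, int bit)   { return (flags & bit) != 0; }
```

Because each state has its own bit, one instance can be a master, SDOWN, and in failover at the same time, and a single mask such as `SRI_MASTER|SRI_SLAVE` tests several roles at once.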

So Sentinel maintains the entire cluster logic around these sentinelRedisInstances. Sending PING and INFO, for example, means traversing the sentinelRedisInstances and sending the command to each instance's host and port.

Traversal order: first take a master's sentinelRedisInstance and run its logic, then iterate over its two dicts (slaves and sentinels).
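A simplified C sketch of this traversal order follows. The hypothetical `instance` type uses arrays in place of the `slaves` and `sentinels` dicts, and `handle_instance` only counts visits where the real code sends commands and runs checks.

```c
#include <stddef.h>

/* Hypothetical, simplified instance: arrays stand in for the slaves and
 * sentinels dicts that a real master instance owns. */
typedef struct instance {
    const char *name;
    int is_master;
    struct instance **slaves;     size_t nslaves;
    struct instance **sentinels;  size_t nsentinels;
} instance;

/* Per-instance periodic work (PING/INFO/hello, checks) would go here;
 * this sketch only counts how many instances were visited. */
static void handle_instance(instance *ri, int *visited) {
    (void)ri;
    (*visited)++;
}

/* Handle each instance, and if it is a master, recurse into its slaves
 * and then its sentinels. */
void handle_instances(instance **list, size_t n, int *visited) {
    for (size_t i = 0; i < n; i++) {
        instance *ri = list[i];
        handle_instance(ri, visited);
        if (ri->is_master) {
            handle_instances(ri->slaves, ri->nslaves, visited);
            handle_instances(ri->sentinels, ri->nsentinels, visited);
        }
    }
}
```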

Service startup

In Sentinel mode, startup runs the following logic:

if (server.sentinel_mode) {
    initSentinelConfig();
    initSentinel();
}

initSentinel mainly performs initialization: creating the sentinel structure, initializing some global variables, and registering the command handlers.

sentinelHandleConfiguration

At startup, the configuration is loaded first. This function contains the parsing logic for the Sentinel configuration options.

if (!strcasecmp(argv[0],"monitor") && argc == 5) {
    /* monitor <name> <host> <port> <quorum> */
    // Read in the quorum parameter
    int quorum = atoi(argv[4]);
    // The quorum must be greater than 0
    if (quorum <= 0) return "Quorum must be 1 or greater.";

    // Create the master instance
    if (createSentinelRedisInstance(argv[1],SRI_MASTER,argv[2],
                                    atoi(argv[3]),quorum,NULL) == NULL)
    {
        switch(errno) {
        case EBUSY: return "Duplicated master name.";
        case ENOENT: return "Can't resolve master instance hostname.";
        case EINVAL: return "Invalid port number";
        }
    }
} else if (!strcasecmp(argv[0],"down-after-milliseconds") && argc == 3) {
    /* down-after-milliseconds <name> <milliseconds> */
    // Find the master
    ri = sentinelGetMasterByName(argv[1]);
    if (!ri) return "No such master with specified name.";

    // Set the option
    ri->down_after_period = atoi(argv[2]);
    if (ri->down_after_period <= 0)
        return "negative or zero time parameter.";
    sentinelPropagateDownAfterPeriod(ri);
}

The core step is creating the master's sentinelRedisInstance; the remaining options are then stored in that structure. Only part of the code is listed above; the other options are parsed much like down-after-milliseconds.

createSentinelRedisInstance

Creates an instance inside Sentinel: a monitored master, one of its slaves, or another Sentinel. Every newly discovered node is created through this function and added to the master's sentinelRedisInstance (except masters themselves, which are added to sentinel.masters).

  • SRI_MASTER creates a master instance to monitor and adds it to the sentinel.masters dict
  • SRI_SLAVE creates a slave instance and adds it to the master->slaves dict
  • SRI_SENTINEL creates an instance for another Sentinel monitoring the same master and adds it to the master->sentinels dict

That covers initialization. Sentinel's core logic is driven by a timer function, so let's look at sentinelTimer next.

sentinelTimer

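This function runs on every server timer tick, nominally every 100 ms; the last line of the function randomizes the tick frequency slightly.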

void sentinelTimer(void) {
    // Record the time of this call and check whether to enter TILT mode
    sentinelCheckTiltCondition();
    // Perform periodic operations, for example:
    // PING instances, analyze INFO replies from masters and slaves,
    // send hello messages to other Sentinels monitoring the same master,
    // receive hello messages from other Sentinels,
    // perform failover operations, etc.
    sentinelHandleDictOfRedisInstances(sentinel.masters);

    // Run scripts waiting to be executed
    sentinelRunPendingScripts();

    // Clean up finished scripts and retry scripts that failed
    sentinelCollectTerminatedScripts();

    // Kill scripts that have run past their timeout
    sentinelKillTimedoutScripts();

    /* We continuously change the frequency of the Redis "timer interrupt"
     * in order to desynchronize every Sentinel from every other.
     * This non-determinism avoids that Sentinels started at the same time
     * exactly continue to stay synchronized asking to be voted at the
     * same time again and again (resulting in nobody likely winning the
     * election because of split brain voting). */
    server.hz = REDIS_DEFAULT_HZ + rand() % REDIS_DEFAULT_HZ;
}

sentinelCheckTiltCondition determines whether to enter TILT mode and records the execution time.

TILT mode: Sentinel relies on the local clock to drive its timer, so a problem with the system time or a blocked process can make the timer function run late. Taking part in cluster decisions at that point could lead to wrong decisions. Therefore, if the difference between the current time and the last execution time is negative or exceeds 2 s, the node enters TILT mode.
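The TILT trigger can be sketched as follows. `timer_state`, `check_tilt`, and `TILT_TRIGGER_MS` are hypothetical names; only the 2 s threshold and the negative-delta check come from the description above. How long TILT lasts (SENTINEL_TILT_PERIOD) is handled separately in sentinelHandleRedisInstance.

```c
#include <stdbool.h>

#define TILT_TRIGGER_MS 2000        /* the 2 s threshold described above */

/* Hypothetical timer state; Redis keeps the equivalent in its global
 * sentinel struct. */
typedef struct {
    long long previous_time;        /* when the timer last ran (ms) */
    bool tilt;
    long long tilt_start_time;
} timer_state;

/* Called on every timer tick with the current wall-clock time in ms.
 * Enters TILT when the clock went backwards or the tick came too late. */
void check_tilt(timer_state *st, long long now_ms) {
    long long delta = now_ms - st->previous_time;

    if (delta < 0 || delta > TILT_TRIGGER_MS) {
        st->tilt = true;
        st->tilt_start_time = now_ms;
    }
    st->previous_time = now_ms;     /* always remember this tick */
}
```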

sentinelHandleDictOfRedisInstances

void sentinelHandleDictOfRedisInstances(dict *instances) {
    dictIterator *di;
    dictEntry *de;
    sentinelRedisInstance *switch_to_promoted = NULL;

    /* There are a number of things we need to perform against every master. */
    // Iterate over the instances: masters, slaves, or Sentinels
    di = dictGetIterator(instances);
    while ((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);

        // Perform the periodic operations for this instance
        sentinelHandleRedisInstance(ri);

        // For a master, recursively handle its slaves and Sentinels
        if (ri->flags & SRI_MASTER) {
            // All slaves
            sentinelHandleDictOfRedisInstances(ri->slaves);
            // All Sentinels
            sentinelHandleDictOfRedisInstances(ri->sentinels);

            if (ri->failover_state == SENTINEL_FAILOVER_STATE_UPDATE_CONFIG) {
                // A new master has been selected
                switch_to_promoted = ri;
            }
        }
    }
    // Replace the old (offline) master with the promoted slave
    if (switch_to_promoted)
        sentinelFailoverSwitchToPromotedSlave(switch_to_promoted);
    dictReleaseIterator(di);
}

This function recursively traverses all sentinelRedisInstances. Its core job is to perform periodic operations on every instance the Sentinel monitors: masters, slaves, and other Sentinels.

The logic itself is simple: it calls sentinelHandleRedisInstance on each instance.

If a failover has promoted a slave, it calls sentinelFailoverSwitchToPromotedSlave to replace the old master with the new one.

sentinelHandleRedisInstance method

Each step of this function is described below. It is called periodically and does not block (the network commands it issues are sent asynchronously).

void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* Every kind of instance */
    // Create a network connection to the instance if necessary
    sentinelReconnectInstance(ri);

    // PING, INFO, or PUBLISH to the instance, depending on the situation
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        // Still within the TILT period: skip the acting half
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;

        // TILT period expired: exit TILT mode
        sentinel.tilt = 0;
        sentinelEvent(REDIS_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    // Check whether the instance is in SDOWN state
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        // Check whether the master is in ODOWN state
        sentinelCheckObjectivelyDown(ri);
        // If the master is in ODOWN state, start a failover
        if (sentinelStartFailoverIfNeeded(ri))
            // Force sending SENTINEL is-master-down-by-addr to the other
            // Sentinels to refresh their view of the master
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        // Drive the failover state machine
        sentinelFailoverStateMachine(ri);
        // If needed, send SENTINEL is-master-down-by-addr to the other Sentinels
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}

1. sentinelReconnectInstance creates the connections to the instance. Instance data is initialized without a connection, so this is where each instance's connection is actually created. For a master or slave, it also subscribes to the node's __sentinel__:hello channel to receive the messages that every Sentinel periodically publishes there.

2. sentinelSendPeriodicCommands sends the PING, INFO, and PUBLISH commands.

3. In TILT mode, the function returns at this point; the remaining steps run only outside TILT mode (or once the TILT period has expired).

4. sentinelCheckSubjectivelyDown checks whether the instance has entered the SDOWN state.

5. If the current instance is a master, run ODOWN detection and failover.

As you can see, all of Sentinel's logic revolves around the master.

sentinelSendPeriodicCommands method

This is the core of the communication logic between Sentinels.

The function issues PING, INFO, and hello PUBLISH commands, each when its period has elapsed:

  • PING is sent every 1 s; if down_after_period is shorter than 1 s, it is sent every down_after_period instead.
  • INFO is sent every 10 s, or immediately if no INFO reply has been received yet. If the master is in ODOWN or a failover is in progress (SRI_FAILOVER_IN_PROGRESS), INFO is sent to its slaves every 1 s to catch role changes faster.
  • The hello message is published every 2 s, to discover and communicate with other Sentinels.

Redis tracks the commands sent asynchronously and still awaiting a reply in pending_commands. If there are 100 or more, no new command is sent, to avoid piling up commands: because Sentinel decides whether to send a command based on the last reply time, network congestion or jitter could otherwise make it send the same command over and over.
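A compact sketch of the scheduling rules above. `sched_state`, `commands_due`, and the constant names are hypothetical; the real code also tracks reply times per connection and treats instance types differently, so treat this as an approximation of the periods, not a port.

```c
#include <stdbool.h>

#define PING_PERIOD_MS   1000
#define INFO_PERIOD_MS   10000
#define HELLO_PERIOD_MS  2000
#define MAX_PENDING      100

/* Hypothetical snapshot of the per-instance fields the scheduler reads. */
typedef struct {
    bool is_slave;
    bool master_odown_or_failover;  /* state of this slave's master */
    long long down_after_period;    /* ms */
    long long last_ping_time;       /* when we last sent PING */
    long long info_refresh;         /* last INFO reply, 0 = never */
    long long last_hello_time;      /* when we last published hello */
    int pending_commands;           /* commands sent but not yet replied */
} sched_state;

typedef struct { bool ping, info, hello; } due_commands;

due_commands commands_due(const sched_state *s, long long now) {
    due_commands d = {false, false, false};

    /* Back off entirely if too many replies are outstanding. */
    if (s->pending_commands >= MAX_PENDING) return d;

    /* PING every 1 s, or every down_after_period if that is shorter. */
    long long ping_period = s->down_after_period < PING_PERIOD_MS
                                ? s->down_after_period : PING_PERIOD_MS;
    /* INFO every 10 s, every 1 s for slaves of a master in trouble. */
    long long info_period = (s->is_slave && s->master_odown_or_failover)
                                ? 1000 : INFO_PERIOD_MS;

    d.ping  = now - s->last_ping_time >= ping_period;
    d.info  = s->info_refresh == 0 || now - s->info_refresh > info_period;
    d.hello = now - s->last_hello_time > HELLO_PERIOD_MS;
    return d;
}
```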

The PING command

The sending logic is in sentinelSendPeriodicCommands.

sentinelPingReplyCallback is the PING reply callback. On a valid reply it updates last_avail_time and last_pong_time, which Sentinel relies on to judge whether the node is alive. If the node replies that it is busy running a Lua script past its time limit, Sentinel sends SCRIPT KILL to try to kill the script.

The INFO command

INFO retrieves information about a Redis node; the reply is processed by sentinelRefreshInstanceInfo.

sentinelRefreshInstanceInfo parses the reply and extracts the data Sentinel needs. The key parts are:

1. For a master, it extracts the host and port of each slave and creates an instance for any slave not seen before. This is how Sentinel discovers slaves.

if (sentinelRedisInstanceLookupSlave(ri,ip,atoi(port)) == NULL) {
    if ((slave = createSentinelRedisInstance(NULL,SRI_SLAVE,ip,
                atoi(port), ri->quorum, ri)) != NULL)
    {
        sentinelEvent(REDIS_NOTICE,"+slave",slave,"%@");
    }
}

2. For a slave, it records the master that the slave itself reports. The master recorded by Sentinel may be out of date (for example, a network partition can delay a switchover), so the slave's latest view is obtained through INFO for use by later logic.

So there are actually two views of each node's role: the role Sentinel has recorded for it, and the role the node itself reports via INFO.

3. If the role has changed (the role reported by INFO differs from the one Sentinel has recorded), the role change time is updated. In TILT mode the function returns at this point. Otherwise, further logic runs depending on the reported role and the recorded role, which is discussed later.

The hello message (PUBLISH)

The message published to the channel contains this Sentinel's information and its view of the current master:

snprintf(payload,sizeof(payload),
    "%s,%d,%s,%llu," /* Info about this sentinel. */
    "%s,%s,%d,%llu", /* Info about current master. */
    ip, server.port, server.runid,
    (unsigned long long) sentinel.current_epoch,
    /* --- */
    master->name,master_addr->ip,master_addr->port,
    (unsigned long long) master->config_epoch);
Handling received hello messages (sentinelProcessHelloMessage)

This function is how Sentinel processes a hello message it receives.

1. If the Sentinel described in the hello message is newly discovered (not yet known to the current Sentinel), an instance is created for it and added to the list. This is the only way a Sentinel discovers other Sentinels.

2. Update current_epoch = max(local current_epoch, current_epoch in the hello message).

3. If the master config_epoch in the hello message is larger than the local master's config_epoch, call sentinelResetMasterAndChangeAddress to switch the local master. config_epoch is the epoch of the master configuration and is updated by a successful failover, so a larger value means a failover has happened, and the master in the hello message is trusted as the latest master.

if (master->config_epoch < master_config_epoch) {
    master->config_epoch = master_config_epoch;
    if (master_port != master->addr->port ||
        strcmp(master->addr->ip, token[5]))
    {
        sentinelAddr *old_addr;

        sentinelEvent(REDIS_WARNING,"+config-update-from",si,"%@");
        sentinelEvent(REDIS_WARNING,"+switch-master",
            master,"%s %s %d %s %d",
            master->name,
            master->addr->ip, master->addr->port,
            token[5], master_port);

        old_addr = dupSentinelAddr(master->addr);
        sentinelResetMasterAndChangeAddress(master, token[5], master_port);
        sentinelCallClientReconfScript(master,
            SENTINEL_OBSERVER,"start", old_addr,master->addr);
        releaseSentinelAddr(old_addr);
    }
}
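The epoch rules in steps 2 and 3 reduce to a small update function. The `local_view` struct and `apply_hello` are hypothetical; the real code also logs events, calls the client reconfiguration script, and resets the master's state when it switches address.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal state: the local view of one master. */
typedef struct {
    unsigned long long current_epoch;   /* this Sentinel's current epoch */
    unsigned long long config_epoch;    /* epoch of the master config */
    char master_ip[64];
    int master_port;
} local_view;

/* Apply the epoch rules from a hello message. Returns true when the
 * local master address was switched. */
bool apply_hello(local_view *v, unsigned long long hello_current_epoch,
                 unsigned long long hello_config_epoch,
                 const char *hello_master_ip, int hello_master_port) {
    /* Step 2: current_epoch = max(local, hello). */
    if (hello_current_epoch > v->current_epoch)
        v->current_epoch = hello_current_epoch;

    /* Step 3: a larger config epoch means a newer failover result. */
    if (v->config_epoch < hello_config_epoch) {
        v->config_epoch = hello_config_epoch;
        if (v->master_port != hello_master_port ||
            strcmp(v->master_ip, hello_master_ip) != 0) {
            snprintf(v->master_ip, sizeof(v->master_ip), "%s", hello_master_ip);
            v->master_port = hello_master_port;
            return true;
        }
    }
    return false;
}
```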
sentinelCheckSubjectivelyDown method

Checks whether the instance has entered the SDOWN state.

1. If the connection to the instance qualifies for a reconnect, it is closed and re-established on a later cycle. In essence, connections that have gone inactive are dropped and reconnected.

2. Set or clear the SDOWN flag

The SDOWN flag is set if either of the following conditions holds, and cleared otherwise:

  • The instance has not replied validly within down-after-milliseconds.
  • Sentinel records the instance as a master, the instance reports that it has become a slave, and the role change still has not completed after a given time limit.
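Both SDOWN conditions can be expressed as one predicate. The struct and field names are hypothetical, and `role_change_timeout` is a plain parameter here; Redis derives its limit from down-after-milliseconds and the INFO period.

```c
#include <stdbool.h>

/* Hypothetical inputs for the SDOWN decision on one instance. */
typedef struct {
    long long last_avail_time;       /* last valid PING reply (ms) */
    long long down_after_period;     /* down-after-milliseconds */
    bool sentinel_thinks_master;     /* we record it as a master */
    bool reports_slave;              /* but INFO says it is a slave */
    long long role_reported_time;    /* when INFO first reported that role */
    long long role_change_timeout;   /* allowed time for the role switch */
} sdown_input;

/* Return true if the instance should carry the SDOWN flag. */
bool is_subjectively_down(const sdown_input *in, long long now) {
    /* Condition 1: no valid reply within down-after-milliseconds. */
    if (now - in->last_avail_time > in->down_after_period)
        return true;

    /* Condition 2: recorded as master, reporting slave, and the role
     * change has not completed within the allowed time. */
    if (in->sentinel_thinks_master && in->reports_slave &&
        now - in->role_reported_time > in->role_change_timeout)
        return true;

    return false;
}
```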

Master-only logic

if (ri->flags & SRI_MASTER) {
    // Check whether the master is in ODOWN state
    sentinelCheckObjectivelyDown(ri);
    // If the master is in ODOWN state, start a failover
    if (sentinelStartFailoverIfNeeded(ri))
        // Force sending SENTINEL is-master-down-by-addr to the other Sentinels
        // to refresh their view of the master
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);

    // Drive the failover state machine
    sentinelFailoverStateMachine(ri);
    // If needed, send SENTINEL is-master-down-by-addr to the other Sentinels
    // to refresh their view of the master. This call covers masters that did
    // not take the if (sentinelStartFailoverIfNeeded(ri)) branch above.
    sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
}

sentinelCheckObjectivelyDown

Checks whether the current master has entered the ODOWN state.

It walks the flags of all known Sentinel instances and counts those with SRI_MASTER_DOWN set, plus itself. If the count reaches the quorum configured for the master, the master is marked objectively down. The SRI_MASTER_DOWN bit is updated in the is-master-down-by-addr reply callback.
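Counting agreement for ODOWN can be sketched like this. The function name and the `bool` array standing in for each peer's SRI_MASTER_DOWN bit are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the master should be flagged ODOWN: this Sentinel must
 * already see it as SDOWN, and the number of agreeing Sentinels (counting
 * itself) must reach the configured quorum. */
bool is_objectively_down(bool self_sdown, const bool *peer_says_down,
                         size_t npeers, int quorum) {
    if (!self_sdown) return false;

    int votes = 1;                      /* this Sentinel's own opinion */
    for (size_t i = 0; i < npeers; i++)
        if (peer_says_down[i])          /* SRI_MASTER_DOWN on that peer */
            votes++;
    return votes >= quorum;
}
```

Note that the quorum only decides ODOWN; electing the leader that performs the failover is a separate vote.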

sentinelStartFailoverIfNeeded

Determines whether a failover is needed and, if so, starts one through sentinelStartFailover:

void sentinelStartFailover(sentinelRedisInstance *master) {
    redisAssert(master->flags & SRI_MASTER);

    // Update the failover state
    master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;

    // Mark the failover as in progress on the master
    master->flags |= SRI_FAILOVER_IN_PROGRESS;

    // Start a new epoch
    master->failover_epoch = ++sentinel.current_epoch;

    sentinelEvent(REDIS_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);

    sentinelEvent(REDIS_WARNING,"+try-failover",master,"%@");

    // Record the failover start time (with random jitter) and the state change time
    master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
    master->failover_state_change_time = mstime();
}

Conditions for starting a failover:

1. The master is in ODOWN and no failover is already in progress.

2. The previous failover attempt did not start too recently; failovers that would be too frequent are skipped.
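The two start conditions, plus the epoch increment from sentinelStartFailover, can be sketched as follows. `master_state` and `start_failover_if_needed` are hypothetical; the back-off of twice failover-timeout is modeled on Redis's sentinelStartFailoverIfNeeded, and the random start-time jitter is omitted.

```c
#include <stdbool.h>

/* Hypothetical per-master failover bookkeeping. */
typedef struct {
    bool odown;
    bool failover_in_progress;
    long long failover_start_time;    /* ms */
    long long failover_timeout;       /* ms */
    unsigned long long failover_epoch;
} master_state;

/* Decide whether to start a failover and, if so, open a new epoch. */
bool start_failover_if_needed(master_state *m,
                              unsigned long long *current_epoch,
                              long long now) {
    /* Condition 1: ODOWN and no failover already running. */
    if (!m->odown || m->failover_in_progress) return false;
    /* Condition 2: the last attempt must not be too recent. */
    if (now - m->failover_start_time < m->failover_timeout * 2) return false;

    m->failover_in_progress = true;
    m->failover_epoch = ++*current_epoch;   /* a new epoch for the election */
    m->failover_start_time = now;           /* Redis also adds random jitter */
    return true;
}
```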

If a failover is needed, the master's failover fields are updated, including failover_state and failover_epoch.

failover_epoch is the Sentinel's current_epoch plus one (++sentinel.current_epoch in the code above).

What failover_epoch is used for:

  • After a successful promotion, failover_epoch becomes the master's config_epoch, which other Sentinels compare in hello messages to decide whether to adopt the new master configuration, as described earlier.
  • The current Sentinel uses this epoch when it requests votes to become the leader that performs the failover. Votes are cast per epoch, and all Sentinels that elect a leader must agree on the epoch; each election happens in a new epoch, and the newest epoch is the most authoritative. This is the core idea of Raft's leader election, where the epoch is called a term.

failover_state records the current failover state, and the failover steps are driven by it.

A failover must be carried out by a single Sentinel elected as leader, so at this stage only the state is updated.

sentinelAskMasterStateToOtherSentinels

Asks the other Sentinels for their view of the master's state.

It loops over all Sentinels and, if the current Sentinel considers the master subjectively offline and the connection to that Sentinel is healthy, sends the is-master-down-by-addr command:

sentinel is-master-down-by-addr <ip> <port> <current_epoch> <runid>

If this Sentinel has started a failover (as indicated by failover_state), runid is its own run ID, which asks the other Sentinels to vote for it.

Otherwise runid is *, which only tells the other Sentinels that this Sentinel considers the master offline and asks for their view. This is the only way a Sentinel informs other Sentinels that a master is offline.
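For illustration, the request can be rendered as a string. In Redis the arguments are sent through the asynchronous client rather than formatted like this, and `build_is_master_down_request` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the request a Sentinel would send to a peer. A real runid asks
 * for a vote; "*" only asks whether the peer also sees the master down.
 * Returns the number of characters written (as snprintf does). */
int build_is_master_down_request(char *buf, size_t len,
                                 const char *master_ip, int master_port,
                                 unsigned long long current_epoch,
                                 bool failover_started, const char *my_runid) {
    return snprintf(buf, len, "SENTINEL is-master-down-by-addr %s %d %llu %s",
                    master_ip, master_port, current_epoch,
                    failover_started ? my_runid : "*");
}
```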

Handling the is-master-down-by-addr command

This logic runs in sentinelCommand:

ri = getSentinelRedisInstanceByAddrAndRunID(sentinel.masters,
    c->argv[2]->ptr,port,NULL);

if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) &&
                           (ri->flags & SRI_MASTER))
    isdown = 1;

/* Vote for the master (or fetch the previous vote) if the request
 * includes a runid, otherwise the sender is not seeking for a vote. */
if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
    leader = sentinelVoteLeader(ri,(uint64_t)req_epoch,
                                    c->argv[5]->ptr,
                                    &leader_epoch);
}

/* Reply with a three-elements multi-bulk reply:
 * down state, leader, vote epoch. */
// Multi-bulk reply:
// 1) <down_state>    1 if the master is down, 0 otherwise
// 2) <leader_runid>  run ID of the Sentinel voted as leader, or *
// 3) <leader_epoch>  current configuration epoch of that leader
addReplyMultiBulkLen(c,3);
addReply(c, isdown ? shared.cone : shared.czero);
addReplyBulkCString(c, leader ? leader : "*");
addReplyLongLong(c, (long long)leader_epoch);

down_state is 1 if the master is offline, 0 otherwise.

ri (a sentinelRedisInstance) is looked up by IP and port. If ri is a master and runid is not *, the request also asks for a vote, and sentinelVoteLeader is called.

Voting logic:

If the requested req_epoch is greater than this Sentinel's current_epoch, current_epoch is updated to req_epoch.

If req_epoch is greater than the master's leader_epoch and not less than sentinel.current_epoch, the master's leader is set to req_runid, that is, this Sentinel votes for the requesting node.

In other words, the decision comes down to comparing epochs: a Sentinel votes for the first request it receives in a newer epoch, just as in Raft's leader election.
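The voting rule can be sketched as a function that grants at most one vote per epoch. `vote_state` and `vote_leader` are hypothetical names modeled on the two conditions above; the real sentinelVoteLeader also saves its configuration and adjusts its own failover timing after voting for another Sentinel.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-master voting state kept by the voter. */
typedef struct {
    char leader[64];                  /* runid we voted for ("" = none) */
    unsigned long long leader_epoch;  /* epoch of that vote */
} vote_state;

/* One vote per epoch, first come first served. Returns the runid this
 * Sentinel has voted for (possibly an earlier vote in the same epoch). */
const char *vote_leader(vote_state *v, unsigned long long *current_epoch,
                        unsigned long long req_epoch, const char *req_runid) {
    /* A larger requested epoch moves this Sentinel into that epoch. */
    if (req_epoch > *current_epoch)
        *current_epoch = req_epoch;

    /* Vote only if we have not voted in this (or a later) epoch yet. */
    if (v->leader_epoch < req_epoch && *current_epoch <= req_epoch) {
        snprintf(v->leader, sizeof(v->leader), "%s", req_runid);
        v->leader_epoch = *current_epoch;
    }
    return v->leader;
}
```

A second candidate asking in the same epoch simply gets back the earlier vote, which is what prevents two leaders from winning the same epoch.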

The is-master-down-by-addr reply callback (sentinelReceiveIsMasterDownReply)

// Update the time of the last reply to our query
ri->last_master_down_reply_time = mstime();

// Record whether this Sentinel considers the master down
if (r->element[0]->integer == 1) {
    // Master is down
    ri->flags |= SRI_MASTER_DOWN;
} else {
    // Master is not down
    ri->flags &= ~SRI_MASTER_DOWN;
}
// If the run ID is not "*", the reply carries a vote
if (strcmp(r->element[1]->str,"*")) {
    /* If the runid in the reply is not "*" the Sentinel actually
     * replied with a vote. */
    sdsfree(ri->leader);
    // Log the vote if the epoch changed
    if (ri->leader_epoch != r->element[2]->integer)
        redisLog(REDIS_WARNING,
            "%s voted for %s %llu", ri->name,
            r->element[1]->str,
            (unsigned long long) r->element[2]->integer);
    // Record the leader this Sentinel voted for
    ri->leader = sdsnew(r->element[1]->str);
    ri->leader_epoch = r->element[2]->integer;
}

The reply contains:

  • <down_state>: 1 if the master is offline, 0 otherwise
  • <leader_runid>: run ID of the Sentinel elected as leader, or *
  • <leader_epoch>: the current configuration epoch of that leader

The callback does the following:

1. Update the reply time.

2. Update the SRI_MASTER_DOWN flag of the queried Sentinel, which is used for the ODOWN check.

3. If leader_runid is not *, update that Sentinel's leader information.

The leader field of an ri has two meanings:

  • If ri is a master instance, leader is the run ID of the Sentinel responsible for its failover.
  • If ri is a Sentinel instance, leader is the Sentinel that ri voted for as leader.

Failover (sentinelFailoverStateMachine)

switch(ri->failover_state) {

    // Wait for failover to begin
    case SENTINEL_FAILOVER_STATE_WAIT_START:
        sentinelFailoverWaitStart(ri);
        break;

    // Select a new master server
    case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
        sentinelFailoverSelectSlave(ri);
        break;
    
    // Upgrade the selected secondary server to the new primary server
    case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
        sentinelFailoverSendSlaveOfNoOne(ri);
        break;

    // Wait for the promotion to take effect; abort the failover if it times out.
    // See also sentinelRefreshInstanceInfo, which advances this state.
    case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
        sentinelFailoverWaitPromotion(ri);
        break;

    // Send the SLAVEOF command to the slave servers to synchronize the new master server
    case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
        sentinelFailoverReconfNextSlave(ri);
        break;
}

The whole process is as follows:

1. In sentinelFailoverWaitStart, each Sentinel determines the leader for the current failover epoch. If it is the leader itself, it switches failover_state to SENTINEL_FAILOVER_STATE_SELECT_SLAVE and proceeds to the next case; otherwise it stops here.

2. sentinelFailoverSelectSlave picks a slave to become the new master, using the selection rules described earlier. If no slave qualifies, the failover state is cleared. On success, the state is advanced and the next case runs.

3. The selected slave is promoted to master. If the link to it is down and the timeout expires, the failover is aborted. In essence, SLAVEOF NO ONE is sent asynchronously to make the slave a master; this command is covered in the Redis replication source analysis.

4. SENTINEL_FAILOVER_STATE_WAIT_PROMOTION simply waits for the promotion; if it times out, the failover is aborted. So how does Sentinel know the slave has actually become a master?

The answer is in the INFO processing logic:

if ((ri->master->flags & SRI_FAILOVER_IN_PROGRESS) &&
    (ri->master->failover_state ==
        SENTINEL_FAILOVER_STATE_WAIT_PROMOTION))
{
    // The promoted slave's (old) master adopts failover_epoch as its config_epoch
    ri->master->config_epoch = ri->master->failover_epoch;
    // Advance the failover state; this starts reconfiguring the other
    // slaves to replicate the new master
    ri->master->failover_state = SENTINEL_FAILOVER_STATE_RECONF_SLAVES;
    // Record the failover state change time
    ri->master->failover_state_change_time = mstime();
    // Save the current Sentinel state to the configuration file
    sentinelFlushConfig();
    // Emit events
    sentinelEvent(REDIS_WARNING,"+promoted-slave",ri,"%@");
    sentinelEvent(REDIS_WARNING,"+failover-state-reconf-slaves",
        ri->master,"%@");
    // Run the client reconfiguration script
    sentinelCallClientReconfScript(ri->master,SENTINEL_LEADER,
        "start",ri->master->addr,ri->addr);
}

This means the slave has been promoted, so config_epoch is updated. config_epoch is what hello messages carry: when another Sentinel sees a config_epoch in a hello message larger than its own, it resets the master's address.

5. SENTINEL_FAILOVER_STATE_RECONF_SLAVES

Once INFO confirms the promotion, sentinelFailoverReconfNextSlave runs. It sends SLAVEOF to the remaining slaves so that they replicate the new master. The number of slaves reconfigured at the same time is limited by the parallel_syncs parameter (parallel-syncs in the configuration), which avoids putting too much network load on the new master.

When this step completes, failover_state is set to SENTINEL_FAILOVER_STATE_UPDATE_CONFIG.
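The state progression can be summarized as a small state machine. This is a simplified model with hypothetical names: FO_ABORTED stands in for Redis resetting the failover state, and a timeout while reconfiguring slaves moves on to the update step instead of aborting, modeled on Redis, where such a timeout still ends the failover.

```c
/* The five states handled by the switch above, plus the final step and
 * a hypothetical aborted state. */
typedef enum {
    FO_WAIT_START,          /* wait until this Sentinel is elected leader */
    FO_SELECT_SLAVE,        /* pick the slave to promote */
    FO_SEND_SLAVEOF_NOONE,  /* promote it */
    FO_WAIT_PROMOTION,      /* wait for INFO to confirm the promotion */
    FO_RECONF_SLAVES,       /* point remaining slaves at the new master */
    FO_UPDATE_CONFIG,       /* switch the master record to the new node */
    FO_ABORTED
} fo_state;

/* Hypothetical outcome of the work done in the current state. */
typedef enum { STEP_OK, STEP_PENDING, STEP_FAILED } step_result;

/* Advance one state on success, stay put while pending, abort on failure
 * (not elected, no usable slave, or a promotion timeout). */
fo_state fo_next(fo_state s, step_result r) {
    if (s == FO_UPDATE_CONFIG || s == FO_ABORTED) return s;   /* terminal */
    if (r == STEP_PENDING) return s;
    if (r == STEP_FAILED)
        /* A timeout while reconfiguring slaves still finishes the failover. */
        return s == FO_RECONF_SLAVES ? FO_UPDATE_CONFIG : FO_ABORTED;
    return (fo_state)(s + 1);
}
```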

sentinelFailoverSwitchToPromotedSlave

Finally, this function is called to switch the master record to the promoted slave, and the failover ends.

Failover failure

If the failover leader is alive but the failover times out, sentinelAbortFailover is called to abort it. If the leader goes down during the failover, the other Sentinels still see the master as ODOWN and run the failover logic again themselves.

V. Development, operation and maintenance

Configuration parameters

sentinel down-after-milliseconds: how long a node may go without a valid reply before it is considered subjectively down

sentinel parallel-syncs: after the master goes down, how many slaves may resynchronize with the new master in parallel

sentinel failover-timeout: the failover timeout. If the time since the last failover state change exceeds this value, the failover is aborted.

sentinel notification-script: fault notification. When a WARNING-level Sentinel event occurs, the script at the configured path is invoked and receives the event details as arguments, which can be used for monitoring.

The client connects to Sentinel

At startup, the client iterates over the Sentinels to obtain the address of the Redis master. It then subscribes to the +switch-master event on each Sentinel so that it is notified when a master/slave switchover happens, and reinitializes its connection pool when a change is received.

Note that this event is published only after the failover has completed.
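On the client side, handling the event mostly means parsing it. Sentinel publishes +switch-master with the payload `<master name> <old ip> <old port> <new ip> <new port>`, the same fields as the sentinelEvent call in the hello-processing code above. The parser and helper names below are hypothetical; a real client would receive this message through its Redis client library's pub/sub API.

```c
#include <stdio.h>
#include <string.h>

/* Parsed form of a +switch-master payload. */
typedef struct {
    char name[64];
    char old_ip[64];
    int old_port;
    char new_ip[64];
    int new_port;
} switch_master;

/* Parse "<name> <old ip> <old port> <new ip> <new port>".
 * Returns 0 on success, -1 on a malformed payload. */
int parse_switch_master(const char *payload, switch_master *out) {
    int n = sscanf(payload, "%63s %63s %d %63s %d",
                   out->name, out->old_ip, &out->old_port,
                   out->new_ip, &out->new_port);
    return n == 5 ? 0 : -1;
}

/* A client would rebuild its pool only for the master it cares about. */
int should_reconnect(const switch_master *ev, const char *my_master) {
    return strcmp(ev->name, my_master) == 0;
}
```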

VI. Summary

The whole implementation is somewhat complex, but it can be understood piece by piece. Sentinels communicate through PING, INFO, and pub/sub hello messages, which keep the whole system's view consistent. Each failover is carried out by a single Sentinel, elected with logic based on Raft's leader election. The failover logic relies on timeouts to avoid getting stuck, and the whole process is driven by a state machine, which keeps it orderly. It is a design worth learning from.