Translated from Eli Bendersky’s blog series, with permission from the original author.

This article is the first in a series about the Raft distributed consensus protocol and its implementation in Go. The full list of articles is as follows:

  • Preface: Introduction
  • Part 1: Leader election (this article)
  • Part 2: Commands and log replication
  • Part 3: Persistence and optimizations

In this part, I describe the structure of our Raft implementation code and focus on the leader election portion of the algorithm. The code in this article includes a fully functional test harness and some examples you can use to exercise the system. It does not respond to client requests, however, and it does not maintain a log; that will be added in Part 2.

The code structure

A brief introduction to the code structure of our Raft implementation; it is the same for all parts of this series.

Raft is typically implemented as an API that is embedded into some service. Since we are not actually developing a service here, only exploring the Raft protocol itself, we created a simple Server type that wraps a ConsensusModule type, in order to isolate the most interesting parts of the code as much as possible:
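
To make the relationship concrete, here is a minimal sketch of what the Server type might look like, assuming Go's standard net/rpc package. The field names other than cm and peerIds are assumptions for illustration; the real server.go contains additional machinery (listeners, connection management, shutdown) that we don't discuss in this series.

// A minimal sketch of the Server wrapper (not the real server.go). It owns a
// ConsensusModule and the RPC plumbing the CM uses to reach its peers.
type Server struct {
  mu       sync.Mutex
  serverId int
  peerIds  []int

  cm          *ConsensusModule    // the wrapped consensus module
  peerClients map[int]*rpc.Client // assumption: one net/rpc client per peer
}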

The consensus module (CM) implements the core of the Raft algorithm in the raft.go file. The module is abstracted away from the details of the network and from its connections to the other replicas in the cluster. The only fields in ConsensusModule related to the network are:

// id is the server ID of this consensus module
id int

// peerIds are the ids of all the peers in the cluster
peerIds []int

// Server is the server that contains the CM. This field is used to make RPC calls to other peers
server *Server

In this implementation, each Raft replica refers to the other replicas in the cluster as its "peers". Every peer in the cluster has a unique numeric ID, and peerIds records the IDs of all the peers. The server field is a pointer to the containing Server (implemented in server.go), which allows ConsensusModule to send messages to its peers. We'll see how this is done later.

The idea is to strip away all the networking details and focus on the Raft algorithm itself. In short, to map the Raft paper onto this implementation, all you need is the ConsensusModule type and its methods. The Server code is a fairly simple Go networking harness, with some subtle intricacies for handling rigorous testing. I won't spend time discussing it in this series, but feel free to ask if anything is unclear.

Raft server states

In general, a Raft CM is a state machine with 3 states [1]:

This can be a bit confusing because the preface spent a long time explaining how Raft helps implement a state machine, but note that the term state means something different here. Raft is an algorithm for implementing an arbitrary replicated state machine, but it also contains a small state machine inside it. In later chapters it should be clear from context which kind of state is meant; when it isn't, I'll point it out.
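
Concretely, the CM's internal state can be represented with a small enum. This is a sketch following the state names used in the code excerpts below; the Dead state is a fourth, implementation-only state used for shutdown and discussed later, and the String method is an assumption so that states print nicely in log messages.

type CMState int

const (
  Follower CMState = iota
  Candidate
  Leader
  Dead // implementation-only state, used for an orderly shutdown (see below)
)

func (s CMState) String() string {
  switch s {
  case Follower:
    return "Follower"
  case Candidate:
    return "Candidate"
  case Leader:
    return "Leader"
  case Dead:
    return "Dead"
  default:
    panic("unreachable")
  }
}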

In a typical steady-state scenario, one server in the cluster is the leader and the other replicas are followers. As much as we’d like to see the system run like this forever, the goal of the Raft protocol is to be fault-tolerant. Therefore, we will spend most of our time discussing atypical failure scenarios, such as some servers crashing, others disconnecting, and so on.

As mentioned earlier, Raft uses a strong leadership model. The leader responds to client requests by appending new entries to the log and replicating them to the followers. Every follower stands ready to take over leadership in case the leader fails or stops communicating. This is the transition from follower to candidate shown above ("times out, starts election").

Terms

Just like regular elections, Raft has terms. A term is the period of time during which a server is the leader. A new election triggers a new term, and Raft ensures that there is only one leader in a given term.

But that’s where the comparison ends, because there’s a big difference between choosing a Lord in Raft and actually voting. In Raft, elections are much more collaborative and the candidate’s goal is not to win at all costs — all candidates have a common goal of having the right server win elections in any given term. More on what “appropriate” means later.

Election timer

A key component of the Raft algorithm is the election timer. This is a timer that every follower runs continuously, restarting it each time a message from the current leader is received. The leader sends periodic heartbeats, so when a follower stops receiving them, it assumes the current leader has failed or been disconnected, and it initiates a new election (switching to the candidate state).

Q: Won’t all the followers become candidates at the same time?

A: Election timers are randomized, which is one of the keys to keeping the Raft protocol simple. Raft uses this randomization to reduce the likelihood of multiple followers starting an election at the same time. But even if some become candidates at the same time, only one server will be chosen as the leader for any given term. In rare cases, a split vote means no candidate wins, and a new election (with a new term) takes place. Although it is theoretically possible to re-run elections forever, the probability of this decreases with every successive election round.

Q: What if a follower disconnects (partitions) from the cluster? Won’t it start an election because it hasn’t heard from its leader?

A: This is the insidious nature of the network partition problem: followers cannot tell who has been partitioned off. Indeed, this follower will start a new election. But if it is the follower that is disconnected, the election will simply go nowhere; it cannot reach the other peers and will not receive any votes. It may keep spinning in the candidate state (periodically starting a new election) until it rejoins the cluster. We'll discuss this situation in more detail later.

RPCs between peers

In the Raft protocol, two types of RPC requests are sent between peers. For the detailed parameters and rules, see Figure 2 in the paper or the appendix of this article. The two request types are listed here (the corresponding Go structs used in Part 1 are sketched after the list):

  • RequestVote (RV): used only in the candidate state. In an election round, a candidate requests votes from its peers through this RPC. The reply contains a flag indicating whether the vote was granted.
  • AppendEntries(AE): Used only in the leader state. This RPC is used by the leader to copy log entries to followers and also to send heartbeats. This RPC request is periodically sent to followers even if there is no log entry to copy.
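
For Part 1, the argument and reply structs only need the fields used in the code below; the log-related fields listed in the appendix arrive with log replication in Part 2. A sketch of the four structs:

// RequestVote (RV) messages. The lastLogIndex/lastLogTerm fields from the
// paper are omitted here and added in Part 2.
type RequestVoteArgs struct {
  Term        int
  CandidateId int
}

type RequestVoteReply struct {
  Term        int
  VoteGranted bool
}

// AppendEntries (AE) messages. With no log entries attached, an AE request
// serves as a heartbeat.
type AppendEntriesArgs struct {
  Term     int
  LeaderId int
}

type AppendEntriesReply struct {
  Term    int
  Success bool
}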

A careful reader will notice that followers do not send any RPC requests. That's right: followers do not initiate RPCs to their peers, but they do run an election timer in the background. If no message from the current leader arrives before the timer expires, the follower becomes a candidate and starts sending RV requests.

Implementing the Election timer

It’s time to start looking at the code. Unless otherwise noted, all of the code examples shown below are from this file. I will not include all the fields of the ConsensusModule structure — you can view them in the code file.

Our CM implements the election timer by running the following function in a goroutine:

func (cm *ConsensusModule) runElectionTimer() {
  timeoutDuration := cm.electionTimeout()
  cm.mu.Lock()
  termStarted := cm.currentTerm
  cm.mu.Unlock()
  cm.dlog("election timer started (%v), term=%d", timeoutDuration, termStarted)

  /* The loop ends when:
     1 - the election timer is no longer needed, or
     2 - the election timer expires and the CM becomes a candidate.
     For a follower, the timer usually runs in the background for the
     entire lifetime of the CM. */
  ticker := time.NewTicker(10 * time.Millisecond)
  defer ticker.Stop()
  for {
    <-ticker.C

    cm.mu.Lock()
    // The CM no longer needs the timer
    if cm.state != Candidate && cm.state != Follower {
      cm.dlog("in election timer state=%s, bailing out", cm.state)
      cm.mu.Unlock()
      return
    }

    // The term has changed
    if termStarted != cm.currentTerm {
      cm.dlog("in election timer term changed from %d to %d, bailing out", termStarted, cm.currentTerm)
      cm.mu.Unlock()
      return
    }

    // If no message from the leader and no vote for another candidate happened
    // before the timeout expired, start a new election
    if elapsed := time.Since(cm.electionResetEvent); elapsed >= timeoutDuration {
      cm.startElection()
      cm.mu.Unlock()
      return
    }
    cm.mu.Unlock()
  }
}

First, a (pseudo-)random election timeout is chosen by calling cm.electionTimeout(). Here we use the range of 150ms to 300ms suggested in the paper. Like most methods in ConsensusModule, runElectionTimer locks the struct before accessing its fields. This is essential because the implementation tries to be as synchronous as possible, which is one of Go's strengths. It also means the code executes sequentially rather than being spread across multiple event handlers. However, RPCs happen concurrently too, so we must protect the shared data structures. We'll get to the RPC handlers later.
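
The electionTimeout method itself isn't shown in this article; a minimal version that picks a pseudo-random duration in the suggested 150ms to 300ms range could look like the sketch below (the real method in raft.go may differ, for example by adding hooks for tests).

// electionTimeout returns a pseudo-random election timeout in the range
// [150, 300) milliseconds, as suggested by the Raft paper.
func (cm *ConsensusModule) electionTimeout() time.Duration {
  return time.Duration(150+rand.Intn(150)) * time.Millisecond
}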

The main loop runs a ticker with a 10ms period. There are more efficient ways to wait for events, but this way the code stays simplest. The loop body executes every 10ms. In theory the timer could sleep for the entire wait, but that would slow down the service's responsiveness and make debugging/tracing in the logs harder. We check that the state is still what we expect [2] and that the term has not changed; if anything is off, the election timer is stopped.

If it has been too long since the last "election reset event", the server starts a new election and becomes a candidate. What is an election reset event? It is anything that should stop an election: for example, receiving a valid heartbeat, or casting a vote for another candidate. We'll see this code shortly.

Becoming a candidate

As mentioned earlier, if a follower has not heard from the leader or another candidate for a certain period of time, it starts a new election. Before we look at the code, let's think about what is needed to conduct an election:

  1. Switch the state to candidate and increment the term, because that is what the algorithm dictates for every election.
  2. Send RV requests to the other peers, asking them to vote for us in this election.
  3. Wait for the replies to these RPCs and count whether we received enough votes to become the leader.

In Go, this logic can be done in a function:

func (cm *ConsensusModule) startElection() {
  cm.state = Candidate
  cm.currentTerm += 1
  savedCurrentTerm := cm.currentTerm
  cm.electionResetEvent = time.Now()
  cm.votedFor = cm.id
  cm.dlog("becomes Candidate (currentTerm=%d); log=%v", savedCurrentTerm, cm.log)

  var votesReceived int32 = 1

  // Send RV requests to all other servers
  for _, peerId := range cm.peerIds {
    go func(peerId int) {
      args := RequestVoteArgs{
        Term:        savedCurrentTerm,
        CandidateId: cm.id,
      }
      var reply RequestVoteReply

      cm.dlog("sending RequestVote to %d: %+v", peerId, args)
      if err := cm.server.Call(peerId, "ConsensusModule.RequestVote", args, &reply); err == nil {
        cm.mu.Lock()
        defer cm.mu.Unlock()
        cm.dlog("received RequestVoteReply %+v", reply)

        // If the state is no longer Candidate, drop out of the election
        // (we may have reverted to follower, or already won and become the leader)
        if cm.state != Candidate {
          cm.dlog("while waiting for reply, state = %v", cm.state)
          return
        }

        // The reply carries a higher term; revert to follower
        if reply.Term > savedCurrentTerm {
          cm.dlog("term out of date in RequestVoteReply")
          cm.becomeFollower(reply.Term)
          return
        } else if reply.Term == savedCurrentTerm {
          if reply.VoteGranted {
            votes := int(atomic.AddInt32(&votesReceived, 1))
            if votes*2 > len(cm.peerIds)+1 {
              // Won the election with a majority of the votes; become the leader
              cm.dlog("wins election with %d votes", votes)
              cm.startLeader()
              return
            }
          }
        }
      }
    }(peerId)
  }

  // Start another election timer in case the election fails
  go cm.runElectionTimer()
}

The candidate first votes for itself: votesReceived is initialized to 1 and cm.votedFor is set to cm.id.

It then sends RPC requests to all peers in parallel. Each RPC is done in its own Goroutine because our RPC calls are synchronous — the program blocks until a response is received, which can take a while.

This line is a good example of how the RPC call is made:

cm.server.Call(peerId, "ConsensusModule.RequestVote", args, &reply);

We use the Server pointer stored in ConsensusModule.server to issue the remote call, specifying "ConsensusModule.RequestVote" as the method name; this invokes the RequestVote method on the peer identified by the first argument.
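
The Server code isn't the focus of this series, but to show what such a call boils down to, here is a hedged sketch of a Call method built on Go's standard net/rpc package; the real server.go handles connection setup, shutdown, and test-induced failures more carefully.

// Call issues a synchronous RPC to the peer with the given id. serviceMethod
// is a name like "ConsensusModule.RequestVote"; net/rpc dispatches it to the
// exported method of that name on the peer's registered ConsensusModule.
func (s *Server) Call(id int, serviceMethod string, args interface{}, reply interface{}) error {
  s.mu.Lock()
  peer := s.peerClients[id]
  s.mu.Unlock()

  if peer == nil {
    return fmt.Errorf("call client %d after it's closed", id)
  }
  return peer.Call(serviceMethod, args, reply)
}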

If the RPC call succeeds, some time has passed, so we must check the server's state to decide what to do next. If our state is no longer candidate, we bail out. When does that happen? For example, we might have been elected leader because other RPC replies already delivered enough votes, or one of the replies carried a higher term from another server and we reverted to follower. It is important to keep in mind that on an unstable network RPC replies can take a long time to arrive; by the time we receive the reply, other code may have moved on, and in that case it is important to abort gracefully.

If we are still a candidate when we receive the reply, we check the term in the reply and compare it with the term when we sent the request. If the reply's term is higher, we revert to the follower state. This would happen, for example, if another server won the election while we were collecting votes.

If the term in the reply is the same as the one we sent, we check whether the vote was granted. We use the atomic variable votesReceived to safely collect votes from multiple goroutines; if the server accumulates a majority of the votes (including its own), it becomes the leader. For example, in a 5-server cluster len(cm.peerIds) is 4, so votes*2 > 5 requires at least 3 votes, a true majority.

Note that the startElection method is non-blocking. It updates some state, launches a batch of goroutines and returns. Therefore, the new election timer must also be started in a goroutine, which is what the last line of code does. This guarantees that if this round of elections produces no result, a new round will follow after a fresh (randomized) timeout. This also explains the state check in runElectionTimer: if this election does make the server the leader, the concurrent runElectionTimer will return as soon as it observes a state it did not expect.

Becoming a leader

As we have seen, the startLeader method is called in startElection when the result of the vote shows that the current server has won, with the following code:

func (cm *ConsensusModule) startLeader() {
  cm.state = Leader
  cm.dlog("becomes Leader; term=%d, log=%v", cm.currentTerm, cm.log)

  go func() {
    ticker := time.NewTicker(50 * time.Millisecond)
    defer ticker.Stop()

    // Heartbeat is sent periodically as long as the current server is the leader
    for {
      cm.leaderSendHeartbeats()
      <-ticker.C

      cm.mu.Lock()
      if cm.state != Leader {
        cm.mu.Unlock()
        return
      }
      cm.mu.Unlock()
    }
  }()
}

This is actually a fairly simple method: all it does is run the heartbeat ticker; the goroutine calls leaderSendHeartbeats every 50ms for as long as the current CM is the leader. Here is the code for leaderSendHeartbeats:

func (cm *ConsensusModule) leaderSendHeartbeats() {
  cm.mu.Lock()
  savedCurrentTerm := cm.currentTerm
  cm.mu.Unlock()

  // Send AE requests to all followers
  for _, peerId := range cm.peerIds {
    args := AppendEntriesArgs{
      Term:     savedCurrentTerm,
      LeaderId: cm.id,
    }
    go func(peerId int) {
      cm.dlog("sending AppendEntries to %v: ni=%d, args=%+v", peerId, 0, args)
      var reply AppendEntriesReply
      if err := cm.server.Call(peerId, "ConsensusModule.AppendEntries", args, &reply); err == nil {
        cm.mu.Lock()
        defer cm.mu.Unlock()
        // If the reply carries a term greater than ours, revert to follower
        if reply.Term > savedCurrentTerm {
          cm.dlog("term out of date in heartbeat reply")
          cm.becomeFollower(reply.Term)
          return
        }
      }
    }(peerId)
  }
}

The logic here is somewhat similar to startElection, which launches a Goroutine for each peer to send an RPC request. The RPC request here is AppendEntries(AE) without log content that acts as the heartbeat in Raft.

As with the RV replies, if the RPC returns a term higher than our own, the current server becomes a follower. Let's take a look at the becomeFollower method:

func (cm *ConsensusModule) becomeFollower(term int) {
  cm.dlog("becomes Follower with term=%d; log=%v", term, cm.log)
  cm.state = Follower
  cm.currentTerm = term
  cm.votedFor = -1
  cm.electionResetEvent = time.Now()

  // Start the election timer
  go cm.runElectionTimer()
}

In this method, the CM's state is first changed to follower, its term is updated, and other important state fields are reset. A new election timer goroutine is also started, since that is a task every follower runs in the background.

Answering RPC requests

So far, we’ve seen the active part of the implementation code — the part that starts the RPC, the timer, and the state transition. But until we see the server methods, which are called remotely by other peers, the demo code is incomplete. Let’s start with RequestVote:

func (cm *ConsensusModule) RequestVote(args RequestVoteArgs, reply *RequestVoteReply) error {
  cm.mu.Lock()
  defer cm.mu.Unlock()
  if cm.state == Dead {
    return nil
  }
  cm.dlog("RequestVote: %+v [currentTerm=%d, votedFor=%d]", args, cm.currentTerm, cm.votedFor)

  // The term in the request is greater than the local term, switching to the follower state
  if args.Term > cm.currentTerm {
    cm.dlog("... term out of date in RequestVote")
    cm.becomeFollower(args.Term)
  }

  // Same term, and we haven't voted yet or already voted for this candidate:
  // grant the vote; otherwise deny it.
  if cm.currentTerm == args.Term &&
    (cm.votedFor == -1 || cm.votedFor == args.CandidateId) {
    reply.VoteGranted = true
    cm.votedFor = args.CandidateId
    cm.electionResetEvent = time.Now()
  } else {
    reply.VoteGranted = false
  }
  reply.Term = cm.currentTerm
  cm.dlog("... RequestVote reply: %+v", reply)
  return nil
}

Note that the “dead” state is checked here, which I’ll discuss later.

The first is a familiar piece of logic, checking if tenure is out of date and converting to followers. If it is already a follower, the state does not change but other state attributes are reset.

Otherwise, if the caller's term is the same as ours, and we haven't voted for another candidate yet, we grant the vote. Note that we never grant a vote to a request from an older term.

Here is the code for AppendEntries:

func (cm *ConsensusModule) AppendEntries(args AppendEntriesArgs, reply *AppendEntriesReply) error {
  cm.mu.Lock()
  defer cm.mu.Unlock()
  if cm.state == Dead {
    return nil
  }
  cm.dlog("AppendEntries: %+v", args)

  // The term in the request is greater than the local term, switching to the follower state
  if args.Term > cm.currentTerm {
    cm.dlog("... term out of date in AppendEntries")
    cm.becomeFollower(args.Term)
  }

  reply.Success = false
  if args.Term == cm.currentTerm {
    // If the current state is not follower, it becomes follower
    if cm.state != Follower {
      cm.becomeFollower(args.Term)
    }
    cm.electionResetEvent = time.Now()
    reply.Success = true
  }

  reply.Term = cm.currentTerm
  cm.dlog("AppendEntries reply: %+v", *reply)
  return nil
}

The logic here also follows the election-related parts of Figure 2 in the paper. One subtle piece to understand:

if cm.state != Follower {
  cm.becomeFollower(args.Term)
}

Q: If the server is the leader — why become the follower of another leader?

A: The Raft protocol guarantees that there is only one leader for any given term. If you carefully follow the logic of RequestVote, together with the code in startElection that sends the RV requests, you will see that two leaders with the same term cannot exist in the cluster. This condition is important for candidates that discover that another candidate has won the election for this term.

States and goroutines

It is worth reviewing all the possible states in CM and the different goroutines that run on them:

Followers: when the CM is initialized as a follower, and on every call to becomeFollower, a new goroutine is launched to run runElectionTimer; this timer accompanies every follower. Note that multiple election timer goroutines may run simultaneously for brief periods. Suppose a follower receives an RV request with a higher term: this triggers another becomeFollower call, which launches a new timer goroutine. The old goroutine exits as soon as it notices the term change.

Candidate: a candidate also has an election timer goroutine running in parallel, but in addition it has a number of goroutines sending RPC requests. It has the same protection as a follower for stopping an "old" election goroutine when a new election starts. Keep in mind that RPC goroutines can take a long time to finish, so if they discover that their term is obsolete by the time the RPC call returns, they must exit quietly.

Leader: a leader does not have an election timer goroutine, but it certainly has a heartbeat goroutine that runs every 50ms.

There is one additional state in the code, called Dead. It exists purely for an orderly shutdown of the CM: calling the Stop method sets the state to Dead, and all goroutines exit as soon as they observe this state.
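
Based on this description, Stop only needs to flip the state while holding the lock; a minimal sketch (the real method may log and do additional cleanup):

// Stop switches the CM into the Dead state. Long-running goroutines check the
// state under the lock and exit once they observe Dead.
func (cm *ConsensusModule) Stop() {
  cm.mu.Lock()
  defer cm.mu.Unlock()
  cm.state = Dead
  cm.dlog("becomes Dead")
}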

Having all these goroutines running may be worrying: what if some of them linger in the background? Or worse, what if they leak and their number grows without bound? This is what leak checking is for, and it is enabled in several test cases. These tests run an unusual sequence of Raft elections and make sure that no stray goroutine is still running at the end of the test (after giving them some time to exit once Stop has been called).

Runaway server and increasing terms

To conclude this part, let's take a look at a tricky scenario that can arise and how Raft deals with it. I find this example very interesting and enlightening. I tell it as a story here, but you may want to keep a piece of paper handy to track the state of each server. If you can't follow the example, please email me and I'll be happy to make it clearer.

Imagine a cluster with three servers: A, B, and C. Suppose A is the leader, the starting term is 1, and the cluster is running flawlessly. A sends AE heartbeat requests to B and C every 50ms and receives timely responses within a few milliseconds. Each AE request resets the electionResetEvent field in B and C, so they happily remain followers.

At some point, due to a temporary failure of a network router, a network partition separates B from A and C. A still sends AE requests every 50ms, but these requests either fail immediately or fail after a timeout in the underlying RPC engine. There is nothing A can do about it, but it doesn't matter much: we haven't touched on log replication yet, but because two of the three servers are healthy, the cluster can still commit client commands.

What about B? Assume that its election timeout was set to 200ms when it got disconnected. After roughly 200ms of being cut off, B's runElectionTimer will notice that nothing has been heard from the leader within the election timeout; B cannot tell who is at fault, so it becomes a candidate and starts an election.

So B's term becomes 2 (while A's and C's are still 1). B sends RV requests to A and C asking them to vote for it; of course, these requests get lost in the network. No need to panic! The startElection method in B also starts another goroutine running runElectionTimer. Suppose this goroutine waits 250ms (remember, the timeout is chosen at random between 150ms and 300ms) to see whether the previous round of elections produced a tangible result. Since B is still completely isolated, nothing happens, so runElectionTimer starts another election and increases the term to 3.

This goes on and on; the faulty router resets itself and comes back online a few seconds later, by which time B's term has climbed to 8 due to its periodic re-elections.

At this point, the network partition problem has been fixed, and B reconnects to A and C.

Soon after, an AE request from A arrives. Recall that A kept sending heartbeats every 50ms, even though B never replied.

B's AppendEntries handler is called, and the reply carries term 8.

A receives this reply in leaderSendHeartbeats, examines the term in the reply, and notices that it is higher than its own. A changes its term to 8 and becomes a follower. The cluster temporarily loses its leader.

What happens next depends on timing. B is a candidate, but its RV requests may have gone out before the network recovered; C is a follower, but since it no longer receives A's AE requests within its election timeout, it will also become a candidate. A has become a follower, and may also turn into a candidate if its election timer expires.

So any of these three servers could win the next election; mind you, this is only because we aren't replicating any logs here. As we'll see in the next part, in reality A and C may have appended new client commands while B was offline, so their logs would be more up to date. In that case B won't become the new leader: a new round of elections will take place, and either A or C will win. We'll revisit this scenario in the next part.

If no commands were added while B was disconnected, it is perfectly fine for the leader to change after the reconnection.

This may seem inefficient, and it is. There is no real need to change leaders here, because A stayed healthy throughout the scenario. However, keeping the algorithm's logic simple at the expense of efficiency in special cases is one of the trade-offs Raft makes. What matters more is the algorithm's efficiency in the normal case (without any failures), because that is the state the cluster is in 99.9% of the time.

The next step

To ensure that your understanding of the implementation is more than theoretical, I strongly recommend that you run the code.

The README file in the repository provides detailed instructions for interacting with the code, running the test cases, and observing the results. The code comes with many tests for specific scenarios (including the scenario described in the previous section), and it is instructive to run a test case and look at the Raft logs. Notice the cm.dlog(...) calls in the code? The repository provides a tool for visualizing these logs as HTML files; see the README for instructions. Run the code, read the logs, and feel free to sprinkle your own dlog calls into the code to better understand when each part of the code runs.
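
For reference, dlog itself can be a thin wrapper over the standard log package. A sketch, assuming a package-level debug switch (the name DebugCM is hypothetical):

// DebugCM enables debug logging when > 0 (hypothetical switch).
const DebugCM = 1

// dlog logs a debugging message if DebugCM > 0, prefixed with the CM's id so
// that interleaved output from several servers can be told apart.
func (cm *ConsensusModule) dlog(format string, args ...interface{}) {
  if DebugCM > 0 {
    format = fmt.Sprintf("[%d] ", cm.id) + format
    log.Printf(format, args...)
  }
}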

Part 2 of this series describes a more complete Raft implementation that handles client commands and replicates the log across the cluster. Stay tuned!

Appendix:

Figure 2 from the Raft paper is reproduced below, briefly translated and annotated. Parts of it deal with log replication and commits, which you can revisit after the next article.

State

Servers keep three groups of state fields.

Persistent state on all servers (updated on stable storage before responding to RPC requests):

  • currentTerm: the latest term the server has seen (initialized to 0 on first boot, monotonically increasing)
  • votedFor: the ID of the candidate that received this server's vote in the current term (or null if none)
  • log[]: the log entries; each entry contains a command for the state machine and the term when the entry was received by the leader (first index is 1)

Volatile state on all servers:

  • commitIndex: the index of the highest log entry known to be committed (initialized to 0, monotonically increasing)
  • lastApplied: the index of the highest log entry applied to the state machine (initialized to 0, monotonically increasing)

Volatile state on leaders (reinitialized after each election):

  • nextIndex[]: for each server, the index of the next log entry to send to that server (initialized to the leader's last log index + 1)
  • matchIndex[]: for each server, the index of the highest log entry known to be replicated on that server (initialized to 0, monotonically increasing)

AE request

AE requests are the AppendEntries RPCs, initiated by the leader to replicate client commands to followers and to maintain heartbeats.

Request parameters:
  • term: the leader's term
  • leaderId: the leader's ID, so that followers can redirect clients
  • prevLogIndex: the index of the log entry immediately preceding the new entries
  • prevLogTerm: the term of the prevLogIndex entry
  • entries[]: the log entries to store (empty for heartbeats; more than one may be sent for efficiency)
  • leaderCommit: the leader's commitIndex
Return values:
  • term: currentTerm, for the leader to update itself
  • success: true if the follower contained an entry matching prevLogIndex and prevLogTerm
Receiver implementation:
  1. Reply false if term < currentTerm;
  2. Reply false if the log does not contain an entry at prevLogIndex whose term matches prevLogTerm;
  3. If an existing local entry conflicts with a new one (same index, different term), delete the existing entry and all entries that follow it;
  4. Append any new entries not already in the log;
  5. If leaderCommit > commitIndex, set commitIndex to the smaller of leaderCommit and the index of the last new entry.

The RV request

RV requests are the RequestVote RPCs, invoked by candidates to gather votes during an election.

Request parameters:
  • term: the candidate's term
  • candidateId: the ID of the candidate requesting the vote
  • lastLogIndex: the index of the candidate's last log entry
  • lastLogTerm: the term of the candidate's last log entry
Return values:
  • term: currentTerm, for the candidate to update itself
  • voteGranted: true means the candidate received the vote
Receiver implementation:
  1. Reply false if term < currentTerm;
  2. If votedFor is null or equal to candidateId, and the candidate's log is at least as up to date as the receiver's log, grant the vote.

Rules for servers

Listed by the server's current state (role):

All servers:
  • If commitIndex > lastApplied: increment lastApplied and apply log[lastApplied] to the state machine;
  • If an RPC request or response contains a term T > currentTerm: set currentTerm = T and convert to follower.
Followers:
  • Respond to RPC requests from candidates and leaders;
  • If no AE request from the current leader and no vote granted to a candidate within the election timeout: convert to candidate.
Candidate:
  • On conversion to candidate, start an election:
    • Increment currentTerm
    • Vote for yourself
    • Reset the election timer
    • Send RV requests to all other servers
  • If votes are received from a majority of servers: become the leader
  • If an AE request arrives from the new leader: convert to follower
  • If the election timeout elapses: start a new election
Leader:
  • Upon election: send initial empty AE requests (heartbeats) to each server; keep sending AE requests during idle periods to prevent followers from timing out;
  • If a command is received from a client: append the entry to the local log, and respond to the client after the entry has been applied to the state machine;
  • If the last log index ≥ nextIndex for a follower: send an AE request to that follower with the log entries starting at nextIndex:
    • If successful: update the follower's nextIndex and matchIndex;
    • If the AE fails because of log inconsistency: decrement nextIndex and retry;
  • If there exists an N such that N > commitIndex, a majority of matchIndex[i] ≥ N, and log[N].term == currentTerm: set commitIndex = N.

  1. This diagram is identical to Figure 4 in the Raft paper. It is worth repeating that in this series I assume you have already read the paper. ↩︎

  2. Checking the state for both followers and candidates may seem a little strange. Could a server suddenly become a leader without an election initiated by runElectionTimer? Read on to see how candidates restart election timers. ↩︎