The experiment to prepare

  1. Experiment code:git://g.csail.mit.edu/6.824-golabs-2021/src/raft
  2. How to test:go test -run 2A -race
  3. Related papers: Raft Extended Section 5.2
  4. 6.824 Lab 2: Raft (mit.edu)

The experimental goal

Implement the Leader Election (RequestVote RPC) and Heartbeats (AppendEntries RPC) algorithm in Raft. Ensure that only one Leader is selected, and the Leader will always exist only if there are no errors. When the Leader goes offline or other errors occur, the sent data cannot be successfully received, a new Leader will be generated to replace the Leader.

Some hints

  1. Refer to Figure 2 of the paper to implement the corresponding structure and function.
  2. throughMake()Create a background Goroutine for a period of timeelection timeout) passes if no message is received from other nodesRequestVote RPCCall an election.
  3. Try to ensure different nodeselection timeoutWon’t let them launch elections at the same time, avoid all nodes voting only for themselves, can be set to randomelection timeoutTo implement.
  4. The test requires Hearbeats to be no more than 10 times per second.
  5. The test requires the New Leader to appear within 5 seconds after the Old Leader goes offline. Considering the situation of one election and multiple elections (as mentioned in tip 3), the Election timeout should be short enough.
  6. In the paper, theelection timeoutSet it between 150ms and 300ms if the Heartbeat frequency is much greater than 150ms at a time. Due to the limitation of hint 4, the experimentelection timeoutIt should be bigger.
  7. It is recommended to usetime.Sleep()Rather thantime.Timerortime.TickerTo implement periodic or deferred behavior.
  8. Don’t forget to implementGetState().
  9. userf.killed()Determines whether the test closed the node.
  10. Rpc-related structure fields should start with an uppercase letter because of the syntax of the Go language.

Introduction of Raft

Logs are understood as a sequence of requests from the client. In a cluster, there is only one Node that receives the requests from the client, called the “Leader Node”. To ensure data security, the logs of the “Leader Node” should be copied to several nodes, called “Follower Node”, for backup. The logs of the Follower Node must be consistent with those of the Leader Node. Raft is a consistency algorithm created to manage log replication.

Leadership election

Raft clusters usually have an odd number of nodes. Set this parameter to N. N/2 nodes are allowed to fail. In normal cases, a Raft cluster consists of one Leader and n-1 followers.

Followers change their identity to Candidate after election timeout, and to Leader after obtaining votes from (n-1)/2 other nodes.


The main structure

The first is the Raft structure. The specific properties are given in Figure 2 of the paper, but two additional properties are required.

  1. role: Indicates the identity of the current node.
  2. lastRecv: Indicates the time when a message was received from another node.

Annotated fields can be ignored in Part A, and in this section, the currentTerm field is ignored for ease of understanding.

type Raft struct {
	mu          sync.Mutex
	peers       []*labrpc.ClientEnd	// All nodes in the cluster
	// persister *Persister
	me          int			// Index of the current node in peers
	dead        int32		// Marks whether the current node is alive
    
	lastRecv    time.Time
	role        Role
    
	currentTerm int
	votedFor    int
	// log []LogEntry
    
	// commitIndex int
	// lastApplied int
    
	// nextIndex []int
	// matchIndex []int
}
Copy the code

Timeout election

If the current node is not the Leader and has not received a message from another node after election timeout, an election is initiated.

Set election Timeout between 150ms and 300ms.

func (rf *Raft) electionTimeout(a) time.Duration {
    return time.Duration(150 + rand.Int31n(150)) * time.Millisecond
}
Copy the code

After an election is initiated, the identity switches to a Candidate and gets votes from the remaining nodes through the RequestVote RPC. After obtaining the votes of (n-1)/2 other nodes, the identity switches to the Leader. After becoming the Leader, you must immediately send heartbeat messages to other nodes to announce the existence of the Leader.

The comments in the code are the logic in Figure 2.

func (rf *Raft) elect(a) {
    for! rf.killed() {if rf.role == Leader || time.Since(rf.lastRecv) < rf.electionTimeout() {
            return
        }
        /* On conversion to candidate, start election. */
        rf.role = Candidate
        /* Vote for self. */
        rf.voteFor = rf.me
        voteCount := 1
        /* Reset election timer. */
        rf.lastRecv = time.Now()
        /* Send RequestVote RPCs to all other servers. */
        for i, peer := range rf.peers {
            if i == rf.me {
                continue
            }
            reply := RequestVoteReply{}
            peer.Call("Raft.RequestVote", &RequestVoteArgs{
                CandidateId: rf.me,
            }, &reply)

            if reply.VoteGranted {
                voteCount++
            }
        }
        /* If votes received from majority of servers: become leader. */
        if voteCount > len(rf.peers)/2 {
            rf.role = Leader
            rf.votedFor = - 1
            rf.heartbeat()
        }
        time.Sleep(10 * time.Millisecond)
    }
}
Copy the code

Explain 1: How to understand rf.role == Leader?

Both followers and candidates can participate in the election. The reason why a Candidate can participate in the election is that it may take more than one round to elect a Leader. If, unfortunately, all nodes initiate the election at the same time and they all cast their votes for themselves, the Leader will not be elected in this round of election.

Since a second round of voting begins at that time, only followers can participate in the election.

Send a heartbeat

A log copy without a log is a heartbeat that the Leader uses to refresh the election timeout of the remaining nodes. Hint 4 limits the heartbeat rate to 10 beats per second, so it puts the heartbeat to sleep for 100ms after one.

func (rf *Raft) heartbeatInterval(a) {
    return 100 * time.Millisecond
}

func (rf *Raft) heartbeat(a) {
    for! rf.killed() {ifrf.role ! = Leader {return
        }
        for i, peer := range rf.peers {
            if i == rf.me {
                continue
            }
            reply := AppendEntriesReply{}
            peer.Call("Raft.AppendEntries", &AppendEntriesArgs{}, &reply)
        }
    }
    time.Sleep(rf.heartbeatInterval())
}
Copy the code

Remote Procedure Call (RPC) is a Remote Procedure Call that calls functions on other nodes. For example, peer.Call(” raft. AppendEntries”, &args, &reply) calls the AppendEntries function of the corresponding node with args and the return value is stored in reply.

Heartbeat is active; AppendEntries are passive. RequestVote is passive while ELECT is active.

RequestVote

Candidate requests votes from other nodes by remotely calling RequestVote. The definitions of RequestVoteArgs and RequestVoteReply are also given in Figure 2 of the paper. The LastLogIndex and LastLogTerm fields do not need to be concerned with in Part A. Again, ignore the Term concept for the sake of understanding.

type RequestVoteArgs struct {
	Term         int
	CandidateId  int
	// LastLogIndex int
	// LastLogTerm int
}

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}
Copy the code

In a round, each node has only one vote, with the Candidate voting for himself and the Follower voting for the first Candidate who asks for his vote.

The comments in the code are the logic in Figure 2.

In RequestVote, you need to refresh the election timeout.

func (rf *Raft) RequestVote(args *RequestVoteArgs, reply *RequestVoteReply) {
    reply.VoteGranted = false
    rf.lastRecv = time.Now()
    /* If votedFor is null or candidateId, grant vote. */
    if rf.votedFor == - 1 || rf.votedFor == args.CandidateId {
        rf.votedFor = args.CandidateId
        reply.VoteGranted = true}}Copy the code

Explain 2: How to understand rf.votedfor == args.CandidateId?

Logically this condition is unnecessary, and removing it still passes all tests. I guess this condition is to prevent the network packet from being lost and retransmitted by the sender, thus requiring the receiver to vote again.

AppendEntries

The Leader uses a remote call to AppendEntries to refresh the election timeout of another node to ensure that no other node will initiate an election during its lifetime. AppendEntriesArgs and AppendEntriesReply are also defined in Figure 2 of the paper.

Annotated fields don’t need attention in Part A, and again, ignore the Term field for ease of understanding.

type AppendEntriesArgs struct {
	Term         int
	// LeaderId int
	// PrevLogIndex int
	// PrevLogTerm int
	// Entries []LogEntry
	// LeaderCommit int
}

type AppendEntriesReply struct {
	Term    int
	Success bool
}
Copy the code

Hint 2, when receiving messages from other nodes, refreshes election Timeout. Rf.lastrecv therefore needs to be updated in both RequestVote and AppendEntries.

func (rf *Raft) AppendEntries(args *AppendEntriesArgs, reply *AppendEntriesReply) {
    reply.Success = true
    rf.lastRecv = time.Now()
}
Copy the code

The only Leader

The election success condition is voteCount > len(rfe.peers)/2 and each node has only one vote, which ensures that at most only one node meets the election success condition and ensures the uniqueness of the Leader.

Now consider that the heartbeat of the only Leader cannot be sent due to some network problems. Then the remaining N-1 nodes will choose a new Leader, and the remaining N-1 nodes can continue to provide normal services. Therefore, if the network problem of the Old Leader is recovered for some reason, two leaders will appear in the cluster at the same time, so that the log consistency of the whole cluster cannot be guaranteed.

Introduction of term

Term solves the problem of having multiple leaders.

Term is a monotonically increasing integer value, and the Term of all nodes should be the same. The Term increment only occurs during the transition from Follower to Candidate, that is, the Term will increment by 1 only during the election.

Considering the above problem again, when the Old Leader is restored, the Term of the remaining N-1 nodes including the New Leader is greater than the Term of the Old Leader because the remaining N-1 nodes have experienced at least one election. Raft algorithm provides that any node senses the node with a higher Term. Convert to Follower; Any node that senses the node with a lower Term will ignore the message of the other party and inform the other party of its own Term.

In this way, when the Old Leader receives the heartbeat of a higher Term from the New Leader, it will change its identity to Follower, ensuring the uniqueness of the Leader.

What is sensing other nodes?

AppendEntries or RequestVote Both RPC requests or replies contain Term, and the Old Leader senses the New Leader in two ways.

  1. AppendEntriesArgs.Term higher than the heartbeat received by the New Leader.
  2. Appendentriesreply.term is higher when heartbeat is sent to the New Leader or other nodes.

The experimental conclusion

The picture above is Figure 2 of the paper mentioned many times in this paper. I selected the Part to be realized in Part A with the green wire box.

It should be noted that data synchronization problems are omitted in this article for the sake of simple code. -race can expose data race problems in your code. Remember to lock critical resources.

Finally, to prove that I am not scribbling, I enclose my test results.