Raft is another well-known protocol for solving the consistency problem in distributed systems. It consists of two parts: leader election and log replication.

Tip: this article follows the official Raft animation, which can be viewed at: thesecretlivesofdata.com/raft/

Contents

    • 1. Leader election
      • 1.1 In a round of voting, only one node initiates the voting
      • 1.2 In a round of voting, more than one node initiates the voting
      • 1.3 Thoughts on implementing Raft leader election
    • 2. Log replication

1. Leader election

1.1 In a round of voting, only one node initiates the voting



Raft protocol nodes have 3 states (roles) :

  • Follower.
  • Candidate.
  • Leader, usually referred to as the master node.

Initially, all three nodes are in the Follower state. Each node has a timeout period (timer), which is set to a random value between 150ms and 300ms. When the timer expires, the node changes from Follower to Candidate, as shown in the following figure:



In general, the timer of one of the three nodes expires first. That node becomes a Candidate, and a node in the Candidate state initiates the election vote. Let’s first consider how a Leader is elected when only one node becomes a Candidate.
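The randomized timer can be illustrated with a minimal sketch. The 150ms to 300ms bounds come from the text above; the function name, the node names, and the use of Python's `random` module are assumptions made only for illustration.

```python
import random

# Bounds taken from the text: each node waits a random 150-300 ms.
ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def random_election_timeout_ms(rng: random.Random) -> float:
    """Return a fresh random timeout in milliseconds for one node."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)


# Hypothetical three-node cluster; the node names are for illustration only.
rng = random.Random()
timeouts = {name: random_election_timeout_ms(rng) for name in ("A", "B", "C")}

# The node whose timer expires first is the first to become a Candidate.
first_candidate = min(timeouts, key=timeouts.get)
print(timeouts, "->", first_candidate, "times out first")
```

Because every node draws its own value, two nodes rarely expire at exactly the same moment, which is what usually lets a single Candidate start the vote.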

When a node becomes a Candidate, it initiates a round of voting. Since this is the first round, it sets the round (Term) to 1 and votes for itself first. As shown for node A, Term is 1 and Vote Count is 1.



When a node’s timer expires, it first casts a vote for itself and then sends a vote request to the other nodes in the cluster (in effect, it canvasses them for votes).
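What a node does when its timer expires can be sketched roughly as follows. The `Node` class, its field names, and the dictionary message format are hypothetical; the sketch only builds the vote requests and does not send them over a network.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    peers: List[str]                 # names of the other nodes in the cluster
    state: str = "Follower"
    term: int = 0                    # current voting round
    voted_for: Optional[str] = None  # who this node voted for in `term`
    votes: Set[str] = field(default_factory=set)

    def on_election_timeout(self) -> List[dict]:
        """Become a Candidate, vote for itself, and build vote requests."""
        self.state = "Candidate"
        self.term += 1
        self.voted_for = self.name
        self.votes = {self.name}
        # One request per peer; actually sending them is out of scope here.
        return [
            {"type": "RequestVote", "term": self.term,
             "candidate": self.name, "to": peer}
            for peer in self.peers
        ]


node_a = Node(name="A", peers=["B", "C"])
requests = node_a.on_election_timeout()
print(node_a.state, node_a.term, len(node_a.votes))  # prints: Candidate 1 1
```

The printed values match the animation: after the first timeout, node A is a Candidate with Term 1 and a vote count of 1.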



When a node in the cluster receives a vote request, it grants the vote if it has not yet voted in that round and rejects it otherwise. It then returns the result and resets its timer.



When node A receives votes from more than half of the nodes, it is promoted to Leader of the cluster and then periodically sends heartbeats to the other nodes to maintain its leadership, as shown in the following figure.



Node A, the Leader of the cluster, is sending heartbeat packets to the other nodes.



After receiving a heartbeat packet from the Leader, a node returns a response and resets its timer. If a node in the Follower state does not receive a heartbeat packet from the Leader within the timeout period, it changes from Follower to Candidate and initiates the next round of voting.
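The timer reset can be sketched with a simple deadline. The `FollowerTimer` class and the 50ms heartbeat interval are assumptions for illustration, and time is passed in explicitly as milliseconds so the example stays deterministic.

```python
class FollowerTimer:
    """Tracks when a Follower should give up waiting for the Leader."""

    def __init__(self, timeout_ms: int, now_ms: int = 0):
        self.timeout_ms = timeout_ms
        self.deadline_ms = now_ms + timeout_ms

    def on_heartbeat(self, now_ms: int) -> None:
        # A heartbeat from the Leader pushes the deadline back.
        self.deadline_ms = now_ms + self.timeout_ms

    def expired(self, now_ms: int) -> bool:
        return now_ms >= self.deadline_ms


timer = FollowerTimer(timeout_ms=200)
# Leader alive: heartbeats arrive at 50, 100 and 150 ms (hypothetical interval).
for t in (50, 100, 150):
    timer.on_heartbeat(now_ms=t)

# The Leader stops after 150 ms, so the deadline stays at 350 ms.
print(timer.expired(now_ms=300))  # False: still within the timeout
print(timer.expired(now_ms=350))  # True: the Follower becomes a Candidate
```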

For example, suppose node A goes down and stops sending heartbeats to the Followers. Let’s take a look at how the cluster re-elects a Leader.



When the Leader goes down, it stops sending heartbeat packets to the nodes in the cluster. As the timers expire, node B becomes a Candidate before node C and then requests votes from the other nodes in the cluster, as shown in the figure below.



Node B first sets the voting round (Term) to 2, votes for itself, and then sends vote requests to the other nodes.



When node C receives the request, the voting round in the request is larger than its own and it has not yet voted in that round, so it votes yes, returns the result, and resets its timer. With two of the three votes, node B becomes the new Leader and sends heartbeat packets periodically.
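The decision node C makes here, combined with the one-vote-per-round rule described earlier, can be sketched as below. The `Voter` class and method name are hypothetical. Rejecting requests from an older round is standard Raft behavior rather than something stated in this article, and the full protocol also compares logs before granting a vote, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Voter:
    name: str
    term: int = 0
    voted_for: Optional[str] = None

    def handle_vote_request(self, candidate: str, candidate_term: int) -> bool:
        """Return True if the vote is granted, False if it is rejected."""
        if candidate_term < self.term:
            return False  # request from an older round (assumed Raft rule)
        if candidate_term > self.term:
            # Newer round: adopt it and forget the vote from the old round.
            self.term = candidate_term
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate
            return True   # the caller would also reset its timer here
        return False      # already voted for someone else in this round


# Node C voted for A in round 1; node B now asks for a vote in round 2.
node_c = Voter(name="C", term=1, voted_for="A")
print(node_c.handle_vote_request("B", candidate_term=2))  # True
# A second, hypothetical candidate in the same round is rejected.
print(node_c.handle_vote_request("D", candidate_term=2))  # False
```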

Although each node’s timer is random, a node may still become a Candidate at the same time as another node, or before it receives the vote request that the other node has initiated. In other words, more than one node can be a Candidate during the same round of voting. How is the Leader chosen in that case?

The following uses a cluster of four nodes as an example to illustrate how the Leader is elected in this case.

1.2 In a round of voting, more than one node initiates the voting

First, two nodes enter the Candidate state at the same time and start a new round of voting. The current voting round is 4. Each Candidate first casts a vote for itself and then sends vote requests to the other nodes in the cluster, as shown in the figure below:



Then each node receives the voting request, as shown below, and votes:



First, when node C receives node D’s vote request, and node D receives node C’s, each rejects it, because each has already voted for itself in this round. In the figure above, node A votes for node C and node B votes for node D, so C and D each receive only two votes. Of course, if A and B had both voted for C, or both for D, the election would have ended there. As it stands, C and D each hold 2 of the 4 votes, which is not more than half, so neither can become the master node. What happens next? See the figure below:



At this point, the timers of A, B, C and D are each counting down again. When a node’s timer expires, whether the node was a Follower or was already a Candidate, it initiates a new round of voting. In the figure, node B and node D simultaneously initiate a new round of voting.
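To see why the previous round produced no Leader and a new round is needed, here is a minimal tally sketch. It assumes, as an illustration, that node A voted for C and node B voted for D in round 4; the variable names are hypothetical.

```python
from collections import Counter

cluster = ["A", "B", "C", "D"]
# Who each node voted for in this round (Candidates vote for themselves).
ballots = {"A": "C", "B": "D", "C": "C", "D": "D"}

tally = Counter(ballots.values())
needed = len(cluster) // 2 + 1  # strictly more than half of 4 is 3

winners = [node for node, count in tally.items() if count >= needed]
print(dict(tally), "needed:", needed, "winners:", winners)
```

With two votes each and three needed, the list of winners is empty, so the nodes simply wait for their timers to expire again.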



The voting results are as follows: node A and node C agree that node B becomes the Leader. However, because B and D both launched the fifth round of voting, the final voting round is updated to 6, as shown in the figure:



That concludes leader election in the Raft protocol. Let’s now think about at least some of the issues you would need to consider if you implemented the Raft protocol yourself. This also provides some ideas for the next article, a source code analysis of the DLedger (RocketMQ multi-replica) module.

1.3 Thoughts on implementing Raft leader election

  1. Node state: there are three states, Follower, Candidate (the state that triggers voting), and Leader.
  2. Trigger for voting: a timer needs to be maintained. Each timer is set to a random value between 150ms and 300ms, so the timers of different nodes expire at different times. When a node is in the Follower state and its timer expires, a round of voting is triggered. A node needs to reset its timer after responding to a vote request or to a heartbeat request from the Leader.
  3. Voting round (Term): a node in the Candidate state increases Term by one for each round of voting it initiates. How Term is stored also needs to be considered.
  4. Voting mechanism: a node can vote for only one node in each round. For example, if node A has voted for node B in round 3 and then receives vote requests from other nodes for round 3, it rejects them. If it receives a vote request for round 4, it can vote again.
  5. Becoming Leader: a Candidate must obtain the votes of a majority of the nodes in the cluster, that is, more than half. For example, if there are three nodes in the cluster, two votes must be obtained. If one of the servers breaks down, can the remaining two nodes still elect a Leader? The answer is yes, because a Candidate can still get 2 votes, which is more than half of the 3 nodes in the original cluster. This is why clusters usually use an odd number of machines: a cluster of 4 offers the same availability as a cluster of 3.
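The majority arithmetic in point 5 can be checked with a short sketch. The two helper names are hypothetical; the formula is simply "more than half" written as integer division.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of votes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be down while a Leader can still be elected."""
    return cluster_size - quorum(cluster_size)


# Columns: cluster size, votes needed, failures tolerated.
for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# prints:
# 3 2 1
# 4 3 1
# 5 3 2
```

Clusters of 3 and 4 nodes both tolerate exactly one failure, which is the reason a fourth machine adds no availability.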

Tip: the points above are only some of my own thoughts. We can carry them into the study of DLedger; the next article will look at how Raft leader election is implemented from the perspective of source code analysis, so stay tuned.

2. Log replication

After leader election in the cluster is complete, the client sends requests to the Leader, and the Leader is responsible for replicating the data to keep the cluster consistent. The initial state is shown in the following figure:



The client sends a request to the master node, such as SET 5, to update the data to 5, as shown below:



After receiving the client request, the Leader appends the data to its own log (but does not commit it), and then forwards the log entry to the Followers in the cluster with the next heartbeat packet, as shown below:



When a Follower receives the Leader’s log entry, it appends the entry to its own log file and returns an acknowledgement (ACK). After receiving the acknowledgements from the Followers, the Leader sends a confirmation to the client.
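The normal-case flow above can be sketched in a few lines. The `Replica` class and `replicate` function are hypothetical, and everything runs in one process with no network. Committing once a majority has acknowledged is how the Raft protocol defines it; the article itself leaves the exact commit point as an open question, so treat that part as an assumption.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.log = []          # appended entries
        self.commit_index = 0  # number of entries known to be committed

    def append(self, entry: str) -> bool:
        self.log.append(entry)
        return True  # ACK


def replicate(leader: Replica, followers: list, entry: str) -> bool:
    """Append on the Leader, copy to Followers, commit on majority ACK."""
    leader.log.append(entry)  # appended, not yet committed
    # The Leader counts itself as one acknowledgement.
    acks = 1 + sum(f.append(entry) for f in followers)
    cluster_size = 1 + len(followers)
    if acks > cluster_size // 2:
        leader.commit_index = len(leader.log)
        return True   # safe to confirm to the client
    return False


leader = Replica("A")
followers = [Replica("B"), Replica("C")]
print(replicate(leader, followers, "SET 5"))  # True
```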



The above log replication is relatively simple because only normal cases are considered. If an exception occurs in the middle, how can data consistency be ensured?

  1. What happens if the Leader broadcasts logs to the Followers and one of the Followers fails or goes down?
  2. At what point are logs committed? After receiving a data change request from the client, the Leader first appends the request to its own log file and then broadcasts the log entry to the Followers. After the Followers receive the log entry and return an ACK, when exactly does the Leader commit the log?
  3. How is each log entry guaranteed to be unique?
  4. How are network partitions handled?

I’m sure there are many more questions. I’m not going to try to answer them in this article; instead, I will carry them into the RocketMQ multi-replica series and review the Raft protocol again after analyzing the RocketMQ DLedger implementation from the source code.

The next article will focus on how the RocketMQ DLedger multi-replica module implements Raft leader election.


I am Weige, keen on systematic analysis of mainstream Java middleware. Follow the public account “middleware interest circle”; reply “column” to get the column navigation, or reply “data” to get the author’s learning mind maps.