Raft consensus algorithm for distributed systems

The author works for JINGdong, and has a deep understanding of stability assurance, agile development, advanced JAVA, and micro-service architecture

In addition to improving the performance of the whole system, distributed system also has an important feature of improving the reliability of the system. Providing reliability can be understood as the failure of one or more machines in a system without rendering the system unusable or losing data. The key to ensure system reliability is multiple copies. Once there are multiple copies, the problem of consistency between multiple copies will be faced. Consistency algorithm is used to solve the problem of data consistency between multiple copies in distributed environment. The most famous consistency algorithm in the industry is the well-known Paxos, but Paxos is notoriously difficult to understand. Raft was created to explore a more understandable consistency algorithm that breaks consistency into three key elements: leader election, log replication, and security

1. Leader election Raft algorithm divides time into arbitrary terms of different lengths, which are represented by consecutive numbers and the beginning of each term is an election. If a candidate wins the election, it serves as leader for the remainder of that term. If no leader is elected, another term begins and the next election begins immediately

So when does the election start? In fact, the leader periodically sends heartbeat messages to all followers to ensure their status as the leader. If a follower does not receive a heartbeat message within a period of time, it assumes that no leader is available and switches status to become a candidate. And an election is held to elect a new leader

The Raft election process has three rules (1), rules: If the server requesting the vote has a longer term than your own, vote for that server

(2) Rules: Within a term, one server can vote for one candidate at most, on a first-come, first-served basis

(3) Rule: Random election timed out

In order to avoid votes being evenly divided, no server becomes the leader, and each server is randomly selected within a fixed range, so that the election timeout time of each server is different. This mechanism makes only one server take the lead in timeout in most cases. It wins elections and sends heartbeat messages to other servers before they time out

When the candidate receives the votes of most nodes, he becomes the leader. If a leader is discovered or a request for a higher term is received, the leader is converted to followers. The election security principle is observed during role transition, and a maximum of one leader is allowed to be elected within a term

2, diary copy

The journal matching principle describes that if two entries in different journals have the same index and tenure number, they store the same command. If two entries in different diaries have the same index and tenure number, then all entries between them are exactly the same. A committable diary describes how once an entry created by the leader has been copied to most servers, the diary is called committable. Previous entries in the Leader diary are all committable, including those created by the previous leader

In the diary copying process, the leader needs to find where the follower matches his diary, delete any entries after that position, and then send his own entries after that position to the follower

The complete process is that the client sends a request to the leader, who adds the diary entry to its log and sends a parallel request to the follower server to copy the diary entry. After confirming that the diary entry is copied safely, the leader applies the entry to the state machine and then returns the result to the client. Make a request to the follower that the journal entry can be submitted. When one server has applied to the state machine a journal entry that gives it an index location, all other servers do not apply different entries at that index location

In diary replication, various exceptions may occur (1). Exceptions: Data is sent to the leader but not copied to the follower

If the leader fails to receive an ACK, the client can send a request to the leader again after the election. (2) Exception: Data reaches the leader node and is successfully replicated to all the follower nodes, but no response is received from the leader

After the re-election, although the data on the follower node is in the unsubmitted state, it remains consistent. Data submission can be completed after the re-election of the leader. Because the original leader is abnormal, the client cannot receive the ACK and will send a request to the leader again. Therefore, the server needs to implement the deduplication mechanism (3). Anomaly: the data reaches the leader node and is successfully copied to all or most of the followers nodes. The data is in the submitted state at the leader, but not in the unsubmitted state at the follower

At this stage, the leader hangs down and A new leader is elected. (4) Abnormal: Two leaders appear due to network “partition”. Take five server nodes as an example. Due to network reasons, nodes A and B cannot communicate with C, D and E. C, D and E re-elect and select the leader. If E wins, two leaders appear. After the network is restored, A and B will serve as followers of E

Raft determines which of the two diaries is updated by comparing the index of the last entry to the term number. If the two diaries have different term numbers, the larger term number is updated. If the term number is the same, the longer term number is updated. The RequestVote RPC contains the candidate’s diary information, and if the server’s own diary is newer than the candidate’s, it will refuse to vote for that candidate

Behind this behavior is the leader complete principle that if a diary entry is submitted during a given term, it must appear among all leaders with a larger term number. All log entries are only written from the leader node to the follower node, and the logs on the leader node are only added and never deleted or overwritten

Source: www.liangsonghua.me

Liang Songhua, senior engineer of JINGdong, focuses on stability assurance, agile development, JAVA advanced and micro-service architecture for a long time

Raft consensus algorithm for distributed systems

Related Posts

2021-01-07 Hot news

Ask Jack Ma: 6 most incisive questions

Master the four skills of text typesetting method in e-commerce Banner