Introduction of algorithm

Paxos is a messaging based consistency algorithm with high fault tolerance proposed by Leslie Lambert in 1990. According to Mike Burrows, the author of Google Chubby, there is only one consistency algorithm in the world, and that is Paxos. All other consistency algorithms are incomplete versions of Paxos algorithm, which is generally acknowledged to be obscure and difficult to implement in engineering. Well-known Paxos project implementations include Google Chubby, ZAB, wechat PhxPaxos, etc.

What problem is the Paxos algorithm suitable for? The Paxos algorithm is to solve the problem of how to reach an agreement on a decision in a distributed system.

Question of the Byzantine general

Byzantine is a place where located in Istanbul, Turkey now, is the capital of the Roman empire at the time, because at that time the Roman empire, the vast army are scattered in various places, far apart, the Byzantine capital had high resolution in order to make the war more cautious, all major war resolution must be agreed to by the most generals to implementation, However, between the capital and the generals, the generals can only rely on messengers, and messengers may be traitors or spies of the enemy, their messages are not reliable, and the resulting agreement or disagreement may not be the real feedback of the generals, and this is the famous Byzantine general problem.

Byzantine generals, is put forward by author Leslie – lambert Paxos algorithm of point to point communication the basic problem, to explain the meaning of the problem is, in the presence of message loss of unreliable channel tried to achieve consistency by means of the messaging is impossible, Paxos algorithm is the premise of the Byzantine generals, That is, the channel is secure and reliable, and the messages transmitted between cluster nodes will not be tampered with.

Paxos is based on two communication models that are commonly used between nodes in distributed systems: shared memory and messaging.

Algorithm description

Three roles

There are three roles in the Paxos algorithm, each with three different behaviors, but many times a single process may play multiple roles simultaneously.

  • ProposerProposals: (Proposal).
  • AcceptorWho voted on the proposal, whether or notacceptThe program, only more than halfAcceptorIf a proposal is accepted, it will be accepted.
  • Learners of the proposal: when the proposal is chosen, it carries out the proposal.

A Proposer may have more than one Acceptor, but it is also possible to have more than one Proposer in a cluster. If a Proposer produces a number of proposals, a consensus algorithm guarantees the following:

  • No proposal is submitted and no proposal is selected.
  • Each proposer first gets a have when making a proposalGlobally unique and increasingProposal No.NThat is, the unique number of stones in the whole clusterNAnd then change the number to give it to the proposal it wants to make.
  • Every voter in theacceptAfter a proposal, the proposal is numberedNRecords are kept locally, so that what is saved in each vote has beenacceptThere will be a proposal with the maximum number, whose number is assumed to bemaxN, each voter will onlyacceptThe number is greater than the local numbermaxNThe proposal.
  • Of all the proposals, only one can be selected.
  • Once a proposal is selected, other servers actively synchronize (Learn) The proposal goes local.

Description of algorithm process

The Paxos algorithm is executed in two stages: Prepare and Accept.

Prepare stage

  • A Proposer prepares to submit a proposal numbered N and first sends a prepare(N) request to all acceptors to determine whether the cluster supports the proposal numbered N.

  • 2. Each Acceptor stores the maximum number maxN of the prepare(N) proposal it accepted. When a Acceptor receives a prepare(N) request from another host, it compares the value of N with that of maxN.

    • If N is less than maxN, it indicates that the proposal is outdated, and the current voters reject the prepare request by not responding or responding to Error.

    • If N is greater than maxN, it indicates that the Proposal is acceptable. The current voter will first record the N and feedback the Proposal(myID, maxN, value) with the largest number accepted to the proposer to show the proposer their willingness to support the Proposal. The first parameter myid indicates the Acceptor’s id, the second parameter indicates the maximum number of proposals accepted by the Acceptor, and the third parameter indicates the value of the proposal. Of course, if the current Acceptor has not accepted any proposals, The Proposal(myID, NULL, NULL) is returned to the proposer.

    • In the prepare phase, N cannot be equal to maxN because of the generation mechanism of N. To obtain the value of N, it must be increased by one based on the original value.

The accept stage

  • 1. When a Proposer sends a prepare(N) and receives a majority of acceptors’ responses, the Proposer sends its actual Proposal(value) to all acceptors.

  • 2. After receiving a Proposal(N, value) sent by an Acceptor, an Acceptor compares N with the maxN and prepare numbers. If N is greater than or equal to the two proposals, the Acceptor accepts the Proposal(N, value) with the maxN and prepare numbers. The current voter accepts the proposal and gives feedback to the proposer. If N is less than these two numbers, the voters reject the proposal by means of no response or ERROR response.

  • 3. If the proposer does not receive accept feedback from more than half of the voters, the proposer will enter the prepare stage again and increase the number of proposal N to make a prepare request again. If the number of feedback received is more than half, the proposer will broadcast two kinds of information.

    • An “executable data synchronization message” is sent to the voters who have accepted their proposals, asking them to implement the proposals they have accepted.

    • The “proposal + Executable data synchronization signal” is sent to the voters who have not sent them accept feedback, that is, they receive the proposal and execute it immediately.

Examples are as follows:

1. Prepare phase

2. Accept stage

The live lock problem of Paxos algorithm

In the actual engineering application process, the Paxos algorithm mentioned above has a lot of inconvenience according to different actual needs, so there are a lot of algorithms to optimize the basic Paxos algorithm to improve the Paxos algorithm, such as Multi Paxos, Fast Paxos, EPaxos, etc.

Paxos algorithm has the problem of “live lock”, Fast Paxos algorithm improved Paxos algorithm: it only allows a process to process the write request, to solve the problem of live lock, ZAB protocol is an industrial implementation of Fast Paxos algorithm

Why does Paxos have a “live lock” problem? Paxos algorithm in each process can submit a proposal, but must be the only number N, access to a global gives the value of N to the proposal, in order to guarantee the uniqueness of N, the N value must be put into operation synchronization (exclusive) lock, lock the value of N is “resource” competition, if a process to submit a proposal, kept in N application resources, But each time it is not assigned, the process is in a “live lock” state.

Lock: Belongs to process between the execution of the state with the ready state in the face of the state, not blocked, live with a deadlock is living the underlying difference between lock locks as CPU use, because it can keep operation, is a regular task, FastPaxos algorithm only allow a process to submit a proposal, or N is not a process custom, that I was the only application of N, To me, not to any other process.

The Paxos algorithm has two applications

  • 1. Write request, that is, the Leader sends a write proposal to all the followers to vote on it. If more than half of the followers vote on it, the Leader accepts the write request.
  • 2. In the election of Leader, the election proposal is all self-recommendation at the beginning.

Common interview questions:

1. The work process in the Prepare and Accept stages is quite different. Why is the prepare process needed? , why didn’t the PaxOS algorithm directly go through the accept stage at the beginning of design?

  • 【1】 The Prepare phase is a “request for comment” phase in which all acceptors are asked if they agree to the proposal. The acceptors respond with an ACK to the Proposor with a globally unique number N
  • [2] Accept stage is the implementation stage of the proposal. Only when more than half of the ACK is received, all Learner proposals are synchronized. A proposal containing N is sent to all Learner in this stage
  • [3] These two stages have different functions and send different messages. If the two phases are combined into one phase, “opinion solicitation” cannot be carried out and ACK statistics cannot be collected

2. Why is the message sent by Prososer only a number N in the Preapare stage, while the message sent by Accept stage is a proposal? Can it be the other way around?

The number of the message sent in the prepare phase is N, and the amount of data is small. The proposal message sent in the Accept phase may contain a large amount of data. The Paxos algorithm sends messages with a small amount of data first because it may fail in the prepare phase. If the Paxos algorithm sends a large number of messages to solicit opinions but fails in the prepare phase, network resources such as bandwidth are wasted.

3. If the ZK client receives a lot of read requests, do I add followers or observers to the ZK cluster?

Observers should be added, because adding followers will prolong the voting time of write request proposals, that is, the execution efficiency of write request will be reduced because there are more voting places

4. The proposer sent itprepare(N)After that, all of themAcceptorHave put their own localmaxNWith the proposalNThe comparison was done, and the proposer received more than half of the responses, i.e.Proposal(myid, maxN, value)), why to sendProposal(N,value)We also need to letAcceptorCompare your own local onesmaxNwithNNow, is this redundant?

Some acceptors may have received a prepare(N) from other acceptors before the Proposal(N,value) was sent, or some acceptors may have synchronized a Proposal with a larger N to the local site. In this case, the maxN value must be larger. Prepare (N) is sent only after the first proposal has been executed. The second proposal is sent only after the first proposal is executed. It is possible that more than half of acceptors agree to a Proposal(N,value), but some do not agree to a Proposal(N,value).

5. Why was it never sentacceptFeedback from voters to send “proposal + executable data synchronization information”?

Since more than half of the voters have already sent accept feedback and agreed to the proposal before receiving some feedback from the voters in the Accept stage, there is no need to wait for other people to send accept feedback. Or some of the voters have not received the Proposal(N,value), more than half of the voters have already approved the Proposal. So, sending “proposal + Executable data synchronization messages” to voters who have not sent accept feedback is a way of preventing those voters from knowing what the proposal is.