Distributed algorithm – Paxos

Paxos is an algorithm that can reliably and conclusively achieve consensus consistency over a whole bunch of completely unreliable networks. In other words, it allows a set of processors (servers) that are not necessarily reliable to reach a certain security consensus if certain conditions are met, and to keep the set of processors (servers) consistent if the conditions are not. Paxos algorithm is divided into two parts: core algorithm and complete algorithm. The core part of Paxos algorithm solves a very important basic problem in distributed domain, that is, consensus problem. The complete algorithm is the algorithm used to implement the state machine.

Consensus problem

Loosely speaking, the consensus problem is that even if multiple processes agree on a value, each process can propose a value it wants, but only one value will be selected in the end, and all processes agree on the selected value.

Note the consensus algorithm

Consensus algorithms have a few caveats:

  • It can be any number of processes
  • All processes can propose a value
  • All processes can learn the selected value

Consensus algorithm requirements

Consensus algorithm has two requirements: safety and survival

Safety requirements refer to:

  • Only one of the proposed values may be selected
  • Only one value is selected
  • A process can learn a value only if it has actually been selected

The survival requirement is that a proposed value must eventually be selected, and that if a value is selected, a process can eventually learn that value.

Paxos consensus algorithm

The role of algorithms

The Paxos consensus algorithm has three roles, namely a proposer, a acceptor, and a learner. These are logical roles that perform different functions of the algorithm, and a process can hold multiple roles:

  • The proposer role is responsible for proposing a value
  • The acceptor role is responsible for selecting a value
  • The Learner role is responsible for learning the selected value

The process of selecting and learning values of an algorithm

Consensus algorithm is divided into two processes, namely, selecting a value and learning a value, as follows:

  • In selecting a value, each proposer from a set of proposers may propose a single value to a set of acceptors, which then selects a unique value from among the proposals.
  • In learning a value, a learner learns the selected value from an acceptor

Select value specific process

The value selection process is a two-stage submission process in which the first stage selects the proposal number and the second stage accepts the proposed value.

  • 1.1: A proposer generates a new proposal number n that is greater than the proposal number last used. And send a prepare(n) message to most acceptors
  • 1.2: If an acceptor receives a proposal message and n is greater than the locally committed proposal number, it updates the locally committed proposal number with n. If any other proposal has been accepted, the proposal number and value are returned. If no proposal number and NULL are returned.
  • 2.1: If a proposer receives a promise numbered N from a majority of acceptors, it constructs a new proposal numbered N with a proposal value v. V is determined according to the following rules:
    • If there are proposals in all of these promise messages, the value of the proposal with the largest offer number is used
    • If no proposal is carried in any of these promise messages, then a proposer can use the values it proposes. This proposer sends an acknowledgement message to each of these acceptors
  • 2.2: If an acceptor receives an acceptance message with n greater than the locally committed number, it accepts the proposed value and stores it persistently.

Illustrated process:

An acceptor3, an acceptor4, or an acceptor5 accept an offer from (2, X). The five acceptors accepted different proposals (1, X) and (2, X). But the values are the same, that is, there is a consensus on the values, and the proposal numbers are not required to be the same.

Learning value specific process

The learning value is similar to the selection value, which is also a second-order submission process:

3.1 When an acceptor accepts a proposal, notify all users of the proposal

3.2 If Learner receives an acceptor notification, he accepts the offer. If a learner accepts a proposal received from most acceptors, the learner accepts the value in the proposal.

Illustrated process:

As shown in the figure, the number of Learner messages is the product of the number of acceptors and the number of Learner messages. To reduce the number of learner messages, designate a Learner leader. Sends the Learn message to the learner’s leader. After receiving most of the messages, the leader accepts the value in the request and sends the Learn message to other learner.

Summary of consensus algorithm

In the process of integrating the previous selection value and learning value, the consensus algorithm has the following steps:

  • Proposer initiates a proposal number to a majority of acceptors.
  • Acceptors accept a proposal number based on a rule (maximum number) and respond to a proposer
  • A proposer receives a majority of acceptors’ responses and sends an acknowledgement message to a value it selects based on a rule (if there is a promised value).
  • The acceptor accepts the value in the acknowledgement message and sends the Learn request to the Learner leader
  • After receiving most of the learning requests, the Learn leader sends the learning requests to other learners and synchronizes the value.

At this point, the values proposed by the proposer are synchronized to all learns. The consensus algorithm ends. To sum up, paXOS consensus algorithm can be simply understood as a three-stage submission process

  • Stage 1: Confirm that most nodes accept the proposal number
  • Phase 2: Submit the proposal and synchronize the value to the majority of acceptors
  • Phase 3: Synchronize values to all Learn nodes

Paxos complete algorithm

The Paxos consensus algorithm can only form a resolution for one value, and the formation of the resolution requires at least two network round-trips. In the case of high concurrency, more network round-trips may be required, and even live locks may be formed in extreme cases. If you want to determine multiple values consecutively, the Paxos consensus algorithm fails. Therefore, Paxos consensus algorithm is almost only used for theoretical research, not directly applied in practical engineering.

Almost all practical applications require continuous determination of multiple values, and it is expected to be more efficient. Multi-paxos was proposed to solve this problem. Multi-paxos makes two improvements based on Paxos consensus algorithm:

  1. For each value to be determined, run an Instance of the Paxos algorithm to form a resolution. Each Paxos Instance is identified by a unique Instance ID.
  2. A Leader is elected from among the Proposers, and the Leader uniquely submits a Proposal to the Acceptors for a vote. With no Proposer competition, it resolves the live lock problem. If only one Leader submits Value in the system, the Prepare phase can be skipped. In this way, the two phases are changed into one phase to improve efficiency.

Paxos complete algorithm — Multi Paxos is an algorithm that runs Paxos consensus algorithm many times to form multiple instances. Similarly, each process forms an array in which each element has an agreed value.

Exception handling

Split brain problems

The Paxos consensus algorithm ensures that even if multiple processes consider themselves to be the leader, i.e. in the case of split brain, only one value can be selected for each instance — each leader proposes a value with a proposal ID, while learn accepts only one proposal ID at a time.

Leader 1 May not succeed every time, and it may fail. When process 1 becomes the leader and the proposal passes a value that has not been fully learned by process 2, process 3 also considers itself the leader and makes process 2 accept its proposal (with a larger proposal number). As a result, the content synchronized by process 1 to process 2 becomes invalid.

Hole processing

Instances can be executed concurrently, which can cause a problem: if there is a message loss or network delay during the validation of the value of the first instance, the first value is not determined. At the same time, the leader starts the second instance, and the value of the second instance is successfully determined. The value of that first instance is null and the value of the second instance is determined. This situation is called void.

If the Leader fathian does not get a reply to a message within a certain period of time, the leader can resend the message. Through a retransmission mechanism, the void is eventually filled.

If the leader switch occurs at this point, the situation will be different. If the old leader’s offer has been accepted, the new leader keeps the offer. The new leader resends the same message. If the old leader’s offer has not been accepted, the new leader can propose a new value. One way or another, the hole will eventually be filled.

Paxos algorithm application

Atomic broadcast

Atomic broadcast protocols are used to broadcast messages to broadcast objects and ensure that messages are reliably received and that all broadcast objects receive them in the same order.

Atomic broadcast protocols are usually defined to contain the following two actions:

  • ABroadcast(v) : Broadcast action, called when the client wants to broadcast the value v.
  • V =ADeliver() : Post action by which clients receive values submitted by other clients. This action is usually a callback that the client will call back when a value is to be accepted.

Atomic broadcast features – Atomic broadcast guarantees that if a process invokes a broadcast action, all client-side post actions will be called back in the same order as the broadcast was called.

Paxos algorithm and atomic broadcasting

In an implementation of an atomic broadcast based on the Paxos algorithm, an atomic broadcast black box consists of a set of processes with a proposer and acceptor roles inside, and a client receiving a broadcast callback as a learner role.

When a client calls the ABroadcast(v) action, the request is given to the proposer as a proposal, and the value is synchronized to all of the respondents after a majority of acceptors agree to the request. When learner receives this value, he completes the callback to the aDeliver() action interface.

conclusion

Paxos is further divided into two algorithms, Basic Paxos (consensus algorithm) and Multi Paxos (full algorithm). Consensus algorithm includes two specific processes, selection value and learning value, and the specific steps can be summarized into three stages:

  • Stage 1: Confirm that most nodes accept the proposal number
  • Phase 2: Submit the proposal and synchronize the value to the majority of acceptors
  • Phase 3: Synchronize values to all Learn nodes

The complete algorithm is to run Paxos consensus algorithm for many times to solve the application problem of consensus algorithm in engineering. The leader in the complete algorithm also reaches agreement through consensus algorithm.