preface

There is only one consistency algorithm in the world, Paxos. Said by a Google god. Paxos is also notoriously difficult to understand, and the reasoning process is extremely complex.

Paxos is a bit like 2PC and 3PC, but it solves the problems of both algorithms. This algorithm has been practiced in many big factories, such as The distributed database of OceanBase of Ali, and the Paxos algorithm is used at the bottom. For example, Google’s Chubby distributed lock also uses this algorithm. This algorithm can be seen in distributed system status, even, Paxos is synonymous with distributed consistency.

The body of the

1. What is the Paxos algorithm

Paxos algorithm is a consistency algorithm based on message passing and has high fault tolerance. It is recognized as one of the most effective algorithms to solve distributed consistency problems.

2. Paxos algorithm generates background

2.1. Question of the Byzantine generals

Byzantium was the capital of the ancient Eastern Roman Empire, and because it was so large, multiple generals (nodes in the system) guarding the borders needed to send messages through messengers and reach certain agreed decisions. But since there may be traitors in the messenger (nodes in the system fail), these traitors will try to send different messages to different generals in an attempt to interfere with consistency.

2.2. Origin of Paxos algorithm

The background of the story is that a number of judges on the ancient Greek island of Paxos voted on a motion in a hall, and how to reach a unified result. Notes were passed between them by attendants, but the judge might leave or enter the hall, and the attendants might slack off and go to sleep.

2.3 Background

In a common distributed system, node downtime or network anomalies (including message repetition, loss, delay, disorder, and network partition) always occur.

Paxos algorithm is mainly to solve how to quickly and correctly reach a consensus on a certain value in the cluster and ensure the consistency of the whole system in a distributed system where the above faults occur.

3. Algorithm details

3.1 Roles & Proposals

The Proposal (Proposal)

Note: The scope of the proposal >value. As discussed later, [Proposal = number + value]. [M,V]. Tentative in the following description: Proposal =P, Value=V.

role

Proposer: a Proposer can prepare a Proposal.
Accecptor: Acceptors accept proposals. Once the proposal is accepted, the value in the proposal is selected.
Learner: The Acceptor tells Learner which proposal is selected, and then Learner learns the selected value.

In a concrete implementation, a process may be a Proposer, an Acceptor, or a Learner.

3.2. Problem description

The core of the Paxos algorithm is consistency. So I will explain how the algorithm solves practical problems from the description of consistency problems.

3.2.1. Preconditions of the consistency algorithm

In the proposedPThere is only oneVIs selected.
If there is noPIs put forward, there is noVIs selected.
inPOnce selected, any process can learn from the selectedP.

3.2.2. Different roles communicate by sending messages

Each role executes at any speed and can be stopped by errors or restarted. After a value is selected, all roles may fail and restart, and unless those roles that failed to restart can record some information, they cannot determine the selected value when they restart.
Messages can be delayed for any length of time during delivery, can be repeated, and can be lost, but messages are not corrupted.

3.3. Derivation process

3.3.1. Only one Acceptor exists

If an Acceptor accepts a P, only one V is selected.

Problem: If this Acceptor fails, the entire system service is unavailable.

3.3.2 rainfall distribution on 10-12. Multiple Acceptor

Q: How do I select a value with a number of proposers and acceptors?

The instructions are divided into two stages: contract P1 and contract P2.

3.3.2.1. Agreed P1

P1: An Acceptor must accept the first P it receives.

If each Proposer produces a different P, then a number of proposers must produce multiple PS sent to multiple acceptors. According to the contract P1, each Acceptor accepts a P, which results in a different V being selected, as shown in the following figure:

As shown above, P1 causes problems: v1, V2, and V3 are not selected because they are only accepted by one Acceptor.

For the above, we need an additional convention:

P1a: A proposal P is selected and must be accepted by more than half of acceptors.

For P1a, this means that an Acceptor must accept more than one proposal.

Obviously, this contradicts P1, so the proposal needs to be redesigned. The original design was: [proposal P = value], now redesign [proposal P = proposal number + value], can be expressed as [M, V].

New question: If multiple proposals are selected, how can we ensure that the selected proposal P has the same value?

The P2 3.3.2.2. Agreed

P2: If proposal P[M0,V0] is selected, then all selected P with a higher number than M0 will also have the value of V0.

For “selected” in P2: to be selected, a proposal must first be approved by at least one Acceptor. Therefore, P2 can be understood as:

P2a: If the proposal P[M0,V0] is selected, then the value of all P numbers higher than M0 that are approved by acceptors will also be V0.

As long as it satisfies P2a, it satisfies P2. The problem of multiple proposals being selected has been resolved, but new problems may arise due to network instability or downtime (which is inevitable) :

Suppose there are five acceptors. If a Proposer2 proposal [M1,V1] is accepted by acceptors 2-5 (more than half), then V1 is considered to have been selected for both acceptors 2-5 and Proposer2. Proposer1 sends an Acceptor1 proposal for [M2,V2] (V2≠V1 and M2>M1) just after an Acceptor1 has recovered from an outage. This is the first proposal received by Acceptor1. According to P1 (an Acceptor must accept the first proposal it receives), an Acceptor must accept the proposal. Acceptor1 also considers V2 to be selected.

This raises two questions:

Acceptor1 considers V2 selected, and acceptor2-5 and Proposer2 consider V1 selected. There is an inconsistency.
V1 is selected, but the higher number of the proposal [M2,V2] accepted by Acceptor1 has value V2, and V2≠V1. This contradicts P2a (if a proposal with value v is selected, then any proposal with a higher number that is accepted by acceptors must also have value V).

Based on the above questions, there is P2b:

P2b: If P[M0,V0] is selected, any Proposer produces P with a value of V0.

For any Proposer described in P2b, how do you guarantee that the value of P produced by any Proposer is also V0? As long as it satisfies P2c:

P2c: for any M,V, if [M,V] is proposed, there exists a group S consisting of more than half of acceptors that meets either of the following two conditions: ① None of S has accepted a proposal numbered less than M. (2) The value of the proposal with the maximum number accepted by an Acceptor in S is V.

Derivation done…

3.4. Algorithm flow

3.4.1. Proposer makes a proposal

The general idea is as follows:

A. Prepare B. Prepare C. Prepare D. Prepare

A Proposer selects a new proposal. Send a request to half or more of the Acceptor set S, asking each of the acceptors in S to respond as follows:

If an Acceptor has not accepted a proposal, it promises to a Proposer no more proposals numbered less than N.
If an Acceptor has accepted the request, it returns to the Proposer the largest numbered proposal with a number less than N that it has accepted.

(2). Acceptance stage: Acceptor request

If the Proposer receives more than half of the Acceptor responses, it generates a proposal numbered N with value V, where V is the value of the proposal with the largest number among the responses.
If no Proposer receives a response with a value generated by the Proposer, it sends that proposal to S with the expectation that the Acceptor will accept it.

3.4.2. Acceptor accepts the proposal

Acceptors can ignore any requests, including Prepare and Accept, without compromising the security of the algorithm. Therefore, we will discuss when acceptors can respond to a request.

The following constraints apply to Acceptor acceptances:

P1b: An Acceptor may accept a Prepare request numbered N as long as it has not responded to any Prepare request numbered greater than N.

If an Acceptor receives a Prepare request with number N, it has previously responded to a Prepare request with number greater than N. According to P1b, this Acceptor cannot accept a proposal numbered N. Therefore, the Acceptor can ignore the Prepare request with number N. Of course, it can also respond with an error to let a Proposer know as early as possible that its proposal will not be accepted.

Therefore, an Acceptor simply needs to remember:

The proposal with the largest number accepted;
The maximum number of requests that have been responded to.

4. Paxos algorithm description

5. Are you learning

There are three ways to learn the selected value:

6. How to ensure the activity of Paxos algorithm

summary

Paxos can ensure data consistency when nodes are recovered after downtime, messages are disordered or lost, and networks are fragmented. However, the description of Paxos focuses on theory. In the application of practical projects, it may become another algorithm after processing more than N practical details. At this time, the correctness can no longer be guaranteed by theory.

Proving the correctness of distributed consistency algorithms is often more difficult than implementing them. Therefore, many systems in practice are based on the Paxos theory derived from the variant and simplified version. Examples include Systems like Google’s Chubby, MegaStore, And Spanner, ZooKeeper’s ZAB protocol, and Raft protocol, which is much easier to understand.

Most systems are run in practice for a long time, after verification found that the system can basically run, no major problems can be found on the production environment.

A link to the

Distributed theory (I) – CAP theorem
Distributed theory (II) – BASE theory
Distributed theory (III) – 2PC protocol
Distributed theory (IV) – 3PC protocol
Distributed theory (v) – Consistency algorithm Paxos
Distributed theory (VI) – Consistency protocol Raft

Welcome to the public number: Zero one Technology Stack

This account will continue to share learning materials and articles on back-end technologies, including virtual machine basics, multithreaded programming, high-performance frameworks, asynchronous, caching and messaging middleware, distributed and microservices, architecture learning and progression.

Distributed theory (v) – Consistency algorithm Paxos