This article is first posted on the official account: “Miscellaneous Notes of Wooden Birds”

primers

Paxos is an inescapable algorithm in distributed systems, but notoriously difficult to understand. So I see Paxos is also always around, but I always feel a little regret. So in the past week, I collected a lot of information and tried many ways to open the door. Then taking advantage of the fresh, the understanding of the brain to the paper, do a summary, in order to prepare for a rainy day.

Leslie Lamport, the inventor of the Paxos algorithm, is one of the founders of distributed systems. There are many anecdotes, as can be seen from the name of Paxos: Paxos is an ancient Greek city-state that Lamport invented in order to introduce the consensus problem of distributed systems. After The first related paper, The Part-time Parliament, was published in 1998, many people said that they could not understand. Therefore, In 2001, Lamport re-elaborated the main idea of Paxos Made Simple by using relatively concise language and logic, thus creating Paxos Made Simple.

The abstract of Lamport’s Paxos Made Simple paper is a single sentence:

The Paxos algorithm, when presented in plain English, is very simple

However, I don’t understand this simple.

Author: Greenwood Birdwww.qtmuniao.com/2021/06/14/…Please indicate the source

modeling

In my theory, if you can’t understand a simple thing, it must be the wrong way to open it (model it). Since the author says the theory is simple, if I can find a solution that fits my context, I will know what he is doing.

So I searched a lot of information, after watching the zhixing Society – PaxOS and distributed system video, the insight. This video used the client-server + lock model familiar to programmers to explain some concepts and constraints of Paxos step by step, which made me have a preliminary grasp of the problems and ideas to be solved by Paxos emotionally. Then I read the paper, which was very convenient to understand.

So how did Zhixing Society dismantle the consensus algorithm? According to my understanding of the paper, I will briefly sort out the content of the video. Some contents have been changed. If you haven’t seen the video, I highly recommend checking it out first.

Note: This paper does not involve Learner, nor does it make a detailed interpretation of the original paper, nor does it discuss the engineering of Paxos. I thought I’d write them in one article, but it turned out to be too long, so I broke them up into a series.

The basic concept

As a programmer trying to understand Paxos, two questions must arise: What problem is Paxos designed to solve? How is Paxos applied to distributed system in engineering?

The Paxos algorithm is used to determine the value of an immutable variable. Immutable, single value, not very useful for distributed systems. But we build a distributed system that behaves like a single machine through a bridge called pre-write logging (WAL), or operation logging.

On the one hand, the operation log can be viewed as a sequence of immutable operation records. For immutable single operation records, multi-machine consensus is the problem that Paxos solves. With a little extension, it is possible to allow multiple machines to agree on a sequence of determined operations. On the other hand, if we have a globally unique sequence of operations, each copy can execute the sequence of operations in the same order, building the same state machine, which is expressive and can solve a broad class of system problems.

abstract

Now, back to understanding the Paxos algorithm itself. The video firstly abstracts a practical engineering problem and gradually presents a better solution to dismantle the Paxos algorithm constraints.

Problem description: Design a system to store variables named var.

System Role:

  1. The system consists of multiple acceptors that store and manage var variables.
  2. A Proposer calls the system API concurrently and submits different var values to the system.

System API: Withdraw (var, V) →

or

where f is the value saved by var in Acceptor.

,>

System requirements:

  1. Once the value of var is determined, it cannot be changed, and it can be read all the way to that value
  2. Can tolerate any Proposer failure
  3. Failure of less than half of acceptors is tolerated

Just like the algorithm problem, you can simplify the problem and get the basic idea. Then the solution of the original problem is obtained by generalization step by step.

Plan a

Assume that the system consists of a single Acceptor.

To handle a number of concurrent calls to a Proposer, the simplest way to do this is to use a mutex on an Acceptor and withdraw the Proposer in two phases:

Prepare phase:

A Proposer obtains an Acceptor lock and the current var value f from an Acceptor::prepare. Abort if the lock is found to be occupied.

The Accept phase:

If f is null, then accept(var, V) the data V from that Proposer.

If f is not null, the lock is released by Acceptor::release().

Problems with this scheme: no Proposer machine failures are tolerated. If a Proposer calls an Acceptor::prepare and hangs, it holds the lock and makes an Acceptor unusable.

Scheme 2

Solve the Proposer problem.

If a Proposer fails with a number of proposers, a single one is inevitable. Now that we can’t decide whether a Proposer lives or dies, we have to work on locks. Such as introducing timeouts to locks, or making locks preemptable. For the former, the timeout threshold is not easy to control: too long will not perform well, too short will lead to frequent retries. The latter is better, invalidating the original lock only with a new Proposer request.

Let’s expand on the preemption lock design.

Preemption necessarily introduces priority issues: a highest-priority Proposer can preempt a lock with a lower-priority Proposer. Each Proposer that locks a number n with a large number has a high priority. Epoch is not referred to in the video. It’s called n in the paper, but it means the same thing. The number can be allocated by a global transmitter to ensure monotonic increase; You can also use time stamps directly, but there are synchronization issues with stamps across multiple machines.

Proposer2 preempts the Acceptor lock from Proposer1. If the Acceptor value var is set, can Proposer2 change it? I do not think this is a problem in a single Acceptor, as long as the var of an Acceptor is set and cannot be modified at any time.

In much the same way as in plan 1, Acceptor only needs to save one more state: the number of the Proposer currently granting a lock is latest-n. And in both phases, the numbered numbered Proposer is first compared to the numbered numbered latest-n that is currently saved to determine whether the Proposer is rejected or accepted. In addition, there is no need to call the interface to explicitly release the lock each time.

Problems in this solution: The system cannot provide services when a single Acceptor breaks down.

Plan 3

Introduce multiple acceptors based on scheme 2.

It is assumed that the number of Acceptor clusters is fixed. In this case, there are several problems when extending scheme 2:

  1. How does an Acceptor cluster determine a value? The var of half acceptors is set to the same value. Multiple acceptors are required for a single Acceptor.
  2. If the prepare phase detects that some acceptors have values, does the Proposer release the lock directly? Not necessarily, because there are multiple acceptors and more than half of them accept the same value. So if the number of values is less than half, the Proposer needs to proceed to the accept phase, but at that point it needs to decide whether to select a value from a random set of values taken during the phase, or from a set of values according to some rule. For fast convergence, we choose the value with the largest label.
  3. A Proposer obtains a lock on an Acceptor cluster during the prepare phase. Obtain more than half of Acceptor locks. A numbered n contains at most one Proposer that obtains a lock according to the pigeon nest principle.

With the above problems solved, the final solution is ready:

The Proposer:

Prepare phase: The system obtains the prepare(n) label and sends a prepare(n) request to the Acceptor cluster. If the Acceptor cluster receives no response, the system terminates. When receiving a half-OK reply, if there is a non-null value in the reply, select the value v with the largest label. If no value exists in the reply, any value v can be selected to initiate an Accept request.

Accept phase: An Accept (n, v) request is sent to the Acceptor cluster using the value v selected in the previous phase. If more than half of the OK packets are received, the cluster successfully accepts V. If not, it indicates that a Proposer with a higher numbered numbered preempted the Proposer or that some acceptors failed.

Note: The two phases do not require one request to all acceptors in the cluster, just select a half set. And the set of acceptors selected for phase one and phase two need not be the same.

The Acceptor:

The status to be maintained is as follows: the current Accept value, its accepted number

, and the maximum number of the current authorized lock: latest-n.
,>

Prepare phase: Receives a prepare(n) request for the Proposer. If latest-n > n, an Error is returned. Otherwise, return

. Update latest-n to n.
,>

Accept phase: receives a Proposer accept(n, v) request with an Error if latest-n > n. Otherwise accept the request, update latest-n, accepted-n, accepted-v, and return OK.

summary

After watching the video of Zhixing Society and combing the above information, I will have a better understanding of Paxos Made Simple paper. And then I thought, why is it so hard to read the paper? On the one hand, the paper does not have much preparation, we start to reason, and we do not have a suitable model in mind to understand the various concepts mentioned in the paper; A paper, on the other hand, is a process of reverse organization, that is, gradually deducing the conditions that need to be met from the conclusion, and finally combining all the conditions. All these make it difficult to pick up the paper directly.

Finally, summarize the key points of understanding Paxos again:

  1. Understanding the purpose of the original Paxos is for multiple acceptors to agree on a single immutable value.
  2. Use the Client-Server + lock model in engineering to help understand.
  3. Splitting an algorithm into two phases can fail quickly.
  4. The n label was introduced to solve deadlocks and preemption order problems.
  5. In the second stage, the value of the maximum label is selected to make the accept process converge rapidly.

The resources

  1. Paper: www.scs.stanford.edu/20sp-cs244b…
  2. Translation: www.cnblogs.com/yaodd/p/615…
  3. Macmillan society – paxos and distributed system: www.bilibili.com/video/BV1Lt…

I am Green tengmu bird, a programmer like photography, welcome to pay attention to my public number: “Mu Bird miscellaneous Record”.