In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computing system to provide all three of the following guarantees at the same time:

  1. Consistency: every node sees the same copy of the latest data.
  2. Availability: every request receives a non-error response, though without a guarantee that the data returned is up to date.
  3. Partition tolerance: the system keeps operating even when messages between nodes are delayed or lost. In practical effect this is a time limit on communication: if the system cannot reach consistency within the limit, a partition is considered to have occurred, and it must choose between C and A for the current operation.

Get on the bus

Before talking about consistency, note that there are two models for node communication in distributed systems: shared memory and message passing.

In distributed systems built on the message-passing model, the following faults are unavoidable: processes may run slowly, be killed, or restart, and messages may be delayed, lost, or duplicated. (Basic Paxos ignores the possibility of message tampering, i.e. Byzantine faults.) The Paxos algorithm solves the problem of reaching agreement on a value in a distributed system where these faults may occur, guaranteeing that no matter what happens, the agreed-upon resolution is never broken. A typical scenario is a distributed database system: if the nodes start in the same initial state and each performs the same sequence of operations, they end in the same state. To ensure that every node executes the same command sequence, a consensus algorithm must be run for each instruction so that every node sees the same instruction. A general consensus algorithm applies to many scenarios and is an important problem in distributed computing, which is why research on consensus algorithms has continued since the 1980s.

Start (Paxos algorithm)

The Paxos algorithm determines a decision in two stages:

  • Phase 1: determine who holds the highest number; only the proposer with the highest number has the right to submit a Proposal (a Proposal carries a specific value).
  • Phase 2: the proposer with the highest number submits its Proposal. If no other node has offered a Proposal with a higher number, the Proposal is approved; otherwise the whole process starts over.
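To make the two phases concrete, here is a minimal in-memory sketch of a proposer driving Phase 1 and Phase 2 against a set of acceptors. It is a simplification with no RPC and no failures, and the names (`acceptor`, `propose`) are illustrative, not the article's API:

```go
package main

import "fmt"

type acceptor struct {
	promised  int    // highest prepare number promised so far
	acceptedN int    // number of the proposal actually accepted
	acceptedV string // value of the accepted proposal ("" = none yet)
}

// prepare is Phase 1: promise n if it beats every number seen so far,
// and report any value already accepted.
func (a *acceptor) prepare(n int) (ok bool, accN int, accV string) {
	if n > a.promised {
		a.promised = n
		return true, a.acceptedN, a.acceptedV
	}
	return false, 0, ""
}

// accept is Phase 2: accept (n, v) unless a higher prepare arrived meanwhile.
func (a *acceptor) accept(n int, v string) bool {
	if n >= a.promised {
		a.promised, a.acceptedN, a.acceptedV = n, n, v
		return true
	}
	return false
}

// propose drives Phase 1 then Phase 2 and returns the chosen value.
func propose(accs []*acceptor, n int, v string) (string, bool) {
	promises, maxN, maxV := 0, -1, ""
	for _, a := range accs {
		if ok, an, av := a.prepare(n); ok {
			promises++
			if av != "" && an > maxN { // remember the highest-numbered accepted value
				maxN, maxV = an, av
			}
		}
	}
	if promises <= len(accs)/2 {
		return "", false // no majority of promises: retry with a higher n
	}
	if maxV != "" {
		v = maxV // a value was already accepted, so it must be the one proposed
	}
	accepts := 0
	for _, a := range accs {
		if a.accept(n, v) {
			accepts++
		}
	}
	return v, accepts > len(accs)/2
}

func main() {
	accs := []*acceptor{{}, {}, {}}
	v, ok := propose(accs, 1, "cmp")
	fmt.Println(v, ok) // cmp true
	// A later proposer with a higher number must preserve the chosen value.
	v2, ok2 := propose(accs, 2, "jmp")
	fmt.Println(v2, ok2) // cmp true
}
```

The second call shows the property the "start again" rule protects: once a value is chosen, any later proposer adopts it instead of its own.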

That is the conclusion; I won't walk through the full derivation here. One thing to be aware of, though, is that livelock can occur in the first phase: you go higher, I go higher, and the algorithm never terminates. A "Leader" can be used to solve this; the Leader is not deliberately elected but emerges naturally. Again, no further discussion here: this article is mainly about the code!

Phase1

func (px *Paxos) Prepare(args *PrepareArgs, reply *PrepareReply) error {
	px.mu.Lock()
	defer px.mu.Unlock()
	round, exist := px.rounds[args.Seq]
	if !exist {
		// new seq of commit, so create a new instance
		px.rounds[args.Seq] = px.newInstance()
		round, _ = px.rounds[args.Seq]
		reply.Err = OK
	} else {
		if args.PNum > round.proposeNumber {
			reply.Err = OK
		} else {
			reply.Err = Reject
		}
	}
	if reply.Err == OK {
		reply.AcceptPnum = round.acceptorNumber
		reply.AcceptValue = round.acceptValue
		px.rounds[args.Seq].proposeNumber = args.PNum
	}
	return nil
}

In the Prepare phase, each machine is asked via an RPC call whether it can accept the current proposal. The condition is that the current proposal's number is greater than the highest Prepare number the machine has already seen: args.PNum > round.proposeNumber. If no machine has yet submitted a Prepare for this sequence, the proposal is approved, since it is the first Prepare for that instance. Code snippet:

round, exist := px.rounds[args.Seq]
if !exist {
	// new seq of commit, so create a new instance
	px.rounds[args.Seq] = px.newInstance()
	round, _ = px.rounds[args.Seq]
	reply.Err = OK
}

Once the check passes and the proposal is approved, the previously accepted proposal number and value are returned to the proposer. Code snippet:

if reply.Err == OK {
    reply.AcceptPnum = round.acceptorNumber
    reply.AcceptValue = round.acceptValue
    px.rounds[args.Seq].proposeNumber = args.PNum
}

Phase2

func (px *Paxos) Accept(args *AcceptArgs, reply *AcceptReply) error {
	px.mu.Lock()
	defer px.mu.Unlock()
	round, exist := px.rounds[args.Seq]
	if !exist {
		px.rounds[args.Seq] = px.newInstance()
		reply.Err = OK
	} else {
		if args.PNum >= round.proposeNumber {
			reply.Err = OK
		} else {
			reply.Err = Reject
		}
	}
	if reply.Err == OK {
		px.rounds[args.Seq].acceptorNumber = args.PNum
		px.rounds[args.Seq].proposeNumber = args.PNum
		px.rounds[args.Seq].acceptValue = args.Value
	}
	return nil
}

The Accept phase is much like the Prepare phase. First check whether the current instance exists; if not, create it and return OK.

round, exist := px.rounds[args.Seq]
if !exist {
	px.rounds[args.Seq] = px.newInstance()
	reply.Err = OK
}

Otherwise, check whether the incoming proposal number is greater than or equal to the current proposal number. If so, return OK; if not, reject.

if args.PNum >= round.proposeNumber {
	reply.Err = OK
} else {
	reply.Err = Reject
}

If the proposal is accepted, it is also important to record the proposal number and the proposal value in the round.

if reply.Err == OK {
    px.rounds[args.Seq].acceptorNumber = args.PNum
    px.rounds[args.Seq].proposeNumber = args.PNum
    px.rounds[args.Seq].acceptValue = args.Value
}

Throughout the process, a map and an array hold auxiliary information. The map caches the decided result of each round of voting: the key is the sequence number of the round, and the value is a Round holding the accepted value. The completes array stores the smallest sequence number each peer has marked as completed.

rounds    map[int]*Round // cache each round's Paxos result; key is seq, value is the Round
completes []int          // maintain each peer's min seq of completed instances

func (px *Paxos) Decide(args *DecideArgs, reply *DecideReply) error {
	px.mu.Lock()
	defer px.mu.Unlock()
	_, exist := px.rounds[args.Seq]
	if !exist {
		px.rounds[args.Seq] = px.newInstance()
	}
	px.rounds[args.Seq].acceptorNumber = args.PNum
	px.rounds[args.Seq].acceptValue = args.Value
	px.rounds[args.Seq].proposeNumber = args.PNum
	px.rounds[args.Seq].state = Decided
	px.completes[args.Me] = args.Done
	return nil
}
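The Round struct itself never appears in the article. A minimal definition consistent with the fields the handlers touch (proposeNumber, acceptorNumber, acceptValue, state) might look like the following; the Fate type and the -1 initial numbers are assumptions on my part:

```go
package main

import "fmt"

// Fate tracks whether a Paxos instance has been decided.
// This type is an assumption; the article only shows the Decided constant.
type Fate int

const (
	Pending Fate = iota
	Decided
)

// Round holds the acceptor-side state for one Paxos instance (one seq).
type Round struct {
	proposeNumber  int         // highest prepare number seen for this seq
	acceptorNumber int         // number of the accepted proposal
	acceptValue    interface{} // value of the accepted proposal
	state          Fate        // Pending until Decide marks it Decided
}

// newInstance mirrors px.newInstance() in the handlers:
// a fresh round that has promised and accepted nothing.
func newInstance() *Round {
	return &Round{proposeNumber: -1, acceptorNumber: -1, state: Pending}
}

func main() {
	r := newInstance()
	fmt.Println(r.state == Pending, r.acceptValue == nil) // true true
}
```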

The Decide method is what the proposer uses to fix a value, which then maps onto the state machine in the distributed application.

The client submits instructions to a server, and the Paxos algorithm replicates them across multiple machines. All servers agree on the same instruction at each position, each state machine executes the instructions in sequence, and in the end every machine produces the same result.
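The replicated-state-machine idea above can be sketched in a few lines: once Paxos has decided the instruction at every log index, each replica applies the same sequence and necessarily ends in the same state. The "state machine" here is just string concatenation, and the names are illustrative:

```go
package main

import "fmt"

// apply runs a decided instruction log through a trivial deterministic
// state machine: the state is simply the instructions applied so far.
func apply(log []string) string {
	state := ""
	for _, op := range log {
		state += op + ";" // deterministic application of one instruction
	}
	return state
}

func main() {
	decided := []string{"put x=1", "put y=2", "jmp"}
	// Two replicas that apply the same decided log agree exactly.
	s1, s2 := apply(decided), apply(decided)
	fmt.Println(s1 == s2) // true
	fmt.Println(s1)       // put x=1;put y=2;jmp;
}
```

Determinism is the key assumption: if applying an instruction depended on local clocks or randomness, identical logs would no longer guarantee identical states.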

Arrive station

In a distributed environment, network faults and outages are normal. If a machine goes down and recovers after a while, how does it catch up on the instructions decided during the downtime? Suppose the recovering server wants to submit a JMP instruction: indexes 1 and 2 are already decided, so it can start directly from index 3. When it proposes JMP at index 3, it receives replies from S1 and S3 carrying an already-accepted value (CMP). By the Paxos rule that a later proposal must adopt an earlier accepted value, it submits CMP in its Phase 2 Accept request, and index 3 is decided as CMP. If no accepted value had been returned in Phase 1, its own value would have been chosen; either way, agreement is reached.

The lab comes from MIT and was adapted for my own learning; see the source code comments address for the annotated code.