In order to solve the consistency problem of distributed system, a large number of classical consistency protocols and algorithms have emerged in the process of long-term exploration and research, among which the most famous are two-phase commit protocol, three-phase commit protocol and Paxos algorithm.

2 PC with 3 PCS

In distributed system, although each machine node can clearly know whether it succeeds or fails in the process of transaction operation, it cannot directly obtain the operation result of other distributed nodes. Therefore, when a transaction needs to span multiple distributed nodes, in order to maintain the ACID nature of the transaction, a component called the “coordinator” needs to be introduced to uniformly schedule the execution logic of all distributed nodes, which are called “actors”. The coordinator is responsible for scheduling the behavior of the participants and ultimately deciding whether the participants will actually commit the transaction. Based on this idea, two phase commit and three phase commit protocols are derived.

2PC

2PC, short for two-Phase Commit, is an algorithm designed for computer networks, especially in the field of database, to ensure that all nodes based on distributed system architecture can maintain atomicity and consistency in the process of transaction processing. Generally, two-phase commit protocol is also considered as a strong consistency protocol to ensure data consistency in distributed systems.

2PC Execution process

The two-phase submission protocol splits the transaction submission process into two phases for processing, and its execution process is as follows:

Phase one: Submit the transaction request

1. Transaction inquiry

The coordinator sends transaction content to all participants, asks if a transaction commit can be performed, and waits for responses from each participant.

2. Execute transactions

Each participant node performs a transaction and writes Undo and Redo information to the transaction log.

3. Each participant feedback the response to the transaction query to the coordinator

If the participant successfully executes the transaction, the coordinator is given a Yes response indicating that the transaction can be executed. If the participant does not successfully execute the transaction, a No response is returned to the coordinator indicating that the transaction cannot execute.

Phase two: Perform the transaction commit

The coordinator will decide whether to commit the transaction based on the feedback of each participant. Normally, there are two possibilities:

Perform transaction commit: If the coordinator gets a Yes response from all participants, the transaction commit is performed.

1. Send the submit request

The coordinator issues Commit requests to all the participant nodes.

2. Transaction commit

After receiving the Commit request, the participant formally performs the transaction Commit and releases the transaction resources occupied during the entire transaction execution.

3. Feedback transaction submission results

The participant sends an Ack message to the coordinator after completing the transaction commit.

4. Complete the transaction

After receiving Ack messages from all participants, the coordinator completes the transaction.

Interrupt transaction: If any participant gives the coordinator a No response, or if the coordinator is unable to receive feedback from all participants after a timeout, the transaction is interrupted.

1. Send a rollback request

The coordinator issues a Rollback request to all the participant nodes.

2. Transaction rollback

After receiving the Rollback request, the participant uses the Undo information recorded in phase 1 to perform the transaction Rollback and, upon completion of the Rollback, releases the resources occupied during the entire transaction execution.

3. Feedback transaction rollback results

The participant sends an Ack message to the coordinator after completing the transaction rollback.

4. Interrupt transactions

The coordinator completes the transaction interrupt after receiving an Ack message from all participants.

To put it simply, two-phase commit divides the process of a transaction into two stages: voting and execution. The core of two-phase commit is the trial-and-commit method for each transaction. Therefore, two-phase commit can also be regarded as a strong consistency algorithm.

2PC problems

1. Synchronous blocking

During execution, all logic participating in the transaction is processed in a blocked state. That is, nodes can do nothing while waiting for messages from each other. In particular, when a node that has already occupied a resource is blocked waiting for a response message from another node, when a third node attempts to access the resource occupied by the node, the node will also be blocked.

2. Single point problems

Once the coordinator fails, the entire phase 2 commit process will not work, and even worse, if the coordinator fails during phase 2, the other participants will remain locked in transaction resources and will not be able to complete the transaction.

3. Inconsistent data

When a local network exception occurs after the coordinator sends a Commit request to all participants, or the coordinator crashes before completing the Commit request, only some participants receive the Commit request, and the data inconsistency occurs throughout the distributed system.

3. Too conservative

The failure of any node will lead to the failure of the whole transaction.

3PC

3PC, which stands for three-Phase Commit, is an improved version of 2PC. A transaction protocol consisting of CanCommit, PreCommit and doCommit phases.

3PC Execution process

Phase one: CanCommit

1. Transaction inquiry.

The coordinator sends a canCommit request containing the transaction content to all participants, asks if the transaction commit operation can be performed, and waits for the response from each participant.

2. Each participant feedback the response to the transaction query to the coordinator.

When a participant receives a canCommit request from the coordinator, it normally responds with a Yes response and goes into a preparatory state if it thinks it can execute the transaction successfully, and a No response otherwise.

Phase 2: PreCommit

The coordinator will decide whether to commit the transaction based on the feedback of each participant. Normally, there are two possibilities:

Perform transaction pre-commit; If the coordinator receives a Yes response from all participants, the transaction is pre-executed.

1. Send the pre-commit request

The coordinator issues preCommit requests to all participant nodes and enters the Prepared phase.

2. Transaction precommit

Upon receiving a preCommit request, the participant performs a transaction and records Undo and Redo information to the transaction log.

3. Each participant feeds back the response of transaction execution to the coordinator

If the participant successfully performs the transaction, the coordinator receives an Ack response while waiting for the final instruction: commit or abort.

Interrupts a transaction: If any participant gives the coordinator a No response, or if the coordinator does not receive the owner’s response after waiting for a timeout, the transaction is interrupted.

1. Send interrupt request

The coordinator makes abort requests to all participant nodes.

2. Interrupt transactions

The participant interrupts the transaction whether it receives an ABORT request from the coordinator or if a timeout occurs while waiting for the coordinator’s request.

Phase 3: doCommit

There are two possible scenarios where the actual transaction commit takes place at this stage.

commit

1. Send the submit request

Entering this phase, if the coordinator is in a healthy state and it receives Ack responses from all participants, it transitions from the “pre-commit” state to the “commit” state and sends a doCommit request to all participants.

2. Transaction commit

Upon receiving the doCommit request, the participant formally performs the transaction commit and, upon completion of the commit, releases the transaction resources occupied during the entire execution of the transaction.

3. Feedback transaction submission results

The participant sends an Ack message to the coordinator after completing the transaction commit.

4. Complete the transaction

After receiving Ack messages from all participants, the coordinator completes the transaction.

Interrupt the transaction

To enter this phase, it is assumed that the coordinator is in a normal working state and that any participant has returned a No response to the coordinator, or that the coordinator has not yet received feedback from all participants after a timeout.

1. Send interrupt request

The coordinator sends abort requests to all participant nodes.

2. Transaction rollback

Upon receiving the ABORT request, the participant uses the Undo information recorded in phase 2 to perform the transaction rollback and, upon completion of the rollback, releases the resources occupied during the entire execution of the transaction.

3. Feedback transaction rollback results

The participant sends an Ack message to the coordinator after the transaction is rolled.

4. Interrupt transactions

The coordinator interrupts the transaction after receiving an Ack message from all participants.

During the doCommit phase, if participants do not receive doCommit or ABORT requests from the coordinator in a timely manner, they continue to commit the transaction after waiting for a timeout. In phase 3, even though the participant does not receive a doCommit or ABORT request from the coordinator due to network timeout, the transaction is still committed.

3PC problems

There are two major differences between the three-phase commit protocol and the two-phase commit protocol:

  • Added an inquiry phase, which ensures early detection of actions that cannot be performed and need to be aborted, but it does not detect all such actions and only reduces the occurrence of such actions.

  • After the preparation stage, timeout is added to the tasks performed by both the coordinator and the participant. Once timeout occurs, both the coordinator and the participant continue to submit the transaction, which is regarded as success by default. This is also the correctness of default success after timeout according to probability and statistics.

Three phase commit protocol in removing obstruction also introduces a new problem, that is after the participants to receive preCommit news, if the network partition, the coordinator node and participants to normal network communication, in this case, the participants will still to be transaction submission, the inevitable inconsistencies of data.

Paxos algorithm

Paxos algorithm is a highly fault-tolerant consistency algorithm based on message passing. It is recognized as one of the most effective algorithms to solve distributed consistency problems.

Problem description

In ancient Greece, there was an island called Paxos, where laws were passed in the form of a council, and members of the council sent messages by messenger. It is worth noting that both MPS and couriers are part-time, they may leave the chamber at any time, and couriers may repeat their messages or never return. Therefore, the parliamentary agreement is to ensure that in this case the statute can still be made correctly, and there is no conflict.

The roles of councillors are first divided into Proposer, Acceptor, and Learner (which are allowed to hold more than one job). A Proposer presents a proposal with the number and value of the proposal. An Acceptor receives a proposal and accepts it. If a majority of Acceptors accept the proposal, the proposal is chosen. Learner can only “learn” the approved proposal. With roles divided, the problem can be defined more precisely:

  1. A value is approved only if it is submitted by a proposer (a resolution without approval is called a proposal;

  2. In an execution instance of the Paxos algorithm, only one value is chosen.

  3. Learners can only acquire values chosen.

The three semantics above can be evolved into the following constraints:

P1: An Acceptor must accept a proposal when it is first received.

P2: Once a proposal with value V is approved (chosen), subsequent proposals approved (chosen) must have value V.

P2a: Once a proposal with value V is chosen, any subsequent proposal accepted by acceptors must have value V.

P2b: Once a proposal with value V is chosen, then any subsequent proposals with a Proposer must have value V.

P2c: If a proposal numbered N has value V, then there is a majority, either none of them have accepted any proposal numbered less than n, or they have accepted all proposals numbered less than n with the proposal with the largest number having value V.

P1a: An Acceptor accepts a proposal numbered N if and only if it has not responded to a prepare request numbered greater than N.

Paxos algorithm content

The introduction and approval of resolutions

Stage 1 (Resolution submission)

1. The Proposer selects a proposal number M and sends a prepare request numbered M to more than half of the Acceptors.

2. If an Acceptor receives a PEPare request numbered M greater than the number of prepare requests it has responded to, it sends back to the Proposer the highest number it has approved as a response. The Acceptor also promises not to approve any proposal numbered less than M.

Stage 2 (Approval Stage)

1. If the Proposer receives a prepare request numbered M from more than half of the acceptors, it sends an accept request to the Acceptor for the proposal numbered M. Note that the value of V is the value of the highest-numbered proposal received in the response, or any value if the response contains no proposal.

2. If an Acceptor receives an Accept request for the proposal [M,V], it can Accept the proposal as long as the Acceptor has not yet responded to a prepare request with a number greater than M.

Acquisition of proposals

How to make Learner get the proposal, there are generally the following schemes:

  • Plan a

    Learner obtains a selected proposal only if the proposal has been approved by more than half of acceptors. Therefore, the easiest way is to send a proposal to all of the acceptors once the Acceptor approves it. This method allows Learner to receive the selected proposal as quickly as possible, but requires each Acceptor to communicate with all learners one by one, and the number of communication times is at least the product of both.

  • Scheme 2

    We can have all acceptors uniformly send their acceptances to a specific Learner (the “primary Learner”), assuming that the Learner can communicate with each other about the proposal selection through message communication. When the main Learner is notified that a proposal has been selected, it is responsible for notifying the other Learner.

    Scheme 2 requires an extra step to notify all Learners of the proposal, but the number of communication is greatly reduced, usually only the sum of acceptors and Learners. But at the same time, the scheme introduced a new instability factor; The main Learner can break down at any time.

  • Plan 3

    Extend the scope of the primary Learner, that is, acceptors can send approved proposals to a specific set of learners, each of which notifies all the other learners when a proposal is selected. The more there are in the Learner set, the better the reliability is, but at the same time the complexity of network communication is higher.

A principal Proposer is selected to ensure that the algorithm is active

Suppose there is an extreme case where two proposers present a sequence of numbered proposals with increasing numbers, but none of them are selected, and the process is as follows:

Proposer P1 presents a proposal numbered M1 and completes the phase ONE process. At the same time, however, an Acceptor has promised not to approve any proposals with a numbered M2 (M2>M1) after completing phase 1. Therefore, when P1 enters phase 2, its accept request will be ignored by acceptors. P1 then enters phase 2 and submits a proposal numbered M3 (M3>M2), which in turn causes P2’s accept request to be ignored in phase 2, and so on, the proposal selection process falls into an endless loop.

To ensure that the Paxos algorithm process is sustainable and avoids the “dead loop” described above, a master Proposer must be selected and only a master Proposer can produce a proposal. If a principal Proposer with a higher numbered proposal is submitted or is receiving approval, it will discard the lower numbered proposal and select a proposal with a large enough number, provided that the majority of acceptors are in communication with the primary Proposer. So if there are enough components in the system (including proposers, acceptors, and other network communications components) to work properly, the entire Paxos algorithm process can remain active by selecting a master Proposer.

The resources

Paxos algorithm – Wikipedia

Principles and practices of distributed consistency from Paxos to Zookeeper