Thoughts on distributed transaction, two-stage commit protocol and three-stage commit protocol

Distributed transactions

Distributed transactions are transactions that involve operating on multiple databases. This is an extension of the concept of a transaction for the same library to a transaction for multiple libraries. The goal is to ensure data consistency in distributed systems. The key to distributed transaction processing is that there must be a way to know everything a transaction is doing anywhere, and the decision to commit or roll back a transaction must produce uniform results (all commit or all rollback)

In a distributed system, each node is physically independent of each other and communicates and coordinates through the network. Because of the transaction mechanism, data operations on each individual node are guaranteed to meet ACID. However, independent nodes cannot know exactly how transactions are executed in other nodes. So in theory, the two machines can’t theoretically reach a consistent state. If you want data consistency across multiple machines in a distributed deployment, you need to ensure that all data writes are performed on all nodes, or none at all. However, one machine executing a local transaction cannot know the result of a local transaction executed on another machine. Therefore, he does not know whether the transaction should be committed or roolback. Therefore, the conventional solution is to introduce a “coordinator” component to uniformly schedule execution of all distributed nodes.

Second, distributed consistency

In distributed systems, in order to ensure high availability of data, multiple replicas of data are usually retained. These replicas are placed on different physical machines. Therefore, the question arises, how do you ensure the consistency of data between these different copies?

1. What is data consistency

As the name implies, data consistency refers to the operation of adding, deleting, or modifying data on different copies that either succeeds or fails on different copies and changes the copies from one state to another, with all copies in the same state. If a failure occurs in this process and some transactions are interrupted before completion, the unfinished transactions are in an incorrect state, or an inconsistent state, because some of their modifications to the replica have already changed the replica. Consistency and atomicity are closely related.

2. The CAP theorem

CAP theorem, also known as Breuer’s theorem, states that for a distributed system, the following three points cannot be met simultaneously:

Consisteny

The requirement of consistency is that any client (the Actor shown in the following figure) can obtain the latest data for each read operation. That is, after A client writes new data to node A, the data obtained by other clients from node B must be the latest and consistent with node A’s data.
Availability

The requirement of availability is that each request receives the expected response in a reasonable amount of time (there is no guarantee that the results obtained are up to date). As shown in the following figure, the client sends A request to node A or node B, and as long as these two nodes receive the request, they must respond to the client without ensuring whether the value of the response is correct.
Partition tolerance

Fault tolerance of a zone means that the system can still provide services when the network between nodes is faulty.

Knowing the CAP theorem, for developers, when we build services, we need to make trade-offs based on business characteristics, which points the current system can choose and which points should be protected.

3. Data consistency model

Strong consistency

When the update operation is complete, any subsequent accesses by multiple processes or threads will return the latest updated value. This is the most user-friendly, where the user is guaranteed to read what they wrote the next time. According to CAP theory, this implementation requires a sacrifice of availability.
Weak consistency model

There is no guarantee that continuation or thread access will return the latest updated value. We call this “inconsistency window” the time it takes for a user to read that an operation has updated system-specific data. After data is written successfully, the system does not promise that the latest value can be read immediately, nor does it promise how long it will be read. At the expense of consistency in CAP theory.
Final consistency model

Is a special case of weak consistency. Under this consistency, the system ensures that the user can finally read the update of a certain operation to the system specific data (the read operation is preceded by no other update operation of this data). On the premise that no failure occurs, the time of inconsistent window is mainly affected by communication delay, system load and the number of duplicate copies. DNS is a typical ultimate consistency system.

In order to solve this distributed consistency problem, many typical protocols and algorithms have been summarized in the process of tradeoff between performance and data consistency. Among them, Two Phase Commitment Protocol, Three Phase Commitment Protocol and Paxos algorithm are well known. The first two algorithms are described below.

Three, two-stage submission agreement

define

Two-phase commit is also known as a Protocol. In a distributed system, although each node can know the success or failure of its own operation, it cannot know the success or failure of other nodes’ operation. When a transaction across multiple nodes, in order to keep the ACID characteristic of transaction, need to introduce a unity as coordinator component to control all the nodes (referred to as participants) operating results and eventually indicating whether the node should submit the operating results are real (for example, the updated data to disk, etc.). Therefore, the algorithm idea of two-stage submission can be summarized as follows: participants will inform the coordinator of the success or failure of the operation, and then the coordinator will decide whether to submit the operation or stop the operation according to the feedback information of all participants.

Agreement participant

In the two-phase commit protocol, the system generally contains two types of machines (or nodes) : one is the coordinator, usually only one ina system; The other category is participants, Cohorts or workers, which generally contains multiple and can be understood as the number of data copies in the data storage system. In the protocol, it is assumed that write-ahead logs are recorded on each node and stored persistently. Even if a node fails, the logs will not be lost. It also assumes that the nodes will not fail permanently and that any two nodes can communicate with each other.

Two-stage execution

1. Preparation

The transaction coordinator (transaction manager) sends a Prepare message to each participant (resource manager), and each participant either returns a failure (such as permission validation failure) or executes the transaction locally, writing local redo and undo logs but not committing them, reaching a state of “all is ready”.

2. Submission phase

In this stage, the coordinator makes a decision on the results of the first stage participants’ votes: submit or cancel
The coordinator will notify all participants to commit the transaction if and only if all participants agree to commit the transaction, otherwise the coordinator will notify all participants to cancel the transaction. After receiving the message from the coordinator, the actor performs the action (commit or rollback) in response.

Successful submission example:

When all participants agree:1The coordinator node issues a "commit" request to all the participant nodes.2) The participant node formally completes the operation and frees the resources occupied during the entire transaction.3The actor node sends a done message to the coordinator node.4The coordinator node completes the transaction after receiving a "complete" message from all the participant nodes.Copy the code

Failed commit instance:

When a participant disagrees:1The coordinator node issues a "rollback" request to all the participant nodes.2The participant node uses the previously written Undo information to perform a rollback and release the resources occupied during the entire transaction.3The participant node sends a rollback complete message to the coordinator node.4The coordinator node cancels the transaction after receiving a "rollback complete" message from all the participant nodes.Copy the code

Disadvantages of two-phase commit

1. Synchronization blocking. During execution, all participating nodes are transaction blocking. When a participant occupies a common resource, other third party nodes have to block access to the common resource.

2. Single point of failure. Because of the importance of the coordinator, if the coordinator fails. The participants will keep blocking. Especially in phase 2, when the coordinator fails, all participants are still locked in the transaction resources and cannot continue to complete the transaction. (If the coordinator is down, you can re-elect a coordinator, but you can’t solve the problem of participants being blocked because the coordinator is down)

3. Inconsistent data. In phase 2 of the two-phase commit, after the coordinator sends a COMMIT request to the participant, a local network exception occurs or the coordinator fails during the commit request, which results in only a subset of the participant receiving the commit request. These participants perform the COMMIT operation after receiving the COMMIT request. However, other parts of the machine that do not receive the COMMIT request cannot perform the transaction commit. Then the whole distributed system appears the data consistency phenomenon.

4. Unresolvable problem in phase 2: When the coordinator sends a COMMIT message, it crashes, and the only participant who received the message also crashes. So even if the coordinator creates a new coordinator by election agreement, the status of the transaction is uncertain, and no one knows whether the transaction has been committed.

Four, three stage submission agreement

Three-phase Commit, also known as the Three-Phase Commit Protocol, is an improved version of 2PC. Unlike the two-phase commit, the three-phase commit has two changes:

Introduce timeouts. Introduce timeouts for both the coordinator and the participant.
Insert a preparation phase between phases 1 and 2. The state of each participating node is consistent before the final submission stage.
- This results in the preparation phase being split in two, and a three-phase commitCanCommit,PreCommit,DoCommitThree phases.

CanCommit phase

The coordinator sends a COMMIT request to the participant, who returns a Yes response if he can commit, or a No response otherwise.

1. The transaction asks the coordinator to send a CanCommit request to the participant. Asks if a transaction commit operation can be performed. It then waits for the participant’s response.

2. Response Feedback Once a participant receives a CanCommit request, it normally returns a Yes response and enters the preparatory state if it thinks it can successfully execute the transaction. Otherwise, feedback No.

PreCommit phase

The coordinator determines whether a transaction can be PreCommit based on the response of the participants. Depending on the response, there are two possibilities.

If the coordinator receives a Yes response from all participants, the transaction is pre-executed.

1. Send a PreCommit request. The coordinator sends a PreCommit request to the participant and enters the Prepared phase.

2. After receiving a PreCommit request, a transaction is performed and undo and redo information is recorded in the transaction log.

3. Response Feedback If the participant successfully executes the transaction, the participant returns an ACK response and waits for the final instruction.

If either participant sends a No response to the coordinator, or if the coordinator does not receive a response from the participant after a timeout, the transaction is interrupted.

1. Send interrupt request The coordinator sends abort requests to all participants.

2. Interrupt a transaction participant performs an interrupt of a transaction after receiving an ABORT request from the coordinator (or after a timeout, but still no request from the coordinator).

DoCommit phase

The actual transaction commit at this stage can also be divided into the following two scenarios.

commit

1. Send submit request The coordinator receives the ACK response sent by the participant, and it will enter the commit state from the pre-commit state. DoCommit requests are sent to all participants.

2. After receiving the doCommit request, the transaction commit participant performs the formal transaction commit. All transaction resources are released after the transaction commits.

3. Response Feedback After the transaction is committed, send an Ack response to the coordinator.

4. Completion of the transaction The coordinator completes the transaction after receiving ack responses from all participants.

The interrupt transaction coordinator does not receive the ACK response sent by the participant (either the recipient sent an ACK response instead, or the response timed out), then the interrupt transaction is executed.

1. Send interrupt request The coordinator sends abort requests to all participants

2. Transaction rollback After receiving abort request, the participant uses the undo information recorded in phase 2 to roll back the transaction and release all transaction resources upon completion of the rollback.

3. Feedback Result After the transaction rollback, the participant sends an ACK message to the coordinator

4. Interruption The transaction coordinator performs the interruption of the transaction after receiving the ACK message from the participant.

During the doCommit phase, if participants cannot receive doCommit or Rebort requests from the coordinator in time, they will continue to commit the transaction after waiting for timeout. (Actually, this is based on probability. When entering phase 3, the participant has already received a PreCommit request in phase 2, so the coordinator can only make a PreCommit request if he has received a Yes CanCommit response from all participants before phase 2 begins. (Once a participant receives a PreCommit, he knows that everyone has agreed to change it.) So, in a nutshell, when it comes to phase 3, even though the participant does not receive a COMMIT or abort response due to network timeout, he has reason to believe that the commit has a high chance of success.)

Five, 2PC and 3PC difference

Compared to 2PC, 3PC mainly solves single points of failure and reduces blocking, because an actor defaults to commit once he fails to receive information from the coordinator in a timely manner. Transaction resources are not held and blocked all the time. However, this mechanism can also cause data consistency problems because, due to network reasons, the abort response sent by the coordinator is not received in time by the participant, and the participant performs the COMMIT operation after waiting for a timeout. This results in data inconsistencies with other participants who receive abort and perform rollback.

Six, summarized

Knowing 2PC and 3PC, we can see that neither two-phase commit nor three-phase commit can completely solve the distributed consistency problem. It has been said that there is only one consistency algorithm in the world, that is Paxos, and all other consistency algorithms are incomplete versions of Paxos. The admittedly difficult but effective Paxos algorithm will be introduced in a later ZooKeeper column module.

References:

About distributed transactions, two-phase commit protocol, three-stage commit protocol