Because each service of a distributed system may be distributed on different nodes, if each node does not directly communicate with each other to obtain the status of other nodes, then each node cannot know the task processing results of other nodes.

If a transaction is initiated in a distributed system involving multiple different nodes, then to ensure the ACID properties of the transaction, it is necessary to introduce a coordinator to uniformly schedule the multiple nodes involved in the transaction, and the nodes to be scheduled are called transaction participants. From this derived 2PC and 3PC protocols, this paper will introduce the working mechanism of 2PC and 3PC in detail.

2PC(Two-phase Commit, two-phase Commit)

As the name suggests, there are two phases: Prepare and Commit

Prepare: Submits the transaction request

The basic process is as follows:

  1. The query coordinator sends a transaction request to all participants, asks if the transaction operation can be performed, and then waits for the response from each participant.

  2. Execution Upon receiving a coordinator transaction request, each participant performs transaction actions (such as updating records in a relational database table) and records Undo and Redo information in the transaction log.

  3. Response If the participant successfully executed the transaction and wrote Undo and Redo information, a YES response is returned to the coordinator, otherwise a NO response is returned. Of course, the participant could also go down and not return a response.

Commit: Performs the transaction Commit

There are two cases of performing a transaction commit, a normal commit and a rollback.

Normal committed transaction

The process is as follows:

  1. Commit request Coordinator sends commit requests to all participants.

  2. After receiving the Commit request, the transaction Commit participant performs the Commit and releases all resources occupied during the execution of the transaction.

  3. Feedback Results The participant sends an Ack response to the coordinator after performing the transaction commit.

  4. Completion transaction After receiving Ack responses from all participants, the transaction commits.

Interrupt the transaction

After the Prepare step is executed, if some participants fail to perform a transaction, break down, or the network between them is interrupted, the coordinator cannot receive YES responses from all participants, or a participant returns No response. In this case, the coordinator rolls back the transaction. The process is shown in red (replace the Commit request with the Rollback request in red) :

  1. Rollback Request The coordinator sends a ROLLBACK request to all participants.

  2. After receiving Rollback, participants use the Undo logs in the Prepare phase to Rollback the transaction and release all resources occupied during the transaction execution.

  3. Feedback Result After a transaction rollback, the participant sends an Ack response to the coordinator.

  4. The interrupt transaction completes the interrupt after receiving Ack responses from all participants.

2 PC

  1. When the participant waits for the coordinator’s instruction, he is actually waiting for the response from other participants. During this process, the participant cannot perform other operations, that is, it blocks its operation. If the network between the participant and the coordinator is abnormal and the participant cannot receive the coordinator information all the time, the participant will be blocked all the time.

  2. Single point In 2PC, all requests come from the coordinator, so the status of the coordinator is critical. If the coordinator goes down, then the participant will block and occupy transaction resources all the time.

    If the coordinator is also distributed and provides services in a master selection mode, then if one coordinator fails, another coordinator can be selected to continue the subsequent services, which can solve the single point problem. However, the new coordinator does not know all the status information of the previous transaction (such as how long the prepared response has been waited for, etc.), so the previous transaction cannot be processed successfully.

  3. Commit The Commit request /Rollback request may be lost due to the breakdown of the coordinator or network problems between the coordinator and participants. As a result, some participants do not receive the Commit/Rollback request. Other participants normally receive the Commit/Rollback operation. Participants that do not receive the request continue to block. At this point, the data is no longer consistent between participants.

    After a Commit/Rollback, the participant sends an Ack to the coordinator. However, whether the coordinator receives an Ack from all participants or not, there is no other remedy for the transaction. All the coordinator can do is wait for the timeout and return an “I am not sure if this transaction succeeded” message to the transaction initiator.

  4. Environment depends on reliability coordinator Prepare request, after waiting for a response, but if there are any participants downtime or and coordination between the network interruption, can lead to the coordinator cannot receive all participants’ response, so in the 2 PC, coordinator will wait for a certain time, then after a timeout, interrupt trigger affairs, in the process, The coordinator and all other participants are motivated by blocking. This mechanism is too harsh for real-world environments where network problems are common.

3PC(Three-phase Commit, three-phase Commit)

3PC is designed to solve some of the shortcomings of 2PC based on 2PC. 3PC is divided into three stages: CanCommit, PreCommit and doCommit.

CanCommit

The process is as follows:

  1. The transaction query coordinator sends a transaction canCommit request to all participants that contains the transaction content, asks if the transaction commit operation can be performed, and starts waiting for the response.

  2. After receiving the canCommit request, the participant analyzes the transaction content to determine whether it can execute the transaction. If it can, the participant returns Yes and enters the preparatory state; otherwise, the participant returns No.

    Note: No transaction was executed during this process (compared to the Prepare phase for 2PC, the participant executed a transaction).

PreCommit

The flow chart is as follows:

The PreCommit phase determines the next action based on the CanCommit response returned by each participant. If Yes responses from all participants are received, the transaction pre-commit is performed; otherwise (at least one No response is received or No response from all participants is not received within a certain period of time, as shown in red in the first picture of 3PC), the transaction is interrupted.

Transaction precommit

  1. Send PreCommit Request The coordinator sends the PreCommit request and enters the Prepared phase.

  2. Participants process PreCommit After receiving a PreCommit request, a participant performs a transaction and records Undo and Redo information in the transaction log.

  3. If the participant successfully executes the transaction and writes Undo and Redo information, then the participant reports back an Ack to the coordinator and waits for the next instruction.

Interruption of the transaction

In the figure above, Abort in red indicates that the coordinator sends an Abort request instead of a PreCommit request.

  1. Send transaction interrupt request The coordinator sends Abort requests to all participants.

  2. Interrupt transaction participants trigger a transaction interrupt after receiving an Abort request. In addition, if the participant timed out while waiting for the coordinator instruction, it would trigger the transaction interrupt itself. In 2PC, the participant would block waiting for the coordinator instruction, so 3PC solved the blocking caused by this situation.

doCommit

The flow chart is as follows:

The coordinator determines the final action based on the response of the second phase. If the coordinator receives Ack responses from all participants in the PreCommit phase, the execution transaction commit phase is entered, otherwise the execution transaction is interrupted.

Transaction commit

  1. After receiving the Ack response from all participants during the PreCommit phase, the coordinator sends a doCommit request to all participants and enters the commit state.

  2. After receiving the Commit request, the transaction Commit participant performs the Commit and releases all resources occupied during the execution of the transaction.

  3. Feedback Results After the participant completes the transaction commit, it returns an Ack response to the coordinator.

  4. Completion of the transaction The coordinator completes the transaction after receiving Ack responses from all participants.

Interruption of the transaction

  1. Send transaction interrupt request The coordinator sends Abort requests to all participants.

  2. After receiving the Abort request, the transaction rollback participant uses the Undo information recorded in the second phase to roll back the transaction and releases all transaction resources upon completion of the rollback.

    Note: Since none of the participants actually execute the transaction during the first phase (PreCommit phase), the transaction rollback is not required for the second phase (PreCommit phase), and thus the following feedback results are not required.

  3. Feedback Rollback result After performing the transaction rollback, the participant sends an Ack response to the coordinator.

  4. The interrupt transaction coordinator completes the interrupt after receiving Ack responses from all participants.

Improvements and disadvantages of 3PC

To improve the

  1. Reduced congestion

    • After the participant returns the response to the CanCommit request, it waits for the second-stage instruction. If the wait times out, it is automatically abort, reducing blocking.

    • After returning the PreCommit request response, the participant waits for the third stage instruction. If the wait times out, the transaction is automatically committed, which also reduces the blocking.

  2. Solve single point of failure

    • After the participant returns the response to the CanCommit request, it waits for the second-stage instruction. If the coordinator goes down, the wait timeout is automatically abort.

    • After the participant returns the response of the PreCommit request, he/she waits for the third-stage instruction. If the coordinator goes down, he/she will automatically commit the transaction after the timeout.

disadvantages

Data inconsistencies still exist, such as when the phase 3 coordinator issues ABORT requests, and some participants do not receive ABORT, then the commit is automatically performed, causing data inconsistencies.

conclusion

From the above, neither 2PC nor 3PC is a perfect solution to the problem of distributed data consistency. Although ACID properties of transactions are not guaranteed, the two-phase idea is widely used in many practical architectures, such as JTA transactions and data synchronization of some databases.

Here’s a quote from Google Chubby:

“There is only one consensus protocol, and that’s Paxos” — All other approaches are just broken versions of Paxos.

Paxos is an algorithm proposed by Lambert to solve the problem of distributed consistency.