This is the 9th day of my participation in the August More Text Challenge. For details, see:August is more challenging

The paper

Distributed transaction refers to the transaction operation is located on different nodes, the transaction needs to ensure the AICD feature.

For example, in the ordering scenario, if the inventory and the order are not on the same node, a distributed transaction is involved.

The solution

In a distributed system, there are no other solutions to implement distributed transactions.

Phase 1 and 2 commit (2PC)

A two-phase Commit (2PC) introduces a Coordinator to coordinate the behavior of the participants and ultimately decide whether those participants will actually execute the transaction.

1. Running process

1.1 Preparation

The coordinator asks the participant whether the transaction executed successfully, and the participant sends back the result of the transaction execution.

1.2 Submission Phase

If the transaction executes successfully on each participant, the transaction coordinator sends a notification to the participant to commit the transaction; Otherwise, the coordinator sends a notification to the participant to roll back the transaction.

It is important to note that during the prepare phase, the participant performs the transaction, but has not yet committed. Commits or rolls back only after receiving a notification from the coordinator during the commit phase.

2. Existing problems

2.1 Synchronous Blocking All transaction participants are in synchronous blocking state while waiting for the response from other participants and cannot perform other operations.

2.2 Single-point Problem Coordinator plays a very important role in 2PC, and failure will cause great impact. In particular, if a failure occurs in phase 2, all participants will be waiting for the state, unable to complete other operations.

2.3 Data Inconsistency In Phase 2, if the coordinator only sends part of the Commit message and the network is abnormal, only part of the participants receive the Commit message. That is to say, only part of the participants Commit the transaction, resulting in system data inconsistency.

2.4 Too conservative If any node fails, the entire transaction will fail. There is no perfect fault-tolerant mechanism.

2. Compensation Transaction (TCC)

TCC is actually the compensation mechanism adopted. The core idea is that for each operation, a corresponding acknowledgement and compensation (undo) operation should be registered. It is divided into three stages:

  • In the Try phase, service systems are checked and resources are reserved

  • The Confirm phase is mainly to Confirm and submit to the business system. When the Try phase is successfully executed and the Confirm phase starts, the default Confirm phase is error-free. As long as Try succeeds, Confirm must succeed.

  • In the Cancel phase, services that need to be rolled back are cancelled and reserved resources are released.

For example, let’s pretend that Bob wants to transfer money to Smith by saying something like this: We have a local method that we call in turn

  1. First, during the Try phase, the remote interface is called to freeze Smith and Bob’s money.
  2. In the Confirm phase, the transfer operation of the remote call is performed, and the transfer is successfully unfrozen.
  3. If step 2 succeeds, the transfer succeeds, and if step 2 fails, the unfreeze method (Cancel) corresponding to the remote freeze interface is called.

Advantages: Compared to 2PC, the implementation and process is relatively simple, but the data consistency is also less than 2PC

Weakness: the weakness is still relatively obvious, in step 2 and 3 May fail. TCC is a compensation method in the application layer, so programmers need to write a lot of compensation code in the implementation. In some scenarios, some business processes may not be easy to define and handle with TCC.

3. Local message Table (asynchronous assurance)

The local message tables are in the same database as the business data tables, so local transactions can be used to ensure that the transaction characteristics of operations on both tables are met, and message queues are used to ensure ultimate consistency.

  1. The local transaction guarantees that the message will be written to the local message table when one of the distributed transaction operators sends a message to the local message table after writing the business data.
  2. After that, the message in the local message table is forwarded to the message queue such as Kafka. If the forwarding succeeds, the message is removed from the local message table. Otherwise, the forwarding continues.
  3. On the other side of a distributed transaction operation, a message is read from a message queue and the operation in the message is performed.

Advantages: A very classic implementation that avoids distributed transactions and achieves ultimate consistency.

Disadvantages: The message table is coupled to the business system, and without a encapsulated solution, there is a lot of chores to handle.

Iv. MQ transaction messages

Some third-party MQS, such as RocketMQ, support transactional messages in a way similar to the two-phase commit, but some mainstream MQS, such as RabbitMQ and Kafka, do not support transactional messages.

Taking RocketMQ middleware of Ali as an example, its idea is roughly as follows:

The first stage is Prepared. You will get the address of the message. The second phase executes the local transaction, and the third phase accesses the message through the address obtained in the first phase and modifies the state.

That is, there are two requests to the message queue within the business method, a send message and an acknowledgement message. RocketMQ will periodically scan the transaction messages in the message cluster if a Prepared message is found. It will confirm to the message sender, so the producer needs to implement a check interface. RocketMQ will decide whether to roll back or continue sending the confirmation message according to the policy set by the sender. This ensures that the message sending succeeds or fails at the same time as the local transaction.

Advantages: Ultimate consistency is achieved without reliance on local database transactions.

Disadvantages: Difficult to implement, not supported by mainstream MQ, and the code for the transaction message part of RocketMQ is not open source.

conclusion

In this paper, we summarize and compare the advantages and disadvantages of several distributed decomposition solutions. Distributed transaction itself is a technical problem, and there is no perfect solution to deal with all scenarios. The specific choice should be made according to the business scenario. Ali RocketMQ to implement distributed transactions, there are also many distributed transaction coordinators, such as LCN, you can try more.