Spring Cloud Alibaba Series 3: Deep Understanding of Distributed Transactions

Seata is one of the most popular distributed transaction solutions and has drawn growing attention from developers. Many enterprises already run it in production. This article looks at distributed transactions, and at how Seata addresses them, from a practical point of view.

seata.io/zh-cn/docs/…

1. Introduction to distributed transactions

Seata is an open source distributed transaction solution dedicated to providing high-performance, easy-to-use distributed transaction services for microservices architectures (as described on the official website). The key characteristics are distributed transactions, high performance, and ease of use.

1.1 What are distributed transactions

In a traditional monolithic application architecture, all business modules share the same data source. Transactions against that single data source can rely on the database's own transaction mechanism, exactly as if everything ran locally, so they are called local transactions.

As the business grows, functional modules are split into multiple vertical applications, and the original monolithic architecture evolves into a vertical application architecture in which different applications use different data sources. Services now call each other, each service runs its own branch transaction, and these branch transactions are isolated from one another. How can several branch transactions be handled as one overall transaction? This is the problem that distributed transactions address: ensuring data consistency in a distributed system.

1.2 Pain points of distributed transactions

Distributed transactions consist of multiple local transactions. Traditional local transactions support ACID properties:

Atomicity: the transaction as a whole either succeeds completely or fails completely.
Consistency: the database moves from one valid state to another, where validity is mainly defined by business rules. For example, in a money transfer, if account A has a balance of 1000 and account B has a balance of 200, and A transfers 500 to B, the consistent result is a balance of 500 for A and 700 for B. A result of 500 for A and 200 for B is not allowed.
Isolation: concurrent transactions are isolated from each other and do not interfere with one another.
Durability: once a transaction is committed, its changes are persisted, and later operations or failures do not undo its outcome.
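
The transfer example above can be sketched in a few lines of Java. This is only an in-memory illustration of the all-or-nothing behaviour a local transaction gives you; the class name LocalTransferDemo and the map-based "database" are assumptions made for the example, not part of any framework.

```java
import java.util.Map;

class LocalTransferDemo {
    // Moves money between two accounts; either both balances change or neither does.
    static boolean transfer(Map<String, Long> balances, String from, String to, long amount) {
        Long source = balances.get(from);
        Long target = balances.get(to);
        // Validate everything before touching any balance.
        if (source == null || target == null || from.equals(to)
                || amount <= 0 || source < amount) {
            return false;   // "rollback": nothing was modified
        }
        balances.put(from, source - amount);
        balances.put(to, target + amount);
        return true;        // "commit": both updates are visible together
    }
}
```

With A at 1000 and B at 200, transferring 500 leaves A at 500 and B at 700, and a later attempt to transfer 600 from A is rejected without changing either balance.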

Atomicity, isolation, and durability all serve consistency, so in a distributed transaction scenario the core question is how to guarantee consistency. Consistency in distributed transactions can be divided into:

Strong consistency: after an update succeeds, the data on all replicas is consistent immediately.
Weak consistency: after an update succeeds, not every replica is guaranteed to return the new value right away; the replicas only become consistent after some time.
Eventual consistency: a form of weak consistency. After an update succeeds, the data is guaranteed to become consistent after a period of time.

Strong consistency is a demanding requirement: every sub-operation must be synchronized before the next one can proceed, so performance suffers. Eventual consistency does not force synchronization; the data converges after a period of time. It is therefore suited to scenarios that do not need real-time consistency.

2. Common distributed transaction solutions

Distributed transaction solutions differ mainly in how they achieve consistency. Some provide strong consistency and others provide eventual consistency, so different solutions fit different business scenarios.

2.1 2PC- Two-phase commit

Two-phase commit (2PC) is implemented on top of the XA protocol. It divides the whole transaction into two phases, a prepare phase and a commit phase, and defines two roles, the transaction coordinator and the transaction participants. It is a strong consistency solution.

2.1.1 role

Transaction coordinator: plays the central role. It collects the transaction state of each participant and decides whether to commit or roll back.
Transaction participants: the branch transactions, which commit or roll back locally according to the coordinator's decision.

2.1.2 steps

Prepare phase: the coordinator sends a prepare message to all participants. Each participant executes the transaction locally without committing it, which locks the required resources, and then reports success or failure to the coordinator.
Commit phase: the coordinator makes a decision based on the prepare results reported by the participants. If any participant reports a failure, the decision is to roll back; if all participants succeed, the decision is to commit. The coordinator then sends a commit or rollback message to every participant, and each participant commits or rolls back its local transaction accordingly.

In the first phase, the coordinator blocks synchronously while waiting for feedback from all participants. If a participant does not respond because of a network problem or a crash, the coordinator's timeout mechanism kicks in and the coordinator sends a rollback decision to the other participants.
In the second phase, a participant may fail to execute the commit or rollback it receives. The operation can be retried, and manual intervention is required once a certain number of retries has failed.
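
The two phases described above can be sketched as a small in-memory coordinator. This is a minimal sketch under stated assumptions: the TwoPcParticipant interface, InMemoryParticipant, and TwoPcCoordinator are hypothetical names invented for illustration, and timeouts, retries, and persistence of the decision are left out.

```java
import java.util.List;

// Hypothetical participant contract; the names are illustrative, not from any XA library.
interface TwoPcParticipant {
    boolean prepare();   // phase 1: execute locally, hold locks, do not commit
    void commit();       // phase 2: commit the local transaction
    void rollback();     // phase 2: undo the local work and release locks
}

// In-memory participant that records its final state for inspection.
class InMemoryParticipant implements TwoPcParticipant {
    private final boolean canPrepare;
    String state = "INIT";

    InMemoryParticipant(boolean canPrepare) { this.canPrepare = canPrepare; }

    public boolean prepare() {
        state = canPrepare ? "PREPARED" : "PREPARE_FAILED";
        return canPrepare;
    }
    public void commit() { state = "COMMITTED"; }
    public void rollback() { state = "ROLLED_BACK"; }
}

class TwoPcCoordinator {
    // Returns true if the global transaction committed, false if it rolled back.
    static boolean execute(List<? extends TwoPcParticipant> participants) {
        boolean allPrepared = true;
        // Phase 1: collect votes; a failure or an exception counts as a "no" vote.
        for (TwoPcParticipant p : participants) {
            try {
                if (!p.prepare()) { allPrepared = false; }
            } catch (RuntimeException e) {
                allPrepared = false;
            }
        }
        // Phase 2: send the same decision to every participant.
        for (TwoPcParticipant p : participants) {
            if (allPrepared) { p.commit(); } else { p.rollback(); }
        }
        return allPrepared;
    }
}
```

Calling TwoPcCoordinator.execute with two participants that both prepare successfully commits both; if either one fails to prepare, both end up rolled back.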

2.1.3 advantages

Consistency: two-phase commit requires every node to be prepared before the commit is executed, which ensures the consistency of the distributed transaction.
Coordinator: two-phase commit introduces a coordinator role, so participants do not need to know each other's state; the coordinator keeps the state unified.

2.1.4 shortcomings

Single point of failure: the coordinator is critical, so data can become inconsistent if it goes down. To mitigate this, the coordinator can be deployed as a cluster or in active/standby mode.
Transaction blocking: phase 2 only starts after all participants are prepared in phase 1, so every participant has to hold its locks until then. Other transactions cannot access the locked resources, which lowers concurrency.
Data inconsistency: in the second phase the coordinator sends the commit or rollback decision to the participants. If some participants do not receive it because of network jitter or a partition, the data becomes inconsistent.
Uncertain transaction state: if the coordinator and a participant go down at the same time, the state of the transaction cannot be determined.

2.1.5 Application Scenarios

Seata’s AT mode is an improvement built on two-phase commit.

2.2 3PC- Three-phase commit

Compared with two-phase commit, three-phase commit (3PC) adds a pre-commit phase and brings two main improvements:

A timeout mechanism is introduced for both the coordinator and the participants
A pre-commit phase is inserted between the prepare and commit phases to ensure that all participants are in a consistent state before committing

2.2.1 steps

Prepare phase (canCommit): the coordinator sends a canCommit request to all participants. Each participant replies yes if it is able to commit and no otherwise.
Pre-commit phase (preCommit): the coordinator collects the replies. If any participant replies no or times out, the transaction is aborted. Otherwise the coordinator sends a preCommit request, the participants execute the transaction without committing it and acknowledge, and the protocol moves on to the commit phase.
Commit phase (doCommit): the coordinator sends a doCommit request to all participants. Each participant commits its transaction and reports the result to the coordinator, which completes the transaction based on those results.
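
A minimal sketch of the three phases, including the participant-side timeout, might look like the following. The class and method names are hypothetical, networking is replaced by direct method calls, and the timeout rule used here (commit if already pre-committed, abort otherwise) is the usual textbook behaviour and an assumption of this example.

```java
import java.util.List;

// Hypothetical participant for a three-phase commit sketch.
class ThreePcParticipant {
    private final boolean ableToCommit;
    String state = "INIT";

    ThreePcParticipant(boolean ableToCommit) { this.ableToCommit = ableToCommit; }

    boolean canCommit() { return ableToCommit; }      // phase 1: vote only, no work is done yet
    void preCommit() { state = "PRE_COMMITTED"; }     // phase 2: execute and hold, do not commit
    void doCommit() { state = "COMMITTED"; }          // phase 3: commit
    void abort() { state = "ABORTED"; }

    // Called when no decision arrives from the coordinator in time.
    void onCoordinatorTimeout() {
        if (state.equals("PRE_COMMITTED")) {
            state = "COMMITTED";   // everyone voted yes earlier, so the default is to commit
        } else if (state.equals("INIT")) {
            state = "ABORTED";     // nothing was promised yet, so give up
        }
    }
}

class ThreePcCoordinator {
    // Returns true if the transaction committed, false if it was aborted.
    static boolean execute(List<ThreePcParticipant> participants) {
        // Phase 1 (canCommit): any "no" vote aborts the transaction.
        boolean allYes = true;
        for (ThreePcParticipant p : participants) {
            if (!p.canCommit()) { allYes = false; }
        }
        if (!allYes) {
            for (ThreePcParticipant p : participants) { p.abort(); }
            return false;
        }
        // Phase 2 (preCommit): participants execute the work without committing.
        for (ThreePcParticipant p : participants) { p.preCommit(); }
        // Phase 3 (doCommit): participants commit.
        for (ThreePcParticipant p : participants) { p.doCommit(); }
        return true;
    }
}
```

The onCoordinatorTimeout method also shows where the remaining inconsistency comes from: a pre-committed participant that loses contact commits on its own, even if the coordinator was about to abort.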

2.2.2 advantages

Participant timeout: if a participant does not receive a decision from the coordinator for a long time, it does not block forever but commits the transaction on its own.
Consistent state before execution: in phase 1 of 2PC, the coordinator asks the participants to execute the transaction right away, without knowing whether every participant is able to do so, so some participants lock resources prematurely. The canCommit phase introduced by 3PC confirms that all participants are able to proceed before any of them executes the transaction.

2.2.3 shortcomings

Data inconsistency remains: a participant that does not hear from the coordinator for a long time commits by default. If the decision the coordinator actually tried to send was a rollback, the data becomes inconsistent.
Process complexity: the extra phase makes the overall process more complex and reduces performance.
Mostly theoretical: although 3PC adds the timeout mechanism and the pre-commit phase, it still cannot fully solve the data consistency problem, so it remains largely a theoretical protocol.

2.3 Transaction Messages

Distributed transactions can also be implemented with middleware; RocketMQ, for example, supports transactional messages. RocketMQ has the concept of a half-message: a message sent by the producer to MQ is not immediately visible to the consumer and can only be consumed after a second confirmation. This feature can be used to achieve eventual consistency for distributed transactions.

2.3.1 concept

Half-message: a message that the producer has sent to the MQ Server but that cannot yet be consumed by the consumer.
Transaction check-back: if the MQ Server receives no commit or rollback confirmation for a long time after the transaction initiator runs its local transaction, the MQ Server proactively queries the initiator for the execution status.

2.3.2 steps

The steps for implementing RocketMQ transaction messages are illustrated below with an order service and an inventory service:

The order service first sends a half-message to the MQ Server. The inventory service cannot consume it yet.
After receiving the half-message, the MQ Server notifies the order service that the half-message was stored successfully.
Once the half-message is sent successfully, the order service executes its local transaction and decides whether to commit or roll back based on the result.
When the local transaction completes, the order service sends a commit or rollback request to the MQ Server.
If the local transaction takes too long and the MQ Server receives no response from the order service, the MQ Server queries the transaction status through the check-back interface. A transaction log table is therefore needed to record the status of each transaction.
The MQ Server resolves the half-message according to the commit or rollback response. On commit, the half-message is moved to the target queue for the inventory service to consume. On rollback, the half-message is deleted.
After the inventory service consumes the committed message, it executes its own local transaction. If that transaction fails, it is retried; if it still fails after a period of time, it is recorded in an error log table for manual intervention.
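
The half-message flow above can be simulated without a real broker. The sketch below is an in-memory stand-in and deliberately does not use the RocketMQ client API; HalfMessageBroker, HalfMsgOrderService, and all method names are hypothetical. It shows the second confirmation and the check-back against a transaction log table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// In-memory stand-in for an MQ server with half-message support (not the RocketMQ API).
class HalfMessageBroker {
    enum TxState { COMMIT, ROLLBACK, UNKNOWN }

    private final Map<String, String> halfMessages = new HashMap<>(); // txId -> body, hidden from consumers
    final List<String> targetQueue = new ArrayList<>();               // visible to consumers

    void sendHalf(String txId, String body) { halfMessages.put(txId, body); }

    // Second confirmation from the producer.
    void endTransaction(String txId, TxState state) {
        if (state == TxState.UNKNOWN) { return; }   // keep the half-message for a later check-back
        String body = halfMessages.remove(txId);
        if (body != null && state == TxState.COMMIT) { targetQueue.add(body); }
    }

    // Check-back: ask the producer for the state of every unresolved transaction.
    void checkBack(Function<String, TxState> localTxStateLookup) {
        for (String txId : new ArrayList<>(halfMessages.keySet())) {
            endTransaction(txId, localTxStateLookup.apply(txId));
        }
    }

    int pendingCount() { return halfMessages.size(); }
}

// Order service with a transaction log table that the check-back can query.
class HalfMsgOrderService {
    private final Map<String, HalfMessageBroker.TxState> txLog = new HashMap<>();

    // confirmReachesBroker = false simulates a lost confirmation after the local transaction.
    void placeOrder(HalfMessageBroker broker, String txId,
                    boolean localTxSucceeds, boolean confirmReachesBroker) {
        broker.sendHalf(txId, "deduct-stock:" + txId);
        HalfMessageBroker.TxState result = localTxSucceeds
                ? HalfMessageBroker.TxState.COMMIT
                : HalfMessageBroker.TxState.ROLLBACK;
        txLog.put(txId, result);   // record the outcome of the local transaction
        if (confirmReachesBroker) { broker.endTransaction(txId, result); }
    }

    HalfMessageBroker.TxState lookup(String txId) {
        return txLog.getOrDefault(txId, HalfMessageBroker.TxState.UNKNOWN);
    }
}
```

If a confirmation is lost, the message stays pending until broker.checkBack(orders::lookup) resolves it from the transaction log, which is why the log table in step five is needed.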

2.3.3 advantages

Eventual consistency: transaction messages guarantee eventual consistency. Once the order service has placed the order, the inventory service keeps retrying until the stock is deducted.
Asynchronous check-back: if the transaction result does not reach the MQ Server for a long time, the MQ Server uses the check-back interface to query the transaction initiator for the execution status.

2.3.4 shortcomings

High business coupling: RocketMQ transaction messages require a fair amount of business code, including a check-back interface, and the execution process is complex

2.4 Local message table - Best-effort notification type

The core of the local message table approach is to execute the steps of a distributed transaction asynchronously, driven by a log. The log could be kept in a local file but is typically stored in a database. An additional table is created to record the execution status of each transaction, and failed records are retried by a scheduled task or through manual intervention.

2.4.1 steps

Add a message log table that records the outcome of each transaction. The transaction initiator writes a row to the message table in the same local transaction as its business logic.
The transaction initiator sends a message to the MQ middleware. The downstream service consumes it, performs its business operation in a local transaction, retries on failure, and sends the result back to MQ.
The transaction initiator consumes the result from the downstream service and marks the corresponding row in the local message table as completed.
A background scheduled task keeps scanning the message table for unfinished records and resends them to MQ, so downstream services must keep their interfaces idempotent.
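
The steps above can be sketched with an in-memory message table and an idempotent consumer. All names here are hypothetical, the MQ hop is collapsed into a direct method call, and the two map writes in createOrder stand in for a single local database transaction.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Downstream service: deduplicates by message id so that retries are harmless.
class IdempotentInventory {
    private final Set<String> processed = new HashSet<>();
    int stock;
    boolean available = true;   // set to false to simulate an outage

    IdempotentInventory(int stock) { this.stock = stock; }

    // Returns true when the message has been handled, now or on an earlier delivery.
    boolean handle(String messageId, int quantity) {
        if (!available) { return false; }
        if (processed.add(messageId)) { stock -= quantity; }   // only the first delivery changes stock
        return true;
    }
}

// Upstream service with a local message table (message id -> status).
class OutboxOrderService {
    final Map<String, Integer> orders = new LinkedHashMap<>();
    final Map<String, String> messageTable = new LinkedHashMap<>();

    // In a real database both writes would share one local transaction.
    void createOrder(String orderId, int quantity) {
        orders.put(orderId, quantity);
        messageTable.put(orderId, "PENDING");
    }

    // Scheduled task: resend every unfinished message and mark the acknowledged ones as done.
    void relayPending(IdempotentInventory inventory) {
        for (Map.Entry<String, String> row : messageTable.entrySet()) {
            boolean pending = row.getValue().equals("PENDING");
            if (pending && inventory.handle(row.getKey(), orders.get(row.getKey()))) {
                row.setValue("DONE");
            }
        }
    }
}
```

Because IdempotentInventory remembers processed message ids, running relayPending repeatedly, or redelivering the same message, deducts the stock only once.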

2.4.2 advantages

Eventual consistency: retries bring the data to a consistent state eventually, although consistency is not guaranteed in real time

2.4.3 shortcomings

Intrusive to code: an additional message table is required and is coupled with the business code
Idempotent retries: the retried interface must be idempotent

2.5 TCC- Compensation type

TCC is a compensation-based solution that consists of three operations: Try, Confirm, and Cancel. Every business operation must implement a confirm interface and a cancel interface in addition to try. As in two-phase commit, the transaction is only confirmed once all participants have reached a consistent state.

2.5.1 phase

Try phase: resources are checked and reserved, but the business operation is not committed, for example verifying the balance and freezing the funds
Confirm phase: the business operation is actually executed using the reserved resources. In general, if the Try phase succeeds, the Confirm phase is expected to succeed
Cancel phase: if the Try phase fails on any branch, Cancel is invoked to compensate and release the reserved resources
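
As a rough illustration of the three operations, the sketch below freezes funds in Try and then either spends or releases them. TccAccount and its methods are hypothetical and do not use the Seata TCC API; the reservation map is an assumption that makes Confirm and Cancel safe to repeat.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical TCC participant for an account balance; not the Seata TCC API.
class TccAccount {
    long available;
    long frozen;
    private final Map<String, Long> reservations = new HashMap<>(); // txId -> frozen amount

    TccAccount(long available) { this.available = available; }

    // Try: check the balance and freeze the amount without spending it.
    boolean tryDebit(String txId, long amount) {
        if (reservations.containsKey(txId)) { return true; }   // a repeated Try is a no-op
        if (amount <= 0 || available < amount) { return false; }
        available -= amount;
        frozen += amount;
        reservations.put(txId, amount);
        return true;
    }

    // Confirm: spend the frozen amount. A second call finds no reservation and does nothing.
    void confirm(String txId) {
        Long amount = reservations.remove(txId);
        if (amount != null) { frozen -= amount; }
    }

    // Cancel: release the frozen amount. Also safe when Try never succeeded.
    void cancel(String txId) {
        Long amount = reservations.remove(txId);
        if (amount != null) {
            frozen -= amount;
            available += amount;
        }
    }
}
```

Keying the reservation by transaction id is one simple way to get the idempotency that retried Confirm and Cancel calls require.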

2.5.2 advantages

Consistency: resources are reserved in the Try phase, so every branch is either confirmed or cancelled and the distributed data ends up consistent

2.5.3 shortcomings

High business coupling: every branch transaction must implement three interfaces, which is highly intrusive to the business code
Idempotency: the Confirm and Cancel phases may fail and be retried, so their interfaces must be idempotent

3. Distributed transactions provided by Seata

Seata is an open source distributed transaction framework. It offers the AT, TCC, Saga, and XA transaction modes, giving users a one-stop distributed transaction solution.

3.1 AT mode

AT mode is a further optimization of two-phase commit:

Phase one: business data and the rollback log are committed in the same local transaction, after which local locks and connection resources are released
Phase two: a commit is handled asynchronously by deleting the rollback log, so it completes very quickly. A rollback is a reverse compensation that uses the rollback log to restore the data to its state before the transaction
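
The rollback-log idea behind the two phases can be illustrated with a before-image kept per global transaction. This is a simplified sketch, not Seata's actual undo log format or API: UndoLogBranch, the xid strings, and the map-based table are all assumptions, and global locks and after-image checks are omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of the rollback-log idea; Seata's real undo log differs.
class UndoLogBranch {
    final Map<String, Integer> table = new HashMap<>();                         // business data
    private final Map<String, Map<String, Integer>> undoLog = new HashMap<>();  // xid -> before image

    // Phase one: record the before image and apply the update together.
    void update(String xid, String key, int newValue) {
        Map<String, Integer> before = undoLog.computeIfAbsent(xid, k -> new HashMap<>());
        if (!before.containsKey(key)) {
            before.put(key, table.get(key));   // keep only the earliest image; null means "row absent"
        }
        table.put(key, newValue);
    }

    // Phase two, commit: the data is already in place, so only the log is deleted.
    void commit(String xid) { undoLog.remove(xid); }

    // Phase two, rollback: restore every key from its before image, then delete the log.
    void rollback(String xid) {
        Map<String, Integer> before = undoLog.remove(xid);
        if (before == null) { return; }        // nothing to undo (already committed or rolled back)
        for (Map.Entry<String, Integer> image : before.entrySet()) {
            if (image.getValue() == null) {
                table.remove(image.getKey());
            } else {
                table.put(image.getKey(), image.getValue());
            }
        }
    }

    int pendingUndoLogs() { return undoLog.size(); }
}
```

The commit path touches no business data, which is why phase two is cheap in the success case; only a rollback has to write data back.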

3.2 TCC mode

TCC mode brings custom branch transactions under global transaction management and is also based on two-phase commit. Each branch transaction must implement the Try, Confirm, and Cancel interfaces; see section 2.5 for details.

3.3 Saga mode

Saga mode is the long-running transaction solution provided by Seata. Each participant in the business process commits its own local transaction; if one participant fails, the participants that already succeeded are compensated. Both the phase-one forward service and the phase-two compensation service are implemented by the business. This mode suits scenarios with long business processes and many steps.
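
The forward and compensation logic described above can be sketched as a list of steps that is unwound in reverse order on failure. SagaStep and SagaRunner are hypothetical names for this example; Seata's own Saga mode is driven by a state machine definition, which is not shown here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;

// One saga step: a forward action plus the compensation that undoes it.
class SagaStep {
    final String name;
    final BooleanSupplier forward;   // returns false when the local transaction fails
    final Runnable compensation;

    SagaStep(String name, BooleanSupplier forward, Runnable compensation) {
        this.name = name;
        this.forward = forward;
        this.compensation = compensation;
    }
}

class SagaRunner {
    // Runs the steps in order; on failure, compensates completed steps in reverse order.
    static boolean run(List<SagaStep> steps, List<String> trace) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            if (step.forward.getAsBoolean()) {
                trace.add("done:" + step.name);
                completed.push(step);   // the most recently completed step is compensated first
            } else {
                trace.add("failed:" + step.name);
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensation.run();
                    trace.add("compensated:" + done.name);
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated, on the assumption that a failed local transaction has already rolled back its own changes.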

4. Summary

Distributed transactions solve the data consistency problem across multiple branch transactions.
Consistency can be strong, weak, or eventual. Eventual consistency is sufficient for most business scenarios.
Five common solutions were covered: two-phase commit, three-phase commit, transaction messages, the local message table, and TCC. Transaction messages, the local message table, and TCC provide eventual consistency, while two-phase commit and three-phase commit aim for strong consistency.
Seata is an open source distributed transaction solution that supports the AT, TCC, Saga, and XA modes.