Original statement: This article is the author's original work. Reprinting by any individual, media outlet, public account, or website without authorization is prohibited; violators will be held legally responsible.



Application scenarios based on XA



The biggest architectural difference between the XA protocol and the TCC model is that XA acts directly on the resource layer, while TCC acts on the service layer.

The resource layer is more general-purpose and almost non-intrusive to the business, but it must strictly follow transactional ACID properties in order to fit all kinds of business scenarios. The service layer is closer to the business and can be optimized per business to pursue higher peak performance.



Of course, this is not to say that the XA protocol works only in multi-resource scenarios within a single service. Multi-resource scenarios across services are also possible, though they require an additional mechanism to propagate the transaction context.

As described in Part 1 of this distributed transactions overview, the XA protocol achieves global isolation through each RM's (Resource Manager's) local transaction isolation, and it requires the serializable isolation level to guarantee distributed transaction consistency. However, the serializable isolation level has performance problems, as follows:

At the serializable isolation level, read locks are added to all SELECT snapshot reads that would otherwise be lock-free. As a result, lock hold times increase and concurrency suffers. With lock-free globally consistent reads, such as distributed MVCC, lock hold times can be significantly reduced and concurrency greatly improved.



However, no matter how the implementation is optimized, the peak hotspot-data concurrency of a distributed transaction is at best similar to that of a single local transaction. Therefore, both XA-based distributed transactions and standalone local transactions are limited by hotspot-data concurrency performance.



So what is the biggest benefit of the XA protocol? Its most important function is to guarantee the transactional properties of multi-resource access when database resources are scaled horizontally.



When a single RM machine reaches its resource performance bottleneck and can no longer keep up with business growth, you need to scale RM resources horizontally into an RM cluster. Scaling resources horizontally to improve non-hotspot data concurrency is crucial for large-scale Internet products.

For example, assuming the non-hotspot concurrency of a single RM is 100 TPS, five RMs give 500 TPS. Even if a distributed transaction spans two RMs on average, that still yields 250 TPS, a 2.5x improvement in non-hotspot concurrency.
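The arithmetic above can be captured in a few lines. This is just the text's back-of-the-envelope estimate; the function name and the assumption that throughput scales linearly with RM count are illustrative, not from any real framework.

```python
def scaled_tps(tps_per_rm: float, rm_count: int, rms_per_txn: float) -> float:
    """Aggregate non-hotspot TPS when each distributed transaction
    touches rms_per_txn RMs on average (assumes linear scaling)."""
    return tps_per_rm * rm_count / rms_per_txn

print(scaled_tps(100, 5, 2))                            # 250.0
print(scaled_tps(100, 5, 2) / scaled_tps(100, 1, 1))    # 2.5x over one RM
```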



To sum up, XA-based distributed transactions cannot improve hotspot concurrency performance. Their significance lies in guaranteeing the transactional properties of multi-resource access while scaling resources horizontally to improve non-hotspot data concurrency.



As for hotspot-data concurrency, for common applications a certain amount of SQL-level performance optimization is usually enough to meet business requirements. If that optimization hits its limit, you need to move up to the business level and optimize the business logic or architecture based on the characteristics of the business.



Another benefit of implementing distributed transactions directly at the resource layer is universality: it shields the upper-level business from lower-level implementation details. This is especially valuable in the era of cloud services, which serve a large number of small and medium-sized enterprises and even individual developers with widely varying business demands. A universal, standardized distributed transaction product frees developers from underlying technical details so they can focus on business logic and develop more efficiently.



Application scenarios based on the TCC model

The TCC distributed transaction model works directly on the service layer. It is not coupled to any specific service framework, is independent of the underlying RPC protocol and storage medium, can flexibly choose the locking granularity of business resources, and reduces resource lock hold times, all with good scalability. It can fairly be said to be designed for independently deployed SOA services.
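At the service layer, a TCC participant boils down to a three-operation contract. A minimal sketch of that contract follows; the class and method names are illustrative and not taken from any particular TCC framework.

```python
from abc import ABC, abstractmethod

class TccParticipant(ABC):
    """Hypothetical contract a slave business service exposes to a
    TCC framework."""

    @abstractmethod
    def try_(self, txn_id: str) -> bool:
        """Phase 1: check and reserve (business-lock) the resource."""

    @abstractmethod
    def confirm(self, txn_id: str) -> None:
        """Phase 2 on global success: consume the reserved resource."""

    @abstractmethod
    def cancel(self, txn_id: str) -> None:
        """Phase 2 on global failure: release the reserved resource."""
```

The framework drives `try_` on every participant first, then `confirm` on all of them if every `try_` succeeded, or `cancel` on those already tried if any failed.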



TCC model advantages



As for the TCC distributed transaction model, the author believes it is significant in two kinds of business scenarios.



1. Distributed transactions across services



This part is similar to XA: the splitting of services can also be viewed as horizontal scaling of resources, just in a different direction.



Horizontal expansion can go in two directions:

1. Functional splitting, which groups data by function and distributes different functional groups across different databases. This is essentially servitization under an SOA architecture.

2. Data sharding, which splits the data within a functional group across multiple databases, adding another dimension to horizontal scaling.



The following diagram briefly illustrates the horizontal data scaling strategy:

The two horizontal scaling methods can be used together: the three functional groups Users, Products, and Transactions can be stored in different databases. In addition, each functional group can be split across multiple databases according to its business volume, and each group can be scaled independently.

Therefore, one role of TCC is to guarantee the transactional properties of multi-resource access while resources are scaled horizontally by function.



2. Two-phase split



Another role of TCC is to split the protocol's two phases into two independent stages linked by business-level resource locking. The advantage of business-level resource locking is that it neither blocks other transactions from continuing to use the same resource during the first phase, nor affects the correct execution of this transaction's second phase.





Concurrent transactions in the XA model

Concurrent transactions in the TCC model



You can see that the TCC model further reduces resource lock hold times. Moreover, in theory the second phase can be executed whenever the business allows, since the resource is locked at the business level and no other transaction can use it.



What does this bring to the business? Take Alipay's secured transaction scenario as an example. Simplified, it involves only two services: the transaction service and the accounting service. The transaction service is the master business service; the accounting service is the slave business service and provides Try, Commit, and Cancel interfaces:

1. The Try interface deducts from the user's available funds and transfers the amount to pre-frozen funds. Pre-frozen funds are the business locking scheme: in the second phase, each transaction may only use its own pre-frozen funds. After the first phase executes, other concurrent transactions can still operate on the user's remaining available funds.

2. The Commit interface deducts the pre-frozen funds and increases the available funds of the intermediate account (in a secured transaction the money cannot be transferred to the merchant immediately, so an intermediate account is needed to hold it temporarily).
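The pre-frozen-funds scheme above can be sketched as follows. This is a minimal illustration of the business lock, assuming a per-transaction frozen ledger; field and method names are hypothetical.

```python
class Account:
    """Toy account with TCC-style business locking on funds."""

    def __init__(self, available: int):
        self.available = available
        self.frozen = {}  # txn_id -> amount pre-frozen by that transaction

    def try_freeze(self, txn_id: str, amount: int) -> bool:
        """Phase 1: move funds from available to this txn's frozen slot."""
        if self.available < amount:
            return False
        self.available -= amount
        self.frozen[txn_id] = amount
        return True

    def commit(self, txn_id: str) -> int:
        """Phase 2: consume only this txn's pre-frozen funds; the caller
        credits the intermediate account with the returned amount."""
        return self.frozen.pop(txn_id)

    def cancel(self, txn_id: str) -> None:
        """Phase 2 rollback: return the pre-frozen funds."""
        self.available += self.frozen.pop(txn_id, 0)

buyer = Account(100)
assert buyer.try_freeze("t1", 30)       # phase 1 locks 30 for txn t1
assert not buyer.try_freeze("t2", 80)   # only 70 still available
assert buyer.try_freeze("t2", 50)       # concurrent txns keep working
assert buyer.commit("t1") == 30         # phase 2 touches only t1's funds
```

Note how a failed `try_freeze` for one transaction does not block other transactions from using the remaining available funds, which is exactly the point of the business lock.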



Assume there is only one intermediate account. Then every invocation of the accounting service's Commit interface locks the intermediate account, which becomes a hotspot with performance problems.



However, in the secured transaction scenario, funds need not be transferred from the intermediate account to the merchant until seven days later, and the intermediate account is never shown to users. Therefore, once the first phase of the accounting service executes, the payment step of the transaction can be considered complete, and a success result can be returned to the user and the merchant. The second-phase Commit of the accounting service need not run immediately; it can be digested asynchronously during off-peak periods.

Some readers may think secured transactions are a special case, but direct-payment transactions (the mode in which money goes straight to the merchant's account; the Commit interface deducts the frozen funds and transfers them directly to the merchant rather than to an intermediate account) can be used the same way. As long as merchants are told that during peak periods funds may not arrive in real time but settlement is guaranteed within a certain window, merchants should find this acceptable.



This is the two-phase asynchronization capability of the TCC distributed transaction model. After the first phase of each slave business service executes successfully, the master business service can commit and complete, and the framework then executes the second phase of each slave business service asynchronously.



Generic TCC solution



The generic TCC solution is the most typical implementation of the TCC distributed transaction model: all slave business services participate in the decision of the master business service.

Applicable scenario



Since the slave business services are invoked synchronously and their results affect the master business service's decision, the generic TCC distributed transaction solution suits businesses with fixed, short execution times, such as the three core services of an Internet finance company: transaction, payment, and accounting:

When a user initiates a transaction, the transaction service is accessed first to create a transaction order. The transaction service then invokes the payment service to create the payment order for the transaction and perform collection; finally, the payment service invokes the accounting service to record the account flow and do the bookkeeping.



To make the three services complete a deal together, either all succeeding or all failing, you can use the generic TCC solution: put the three services in one distributed transaction, with transaction as the master business service, payment as a slave business service, and accounting as a nested slave business service, and let the TCC model guarantee transaction atomicity.

The payment service's Try interface creates the payment order, starts the nested distributed transaction, and calls the accounting service's Try interface, where the accounting service freezes the buyer's funds. After the first-phase calls complete, the transaction completes and commits its local transaction, and the TCC framework then completes the second-phase invocation of each slave business service.
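The nested phase-1 call chain just described can be traced with a small stub. All function names and transaction ids here are hypothetical; real frameworks propagate the transaction context through RPC rather than direct calls.

```python
trace = []  # records the order of phase-1 invocations

def accounting_try(txn_id: str) -> None:
    trace.append(("accounting.try", txn_id))   # freeze buyer funds

def payment_try(txn_id: str) -> None:
    trace.append(("payment.try", txn_id))      # create payment order
    accounting_try(txn_id + ".1")              # open nested transaction

def transaction_first_phase() -> str:
    txn_id = "gtx-001"                         # root transaction id
    payment_try(txn_id)                        # drive the slave's Try
    return txn_id

transaction_first_phase()
print(trace)  # [('payment.try', 'gtx-001'), ('accounting.try', 'gtx-001.1')]
```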



In the payment service's second phase, the accounting service's Confirm interface is called first to unfreeze the buyer's funds and increase the seller's available funds. After that call succeeds, the payment service sets the payment order to the completed state, finishing the payment.



When the second phases of both the payment and accounting services have been invoked, the entire distributed transaction ends.



Asynchronous assured TCC solution



In the asynchronous assured TCC solution, the direct slave business service is a reliable message service, while the real slave business services are decoupled through the message service and execute asynchronously as its consumers.

The reliable message service provides Try, Confirm, and Cancel interfaces. The Try interface pre-sends: it only persists the message data. The Confirm interface confirms sending and actually delivers the message. The Cancel interface cancels sending and deletes the message data.
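The three operations of the reliable message service can be sketched like this. Storage and broker delivery are stubbed with in-memory structures, and all names are illustrative.

```python
class ReliableMessageService:
    """Toy reliable message service with TCC-style interfaces."""

    def __init__(self):
        self.store = {}      # msg_id -> payload, persisted but not sent
        self.published = []  # messages actually handed to consumers

    def try_send(self, msg_id: str, payload: dict) -> None:
        """Pre-send: only persist the message data."""
        self.store[msg_id] = payload

    def confirm(self, msg_id: str) -> None:
        """Confirm sending: the message is actually delivered."""
        self.published.append(self.store.pop(msg_id))

    def cancel(self, msg_id: str) -> None:
        """Cancel sending: delete the persisted message data."""
        self.store.pop(msg_id, None)

svc = ReliableMessageService()
svc.try_send("m1", {"event": "member.registered"})  # phase 1
svc.confirm("m1")  # phase 2: only now can a consumer ever see it
```

Because `try_send` persists without delivering, a rolled-back master transaction can `cancel` the message and the slave business service (the consumer) never observes it.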



The message service stores and scales its message data independently, reducing the coupling between the slave business services and the messaging system, and achieves eventual consistency of the distributed transaction on the premise that the message service is reliable.



This solution adds the maintenance cost of the message service, but because the message service implements the TCC interfaces in place of the slave business services, the slave business services need no modification and the integration cost is very low.



Applicable scenario



Because the slave business services consume messages asynchronously, with uncertain execution times, the inconsistency window can grow. Therefore, the asynchronous assured TCC distributed transaction solution only suits passive businesses that are not sensitive to eventual-consistency latency (the slave service's processing result does not affect the master business service's decision; it only passively receives the master's decision result), such as member registration and email services:

When a user successfully registers as a member, an email is sent informing the user of the successful registration and prompting activation, with two requirements:

1. If the user successfully registers, send an email to the user.

2. If the user fails to register, do not send emails to the user.



Therefore, atomicity is also required between the membership service and the mail service: either both execute or neither does. By contrast, the mail service is a passive business: it does not affect whether a user registers successfully. It only needs to send an email after successful registration, and it need not participate in the membership service's decision.



For such a business scenario, an asynchronous assured TCC distributed transaction solution can be used, as follows:

The reliable message service decouples the membership service from the mail service: the membership service and the message service form a TCC transaction model that guarantees atomicity. The message service's reliability then ensures the message will be consumed by the mail service, placing the membership and mail services in the same distributed transaction. Meanwhile, the mail service does not affect the membership service's execution; it only passively receives send-mail requests after the membership service succeeds.



Compensating TCC solution



The compensating TCC solution is structured like the generic TCC solution in that the slave business services also participate in the master business service's decisions. The difference is that the former's slave business services only need to provide Do and Compensate interfaces, while the latter's need to provide all three.

The Do interface directly executes the real, complete business logic, finishing the business processing with externally visible results. The Compensate operation compensates for, or partially offsets, the business result of a Do operation, and it must be idempotent.
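A minimal sketch of a compensating participant follows, assuming compensation is made idempotent by recording which transactions have already been compensated; the class and its fields are hypothetical.

```python
class CompensableBooking:
    """Toy slave service exposing Do and an idempotent Compensate."""

    def __init__(self):
        self.booked = set()       # externally visible business results
        self.compensated = set()  # txns already rolled back

    def do(self, txn_id: str) -> None:
        """Execute the full business logic in one step."""
        self.booked.add(txn_id)

    def compensate(self, txn_id: str) -> None:
        """Offset the Do result; safe to retry after a failure."""
        if txn_id in self.compensated:
            return                        # idempotent: retry is a no-op
        self.booked.discard(txn_id)
        self.compensated.add(txn_id)
```

Idempotency matters here because, as noted above, compensation can fail and be retried by the framework or by human intervention.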



Compared with the generic solution, a slave service in the compensating solution does not need to modify its original business logic; it only adds compensation/rollback logic, so the modification is smaller. Note, however, that the business executes its entire logic in phase one without effective transaction isolation, and when rollback is needed, compensation may fail, requiring additional exception-handling mechanisms such as human intervention.



Applicable scenario



Because rollback compensation can fail, the compensating TCC distributed transaction solution only suits businesses with few concurrent conflicts, or those that must interact with external businesses. Such external businesses are not passive: their execution results affect the master business service's decision, as with the flight booking service of a ticket agent:

The service offers multi-leg booking, allowing several flights to be booked at once. For example, traveling from Beijing to St. Petersburg requires a first flight from Beijing to Moscow and a second from Moscow to St. Petersburg.



When booking, the user wants both flights booked at the same time; booking just one is pointless. So this business service also has an atomicity requirement: if one flight's reservation fails, the other flight's reservation must be cancellable.



However, the airlines are external businesses relative to the ticket agent and only provide booking and cancellation interfaces, so pushing them to adopt TCC would be extremely difficult. For such business services, a compensating TCC distributed transaction solution can be used, as follows:

On top of its original logic, the gateway service adds a Compensate interface responsible for invoking the corresponding airline's reservation-cancellation interface.



When a user initiates a flight booking request, the ticket service first invokes each airline's booking interface through the gateways' Do interfaces. If every flight books successfully, the whole distributed transaction succeeds. If any flight's booking fails, the distributed transaction rolls back: the TCC framework invokes each gateway's Compensate interface, which in turn invokes the corresponding airline's cancellation interface. In this way the atomicity of multi-flight booking is guaranteed.
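The rollback path just described can be traced with a small sketch. Airline calls are stubbed as a predicate; the function and flight names are illustrative only.

```python
def book_trip(flights, book):
    """book(flight) -> bool stands in for a gateway Do call.
    Returns (success, flights_compensated)."""
    done, compensated = [], []
    for flight in flights:
        if book(flight):
            done.append(flight)               # Do succeeded for this leg
        else:
            for earlier in reversed(done):    # roll back earlier bookings
                compensated.append(earlier)   # gateway Compensate call
            return False, compensated
    return True, compensated

ok, undone = book_trip(["PEK->SVO", "SVO->LED"],
                       lambda f: f != "SVO->LED")  # second leg fails
print(ok, undone)  # False ['PEK->SVO']
```

Note that the first leg's booking was externally visible between its Do and its Compensate, which is exactly the isolation gap the previous section warns about.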



Summary



For today's Internet applications, horizontal resource scaling offers more flexibility and is a relatively easy scale-out scheme to implement, but it also significantly increases complexity and introduces new challenges, such as data consistency across resources.



Data can be scaled horizontally by sharding or by function. The XA and TCC models are similar in this respect: both can guarantee the transactional properties of multi-resource access while scaling resources horizontally, except that the former applies to data sharding and the latter to functional splitting.



Another significance of the XA model lies in its universality. Performance aside, it can apply to almost all business models, which makes it very useful for foundational technical products such as distributed databases and cloud distributed transaction frameworks.



Besides supporting distributed transactions across services, the TCC model also supports two-phase asynchronization: by locking business resources, the second phase can execute asynchronously, and asynchronization is one of the powerful tools for solving hotspot-data concurrency problems.



Based on concrete business scenarios and examples, this article has compared distributed transaction solutions in terms of performance, hotspot conflicts, integration complexity, and applicable scenarios, hoping to help readers understand distributed transactions more deeply.



Some businesses tolerate short-term inconsistency, some operations are idempotent, and every distributed transaction solution has its own advantages and disadvantages; there is no silver bullet that fits all. You must therefore analyze your own business requirements, business characteristics, and technical architecture together with the characteristics of each solution to find the best fit.



In the next article, Ant Financial will introduce the distributed transaction product and the variety of solutions it has developed over years of serving a large number of different internal and external businesses.



Financial Class Distributed Architecture (Antfin_SOFA)