Understanding the design of a distributed transaction framework through the Fescar (Seata) documentation

Fescar is Alibaba’s open source distributed transaction middleware. It solves the distributed transaction problems that arise in microservice scenarios efficiently and with zero intrusion on business code.

What distributed transaction problems does the move to microservices bring?

First, imagine a traditional monolithic application in which three modules complete one business process by updating data in the same data source.

Naturally, data consistency across business processes is guaranteed by local transactions.

As business requirements and the architecture change, the monolithic application is split into microservices: the original three modules become three independent services, each with its own data source (pattern: database per service). The business process is now completed by invoking the three services.

At this point, data consistency within each service is still guaranteed by local transactions. But how can data consistency be guaranteed across the whole business process? This is the typical distributed transaction requirement of a microservice architecture: we need a distributed transaction solution to ensure global data consistency for the business.

The history of Fescar

Alibaba was one of the earliest companies in China to move to a distributed (microservice) architecture, so it ran into the problem of distributed transactions under a microservice architecture early on.

In 2014, the Alibaba middleware team released TXC (Taobao Transaction Constructor) to provide distributed transaction services for applications within the group.

In 2016, TXC was turned into a product and launched on Alibaba Cloud as Global Transaction Service (GTS). It was the only distributed transaction product on the cloud in the industry at that time, and it began to serve many external customers through Alibaba Cloud’s public cloud and private cloud solutions.

In 2019, building on the technology accumulated in TXC and GTS, the Alibaba middleware team launched the open source project Fescar (Fast & EaSy Commit And Rollback) to build this distributed transaction solution together with the community.

TXC, GTS, and Fescar form one line of development and offer a distinctive solution to the problem of distributed transactions under a microservice architecture.

Design goals

In the fast-growing Internet age, the ability to experiment quickly through trial and error is critical to a business:

On the one hand, introducing microservices and distributed transaction support into the technical architecture should not impose an additional R&D burden at the business level.
On the other hand, a business that introduces distributed transaction support should keep roughly the same level of performance and not be significantly slowed down by the transaction mechanism.

Based on these two points, the most important considerations at the beginning of our design were:

Non-intrusive to the business: intrusive here means that the application has to be designed and modified at the business level to satisfy the technical constraints of distributed transactions. Such design and modification usually bring high development and maintenance costs. We want to solve the distributed transaction problem at the middleware level, without requiring the application to do extra work at the business level.
High performance: adding distributed transaction guarantees inevitably introduces extra overhead and some performance degradation. We want to keep that cost very low, so that introducing distributed transactions does not affect the availability of the business.

Why are existing solutions not enough?

Existing distributed transaction solutions can be divided into two categories according to their intrusiveness to the business, namely, non-intrusive and intrusive.

Non-intrusive solutions

Among the existing mainstream distributed transaction solutions, only XA-based solutions are non-intrusive to the business, but applying XA in practice raises three problems:

The database must support XA. XA cannot be used with a database that does not support it (or does not support it well, such as MySQL prior to 5.7).
Because of the constraints of the protocol itself, transaction resources (data records, database connections) are locked for long periods. Such long-term locking is often unnecessary from a business point of view, but because the resource manager is the database itself, the application layer cannot intervene. As a result, XA-based applications tend to perform poorly and are hard to optimize.
Deploying an XA-based distributed solution depends on heavyweight application servers (Tuxedo, WebLogic, WebSphere, etc.), which are not suited to a microservice architecture.

Business-intrusive solutions

In fact, XA was initially the only solution for distributed transactions. XA is complete, but in practice it often has to be abandoned for a variety of reasons (including but not limited to the three problems above), and the distributed transaction problem is then solved at the business level instead. For example:

Eventual consistency based on reliable messaging
TCC
Saga

All of these fall into this category. We do not go into their specific mechanisms here, since many articles about them are available online. In summary, these solutions require the application to take the technical constraints of distributed transactions into account at the business level. Typically, each service must be designed to provide forward and reverse interfaces that are idempotent. Such design constraints often result in high development and maintenance costs.

What would the ideal solution look like?

It is undeniable that business-intrusive distributed transaction solutions have been proven in extensive practice, solve the problem effectively, and play an important role in business systems across all industries. But going back to first principles, these solutions were adopted out of necessity. If an XA-based solution were lighter and could meet the performance requirements of the business, no one would want to push the distributed transaction problem up to the business level.

An ideal distributed transaction solution should be as simple to use as a local transaction: the business logic focuses only on business requirements and does not need to consider the constraints of the transaction mechanism.

Principle and Design

We want a solution that does not intrude on the business, so we start from XA, the existing solution that is non-intrusive:

Is it possible to evolve from XA and solve the problems that XA solutions face?

How do we define a distributed transaction?

First, it is natural to think of a distributed transaction as a global transaction that contains several branch transactions. The responsibility of a global transaction is to coordinate the branch transactions under its jurisdiction to agree on either a successful commit together or a failed rollback together. In addition, often a branch transaction is itself a local transaction that satisfies ACID. This is our basic understanding of distributed transaction structures, consistent with XA.

Second, similar to the XA model, we define three components that together make up the protocol for processing distributed transactions.

Transaction Coordinator (TC): maintains the running state of global transactions and coordinates and drives their commit or rollback.
Transaction Manager (TM): controls the boundaries of a global transaction; it is responsible for starting a global transaction and for finally initiating the resolution to commit or roll back globally.
Resource Manager (RM): controls branch transactions; it is responsible for branch registration and status reporting, receives instructions from the transaction coordinator, and drives the commit and rollback of branch (local) transactions.

A typical distributed transaction process:

TM asks TC to start a global transaction. The global transaction is created and a globally unique XID is generated.
The XID is propagated in the context of the microservice call chain.
RM registers a branch transaction with TC, placing it under the global transaction identified by the XID.
TM asks TC to resolve the global transaction identified by the XID as a global commit or a global rollback.
TC drives all branch transactions under the XID to complete the commit or rollback request.
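
The five steps above can be sketched as a small in-memory simulation. This is only an illustration of the roles: the class and method names (TransactionCoordinator, ResourceManager, run_global_transaction) are made up for this sketch and are not the Fescar API, and no real network calls or databases are involved.

```python
import uuid


class TransactionCoordinator:
    """TC: tracks global transactions and drives their branches (hypothetical model)."""

    def __init__(self):
        self.branches = {}  # xid -> list of registered resource managers

    def begin(self):
        xid = uuid.uuid4().hex  # step 1: generate a globally unique XID
        self.branches[xid] = []
        return xid

    def register_branch(self, xid, rm):
        self.branches[xid].append(rm)  # step 3: branch joins the global transaction

    def resolve(self, xid, commit):
        # Step 5: drive every branch under the XID to commit or roll back.
        for rm in self.branches.pop(xid):
            if commit:
                rm.commit(xid)
            else:
                rm.rollback(xid)


class ResourceManager:
    """RM: registers its branch with the TC and obeys the TC's resolution."""

    def __init__(self, name, tc):
        self.name, self.tc, self.log = name, tc, []

    def do_work(self, xid):
        self.tc.register_branch(xid, self)
        self.log.append(("work", xid))

    def commit(self, xid):
        self.log.append(("commit", xid))

    def rollback(self, xid):
        self.log.append(("rollback", xid))


def run_global_transaction(tc, services, fail=False):
    """TM role: begin, let the XID travel with each call, then ask the TC to resolve."""
    xid = tc.begin()
    try:
        for rm in services:
            rm.do_work(xid)  # step 2: the XID is passed along the call chain
        if fail:
            raise RuntimeError("simulated business failure")
        tc.resolve(xid, commit=True)  # step 4: global commit
        return "committed"
    except RuntimeError:
        tc.resolve(xid, commit=False)  # step 4: global rollback
        return "rolled back"
```

In this sketch the XID is passed as a plain function argument; in a real deployment it would travel in the RPC context instead.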

So far, Fescar’s protocol mechanism is generally consistent with XA.

What’s the difference from XA?

Architectural layers

The RM of an XA solution actually sits at the database layer: the RM is essentially the database itself (accessed through XA-capable drivers).

Fescar’s RM is deployed on the application side as a middleware layer in the form of a binary package. It does not depend on the database’s own protocol support and certainly does not require the database to support XA. This matters for microservice architectures: the application layer does not need to adopt two different sets of database drivers for the two scenarios of local and distributed transactions.

This design removes the requirement that the database support a particular protocol for distributed transactions.

Two-phase commit

Let’s take a look at XA’s 2PC process.

Regardless of whether Phase2’s resolution is COMMIT or ROLLBACK, locks on transactional resources are held until Phase2 is complete.

In a normally running business, more than 90% of transactions are likely to commit successfully in the end. So can we commit the local transactions in Phase1? In more than 90% of cases this would save the time that locks are held during Phase2 and improve overall efficiency.

The local lock on data in a branch transaction is managed by the local transaction and is released when Phase1 of the branch transaction ends.
At the same time, the database connection is released as the local transaction ends.
The global lock on data in a branch transaction is managed on the transaction coordinator side and can be released immediately once a global commit is resolved in Phase2. Only when a global rollback is resolved is the global lock held until Phase2 of the branch ends.

This design, which greatly reduces the locking time of branch transactions on resources (data and connections), provides the foundation for overall concurrency and throughput improvements.

Of course, you will ask: if Phase1 has already committed, how can Phase2 roll back?

How are branch transactions committed and rolled back?

First, the application needs to use Fescar’s JDBC data source proxy, which acts as Fescar’s RM.

Phase1:

Fescar’s JDBC data source proxy parses the business SQL, organizes the before and after images of the business data being updated into a rollback log, and, relying on the ACID properties of local transactions, commits the business data update and the rollback log write in the same local transaction.

This ensures that every committed update to business data has a corresponding rollback log.

Based on this mechanism, the branch’s local transaction can commit in Phase1 of the global transaction, immediately releasing the resources locked by the local transaction.
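
As a rough illustration of the Phase1 idea, the sketch below uses Python’s built-in sqlite3 module to write a business update and its rollback log in one local transaction. The table layout (account, undo_log), the JSON image format, and the function name phase1_update are assumptions made for this example, not Fescar’s actual schema or code; a real proxy would derive the images by parsing the SQL rather than being told which row changes.

```python
import json
import sqlite3


def phase1_update(conn, xid, branch_id, account_id, new_balance):
    """Apply a business UPDATE and record its rollback log in ONE local transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        before = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()[0]
        conn.execute(
            "UPDATE account SET balance = ? WHERE id = ?", (new_balance, account_id)
        )
        image = {
            "table": "account",
            "id": account_id,
            "before": {"balance": before},      # data image before the update
            "after": {"balance": new_balance},  # data image after the update
        }
        conn.execute(
            "INSERT INTO undo_log (xid, branch_id, image) VALUES (?, ?, ?)",
            (xid, branch_id, json.dumps(image)),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE undo_log (xid TEXT, branch_id INTEGER, image TEXT)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

phase1_update(conn, xid="xid-1", branch_id=1, account_id=1, new_balance=70)
```

Because both writes share one local transaction, a failure while writing the rollback log also undoes the business update, so committed business data never exists without its rollback record.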

Phase2:

If the resolution is a global commit, the branch transaction has already committed by this point, so no synchronous coordination is needed (only asynchronous cleanup of the rollback log) and Phase2 can complete very quickly.

If the resolution is a global rollback, the RM receives the rollback request from the coordinator, finds the corresponding rollback log record by XID and branch ID, generates the reverse update SQL from the rollback record, and executes it to complete the rollback of the branch.
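
The rollback path can be sketched in the same spirit: look up the rollback record by XID and branch ID, build a reverse UPDATE from its before image, and execute it. The record layout and the helper names find_rollback_record and build_reverse_update are hypothetical, and the sketch only handles a single-row UPDATE.

```python
import sqlite3


def find_rollback_record(undo_log, xid, branch_id):
    """Locate the rollback record for one branch of one global transaction."""
    for record in undo_log:
        if record["xid"] == xid and record["branch_id"] == branch_id:
            return record
    return None


def build_reverse_update(record):
    """Return (sql, params) that restores the before image of the record."""
    columns = sorted(record["before"])
    assignments = ", ".join(f"{col} = ?" for col in columns)
    sql = f"UPDATE {record['table']} SET {assignments} WHERE {record['pk']} = ?"
    params = [record["before"][col] for col in columns] + [record["pk_value"]]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 70)")  # state after Phase1 committed
conn.commit()

undo_log = [{
    "xid": "xid-1", "branch_id": 1,
    "table": "account", "pk": "id", "pk_value": 1,
    "before": {"balance": 100},
    "after": {"balance": 70},
}]

record = find_rollback_record(undo_log, "xid-1", 1)
sql, params = build_reverse_update(record)
with conn:  # the compensation itself runs as a local transaction
    conn.execute(sql, params)
```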

Transaction propagation mechanism

The XID is the unique identifier of a global transaction. The job of the transaction propagation mechanism is to pass the XID along the service call chain and bind it to each service’s transaction context. In this way, every database update in the call chain registers a branch with the global transaction that the XID represents and comes under the management of that same global transaction.

Based on this mechanism, Fescar can support any microservice RPC framework. All that is needed is a mechanism in the framework that can propagate the XID transparently, such as Dubbo’s Filter + RpcContext.

Corresponding to the transaction propagation behaviors defined by the Java EE specification and Spring, Fescar offers the following support:

PROPAGATION_REQUIRED: supported by default
PROPAGATION_SUPPORTS: supported by default
PROPAGATION_MANDATORY: implemented by the application through the API
PROPAGATION_REQUIRES_NEW: implemented by the application through the API
PROPAGATION_NOT_SUPPORTED: implemented by the application through the API
PROPAGATION_NEVER: implemented by the application through the API
PROPAGATION_NESTED: not supported

Isolation

The isolation of global transactions is based on the local isolation level of branch transactions.

Provided the local isolation level of the database is read committed or above, Fescar uses a global exclusive write lock, maintained by the transaction coordinator, to guarantee write isolation between transactions. By default it defines the isolation level of global transactions as read uncommitted.

Our view of isolation levels is that, for the vast majority of applications, the distributed transactions arising in microservice scenarios work fine at the read committed isolation level. In fact, most of these scenarios work without problems even at the read uncommitted isolation level.

In extreme scenarios, if an application needs global read committed isolation, Fescar provides a mechanism to achieve it. By default, Fescar works at the read uncommitted isolation level, which keeps most scenarios efficient.
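
To make the global write lock concrete, here is a minimal in-memory sketch of a lock table that a coordinator could keep, keyed by table and primary key and owned by an XID. The class name GlobalLockTable and its methods are assumptions for illustration; the text does not describe how Fescar stores or queries its locks.

```python
class GlobalLockTable:
    """Hypothetical TC-side table of exclusive write locks on individual rows."""

    def __init__(self):
        self._owner = {}  # (table, primary key) -> XID holding the lock

    def acquire(self, xid, table, pk):
        """Grant the lock if the row is free or already held by the same XID."""
        holder = self._owner.setdefault((table, pk), xid)
        return holder == xid

    def release_all(self, xid):
        """Release every lock held by a global transaction once it is resolved."""
        for key in [k for k, owner in self._owner.items() if owner == xid]:
            del self._owner[key]


locks = GlobalLockTable()
first = locks.acquire("tx1", "account", 1)    # tx1 writes row 1
second = locks.acquire("tx2", "account", 1)   # tx2 must wait: row 1 is taken
locks.release_all("tx1")                      # tx1 is globally committed
third = locks.acquire("tx2", "account", 1)    # now tx2 can write row 1
```

Only writers are checked against this table, so a reader that skips it may see data whose global transaction is not yet resolved, which corresponds to the default read uncommitted level described here.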

Application Scenario Analysis

The core principles of Fescar described above rest on an important premise: the resource involved in a branch transaction must be a relational database that supports ACID transactions. The commit and rollback mechanisms of branches depend on the guarantees of local transactions. Therefore, if the application uses a database that does not support transactions, or is not a relational database at all, this mechanism does not apply.

In addition, the current implementation of Fescar has some limitations: for example, the transaction isolation level goes only up to read committed, and SQL parsing does not yet cover the full syntax.

To cover the application scenarios that the Fescar native mechanism does not support at this time, we have defined another working mode.

The native Fescar working mode described above is called Automatic Transaction (AT) mode, which is non-intrusive to the business. The other working mode is called Manual Transaction (MT) mode, in which the application itself must define the business logic of each branch transaction together with its commit and rollback logic.

The basic behavior pattern of branches

A branch transaction that is part of a global transaction contains four behaviors that interact with the coordinator in addition to its own business logic:

Branch registration: Before the data operation of a branch transaction can take place, it is necessary to register the data operation of the branch transaction with the coordinator and put it into the management of a global transaction that has been started. After the branch is registered, the data operation can take place.
Status reporting: After the data operation of a branch transaction completes, the result of its execution needs to be reported to the transaction coordinator.
Branch commit: Completes a branch commit in response to a request from the coordinator for a branch transaction commit.
Branch rollback: Completes the branch rollback in response to a request from the coordinator to roll back a branch transaction.

Branch behavior in AT mode

The business logic does not need to be aware of the transaction mechanism; the interaction between the branch and the global transaction happens automatically.

Branch behavior in MT mode

The business logic needs to be split into three parts, Prepare, Commit, and Rollback, which together form an MT branch that joins the global transaction.

MT mode is a complement to AT mode. In addition, the value of MT mode is that many non-transactional resources can be brought under the management of global transactions.
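
A minimal sketch of what an MT branch might look like, assuming a three-method interface with prepare, commit, and rollback. The interface MTBranch, the example DictWriteBranch (which wraps a plain dict as a stand-in for a non-transactional resource), and the driver run_global are all hypothetical names for illustration.

```python
from abc import ABC, abstractmethod


class MTBranch(ABC):
    """Hypothetical MT branch: business logic split into three parts."""

    @abstractmethod
    def prepare(self, xid): ...

    @abstractmethod
    def commit(self, xid): ...

    @abstractmethod
    def rollback(self, xid): ...


class DictWriteBranch(MTBranch):
    """Wraps a plain dict (a non-transactional resource) as an MT branch."""

    def __init__(self, store, key, value, can_prepare=True):
        self.store, self.key, self.value = store, key, value
        self.can_prepare = can_prepare
        self.staged = {}  # xid -> value waiting for the global resolution

    def prepare(self, xid):
        if not self.can_prepare:
            return False
        self.staged[xid] = self.value  # stage the write, do not apply it yet
        return True

    def commit(self, xid):
        self.store[self.key] = self.staged.pop(xid)  # make the write visible

    def rollback(self, xid):
        self.staged.pop(xid, None)  # discard the staged write


def run_global(xid, branches):
    """Prepare every branch; commit all on success, roll back the prepared ones otherwise."""
    prepared = []
    for branch in branches:
        if not branch.prepare(xid):
            for done in prepared:
                done.rollback(xid)
            return False
        prepared.append(branch)
    for branch in prepared:
        branch.commit(xid)
    return True
```

Unlike AT mode, the application author writes all three methods by hand, which is exactly the extra work that makes this mode intrusive.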

Mixed mode

Because branches in AT and MT modes fundamentally behave in the same way, the two modes are fully compatible: a single global transaction can contain both AT and MT branches. This makes comprehensive coverage of business scenarios possible: use AT mode wherever it is supported, and use MT mode where it is not. In addition, non-transactional resources managed through MT can naturally be managed in the same distributed transaction as transactional relational database resources.

Vision of application scenarios

Back to our original design intent: an ideal distributed transaction solution should not intrude on the business. MT mode is a natural complement for the scenarios that AT mode cannot yet cover. We hope that, as AT mode continues to evolve and improve, the scenarios it supports will gradually expand and the need for MT mode will gradually shrink. In the future, we will also add native XA support as another non-intrusive way to cover scenarios that AT mode cannot reach.

Extension points

Microservices framework support

Propagating the transaction context between microservices calls for a customized solution for each microservice framework, one that fits the framework’s own mechanisms and is transparent to the application layer. Developers interested in contributing in this area can refer to the built-in Dubbo support when implementing support for other microservice frameworks.

Supported database types

Because AT mode involves parsing SQL, working with different types of databases requires database-specific adaptation. Developers interested in contributing in this area can refer to the built-in MySQL support when implementing support for other databases.

Configuration and service registration/discovery

Integration with different configuration and service registration/discovery solutions is supported, such as Nacos, Eureka, and ZooKeeper.

Extending the scenarios covered by MT mode

An important function of MT mode is that resources other than relational databases can be wrapped as MT branches and brought under the management of a global transaction, for example Redis, HBase, and RocketMQ transactional messages. Developers interested in contributing in this area can provide a range of ecosystem adaptations here.

Distributed high availability solutions for transaction coordinators

Different approaches to high availability on the server side of the transaction coordinator are supported for different scenarios. For example, transaction state can be persisted to files or to a database, and state can be synchronized across the cluster through RPC communication or through a highly available KV store.

Roadmap

Landscape

The green parts are already open source, the yellow parts will be open sourced by Alibaba and Ant Financial, and the blue parts are those we hope to build together with the community:

To support transactions on other databases, developers can refer to Seata’s MySQL implementation
To support other microservice frameworks, developers can refer to Seata’s Dubbo implementation
To support other data sources (such as MQ or NoSQL), developers can refer to Seata’s TCC implementation
Support for other configuration and registry services can be added with only a little work
Contributions to the blue parts are warmly welcome; join in and contribute excellent solutions
Support for XA, the standard for distributed transactions, is on our product roadmap

Related links

FESCAR on GitHub
GTS on Aliyun

About the blogger

Personal GitHub:

github.com/jiankunking

Personal Blog:

jiankunking.com
