
What is a transaction

A database transaction (often simply "transaction") is a logical unit of work consisting of a finite sequence of database operations executed as a single unit.

Transactions have the following four properties, conventionally referred to as ACID properties:

  • Atomicity: The transaction executes as a whole; either all of the database operations it contains are performed, or none of them are.
  • Consistency: A transaction must take the database from one consistent state to another. A consistent state is one in which the data satisfies all integrity constraints. Consistency also carries a second layer of meaning: the intermediate states of a transaction must not be observable (a semantics that arguably belongs to atomicity as well).
  • Isolation: When multiple transactions execute concurrently, the execution of one transaction should not affect the others, as if each transaction were the only one operating on the database.
  • Durability: Modifications made to the database by a committed transaction are saved permanently; once the transaction ends, they cannot be undone.

Initially, a transaction was confined to controlling access to a single database resource:

As architectures became service-oriented, the notion of a transaction extended to the service level. If a single service operation is treated as a transaction, the entire service operation can still involve only a single database resource:

Such transactions, based on access to a single database resource by a single service, are called Local transactions.
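To make the notion concrete, here is a minimal sketch of a local transaction in Go using the standard database/sql package. The package, table, and column names and the MySQL driver are illustrative assumptions, not part of the original article.

```go
// A minimal local transaction: one service operation, one database resource.
package dtx

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // driver registered for side effects
)

// transfer moves money between two rows inside a single local transaction:
// either both updates take effect or neither does.
func transfer(db *sql.DB, from, to string, amount int64) error {
	tx, err := db.Begin() // start the local transaction
	if err != nil {
		return err
	}
	if _, err := tx.Exec("UPDATE account SET balance = balance - ? WHERE id = ?", amount, from); err != nil {
		tx.Rollback() // undo everything done so far
		return err
	}
	if _, err := tx.Exec("UPDATE account SET balance = balance + ? WHERE id = ?", amount, to); err != nil {
		tx.Rollback()
		return err
	}
	return tx.Commit() // changes are durable once Commit returns
}
```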

Distributed transaction application architecture

Local transactions are confined to a single session and do not involve multiple database resources. However, in distributed application environments based on SOA (Service-Oriented Architecture), more and more applications need to bring multiple database resources, and even multiple services, into the same transaction; distributed transactions emerged to meet this need.

The earliest distributed transaction application architectures were simple and did not involve calls between services; only operations within a single service accessed multiple database resources.

When a service operation accesses several database resources and needs the accesses to be transactional, a distributed transaction is required to coordinate all of the participants.

 

In the distributed transaction application architecture described above, a service operation may access multiple database resources, but the whole transaction is still controlled within a single service. If a service operation also needs to invoke another service, the transaction has to span multiple services. In that case, the transaction started in one service must be propagated along with the call to the other service, so that the resources accessed by the called service are automatically enlisted in the same transaction. The following figure shows such a distributed transaction spanning multiple services:

  

If you combine the two scenarios above (a service operation may access multiple database resources as well as other services) and extend this further, the participants of the whole distributed transaction form a tree topology, as shown in the figure below. In a distributed transaction spanning services, the originator of the transaction is also the one that decides to commit it; it can be the client of the entire invocation chain or the first service the client invokes.

  

Compared with local transactions based on a single database resource, the application architecture of distributed transactions is much more complex. Different distributed application architectures raise different concerns when implementing distributed transactions, such as coordinating multiple resources and propagating transactions across services, and the implementation mechanisms vary accordingly. But however many engineering details there are to consider, the core of a distributed transaction is still its ACID properties. To understand a distributed transaction, therefore, start by understanding how it implements the ACID properties.

 

Starting from the two most common distributed transaction models, the rest of this article focuses on what all distributed transactions have in common: how they guarantee the ACID properties.

ACID implementation analysis of common distributed transaction models

The X/Open XA protocol

The earliest distributed transaction model is the Distributed Transaction Processing (DTP) model proposed by the X/Open consortium, also known as the X/Open XA protocol, or XA protocol for short.

  

The DTP model consists of a global transaction manager (TM) and multiple resource managers (RMs). The TM manages the global transaction state and the participating resources, and coordinates commit or rollback with them; each RM performs the actual resource operations.

The XA protocol describes the interface between the TM and the RMs, which allows multiple resources to be accessed within the same distributed transaction.

The distributed transaction flow based on the DTP model is roughly as follows:

 

  1. The application (AP) asks the TM to start a global transaction.
  2. Before operating on an RM, the AP first registers it with the TM (the TM records which RMs the AP has touched, i.e. the transaction branches). Through the XA interface functions, the TM tells the corresponding RM to open a branch of the distributed transaction, after which the AP can operate on the resources that RM manages.
  3. When the AP has finished operating on all RMs, it tells the TM to commit or roll back the global transaction according to how execution went, and the TM instructs each RM, through the XA interface functions, to complete the operation. The TM first asks every RM to prepare (pre-commit); only after all RMs report success does it ask them to commit. Under the XA protocol, once an RM's prepare has succeeded, its subsequent commit must also succeed. If any RM fails to prepare, the TM tells all RMs to roll back.
  4. The global transaction ends when every RM has finished committing or rolling back (a sketch of this flow follows the list).
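As a hedged sketch of this flow, the snippet below drives two MySQL instances (playing the RMs) through MySQL's XA statements (XA START/END/PREPARE/COMMIT/ROLLBACK), with the application also acting as a trivial TM. The xid values and SQL are illustrative; error handling and branch-state recovery, which a real TM must perform, are elided.

```go
// Sketch of the DTP flow against two MySQL instances acting as RMs.
package dtx

import (
	"context"
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql"
)

func runGlobalTx(ctx context.Context, rm1, rm2 *sql.DB) error {
	// XA state lives on a single session, so pin one connection per RM.
	c1, err := rm1.Conn(ctx)
	if err != nil {
		return err
	}
	defer c1.Close()
	c2, err := rm2.Conn(ctx)
	if err != nil {
		return err
	}
	defer c2.Close()

	xid1, xid2 := "'gtx1','branch1'", "'gtx1','branch2'"

	// Open a branch (sub-transaction) on each RM and do the work.
	c1.ExecContext(ctx, "XA START "+xid1)
	c1.ExecContext(ctx, "UPDATE account SET balance = balance - 100 WHERE id = 'A'")
	c1.ExecContext(ctx, "XA END "+xid1)

	c2.ExecContext(ctx, "XA START "+xid2)
	c2.ExecContext(ctx, "UPDATE account SET balance = balance + 100 WHERE id = 'B'")
	c2.ExecContext(ctx, "XA END "+xid2)

	// Phase 1: ask every RM to prepare.
	_, perr1 := c1.ExecContext(ctx, "XA PREPARE "+xid1)
	_, perr2 := c2.ExecContext(ctx, "XA PREPARE "+xid2)

	// Phase 2: commit only if every prepare succeeded; otherwise roll back.
	// (A real TM tracks each branch's state instead of blindly rolling back.)
	if perr1 != nil || perr2 != nil {
		c1.ExecContext(ctx, "XA ROLLBACK "+xid1)
		c2.ExecContext(ctx, "XA ROLLBACK "+xid2)
		return fmt.Errorf("prepare failed, global transaction rolled back")
	}
	c1.ExecContext(ctx, "XA COMMIT "+xid1)
	c2.ExecContext(ctx, "XA COMMIT "+xid2)
	return nil
}
```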

 

Atomicity

The XA protocol uses the 2PC (Two Phase Commit) atomic Commit protocol to ensure atomicity of distributed transactions.

Two-phase commit splits the commit process into two phases: a prepare phase (voting phase) and a commit phase (execution phase):

 

  • Prepare phase

The TM sends a prepare message to each RM. If an RM's local transaction operations succeed, it replies with success; otherwise it replies with failure.

 

  • Commit phase

If the TM receives success from every RM, it sends a commit message to each RM; otherwise it sends a rollback message. Each RM commits or rolls back its local transaction as instructed and releases the locks it held during the transaction. A sketch of the coordinator's decision logic follows.
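The two-phase decision logic can be sketched as follows; Participant is a hypothetical interface standing in for an RM, not a real library API.

```go
// Decision logic of a 2PC coordinator (TM) over hypothetical participants.
package dtx

type Participant interface {
	Prepare() error  // phase 1: vote; success promises that Commit will work
	Commit() error   // phase 2: make the branch's changes permanent
	Rollback() error // phase 2: undo the branch's changes
}

// twoPhaseCommit commits the global transaction only if every participant
// votes yes in the prepare phase; otherwise it rolls every participant back.
func twoPhaseCommit(participants []Participant) error {
	// Phase 1: prepare (voting).
	for _, p := range participants {
		if err := p.Prepare(); err != nil {
			// Any "no" vote aborts the whole global transaction.
			for _, q := range participants {
				q.Rollback()
			}
			return err
		}
	}
	// Phase 2: commit (execution). By the XA contract a prepared branch must
	// be able to commit; a real TM retries/recovers failures here (omitted).
	for _, p := range participants {
		p.Commit()
	}
	return nil
}
```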

 

Isolation

The XA protocol does not describe how to achieve isolation for distributed transactions, but it requires every RM in the DTP model to implement local transactions. In other words, the isolation of a distributed transaction built on XA is guaranteed by the isolation of each RM's local transactions: when every sub-transaction of a distributed transaction is isolated, the distributed transaction as a whole is naturally isolated.

 

Take MySQL as an example. MySQL uses two-phase locking (2PL) to control concurrent access in local transactions and guarantee isolation. 2PL resembles 2PC in name: locking is split into a growing (lock) phase and a shrinking (unlock) phase, and the two never overlap. In the growing phase, locks are only acquired and never released; in the shrinking phase, locks are only released and never acquired.



As shown in the figure above, within a local transaction the lock on a resource is acquired before each update; the update is performed only after the lock is obtained, and once acquired, the lock is held until the transaction ends.

MySQL uses this 2PL mechanism to ensure that, while a local transaction is executing, other concurrent transactions cannot operate on the same resources, thereby achieving isolation.
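A minimal sketch of what this looks like from the application side, assuming MySQL/InnoDB and an illustrative account table: the SELECT ... FOR UPDATE acquires the row lock (lock phase), and the lock is released only when the transaction commits or rolls back (unlock phase).

```go
// How 2PL looks from the application side on MySQL/InnoDB.
package dtx

import (
	"database/sql"
	"fmt"
)

func deductWithRowLock(db *sql.DB, id string, amount int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit

	// Lock phase: SELECT ... FOR UPDATE takes an exclusive row lock that is
	// held for the rest of the transaction.
	var balance int64
	if err := tx.QueryRow(
		"SELECT balance FROM account WHERE id = ? FOR UPDATE", id,
	).Scan(&balance); err != nil {
		return err
	}
	if balance < amount {
		return fmt.Errorf("insufficient balance")
	}
	if _, err := tx.Exec(
		"UPDATE account SET balance = balance - ? WHERE id = ?", amount, id,
	); err != nil {
		return err
	}
	// Unlock phase: all locks are released only when the transaction ends.
	return tx.Commit()
}
```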

 

Consistency

As mentioned earlier, consistency has two layers of semantics. The first is that a transaction takes the database from one consistent state to another consistent state. The second is that intermediate states during a transaction's execution must not be observable.

The first layer of semantics is straightforward: it is guaranteed by atomicity, isolation, and the consistency of each RM itself. For the second layer, let's first look at how a local transaction on a single RM achieves it. Take MySQL: its Multi-Version Concurrency Control (MVCC) mechanism creates a snapshot for each consistent state, and each transaction reads the consistent state corresponding to its snapshot, so the intermediate states of a local transaction are never observed.

Snapshots work well on a single RM, but what problems arise in a distributed application architecture?

  

As shown in the figure above, in the window between the local sub-transaction on RM1 completing and the local sub-transaction on RM2 completing, a reader can see the changes made by the sub-transaction on RM1 but not those made by the sub-transaction on RM2. In other words, even though every local transaction on a single RM is consistent, the intermediate state of the global transaction can be observed, which breaks global consistency.

The XA protocol does not define how to implement a global snapshot. The official MySQL documentation, for example, recommends the SERIALIZABLE isolation level to ensure the consistency of distributed transactions:

“As with nondistributed transactions, SERIALIZABLE may be preferred if your applications are sensitive to read phenomena. REPEATABLE READ may not be sufficient for distributed transactions.” In other words, for distributed transactions the REPEATABLE READ isolation level is not enough to guarantee consistency; if your application needs globally consistent reads, consider the SERIALIZABLE isolation level.
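One illustrative way to act on that recommendation from application code (this sketch is an assumption, not taken from the MySQL documentation) is to raise the session's isolation level on a pinned connection before opening an XA branch on it:

```go
// Raise the isolation level of the session that will host an XA branch.
package dtx

import (
	"context"
	"database/sql"
)

func serializableBranch(ctx context.Context, db *sql.DB) (*sql.Conn, error) {
	conn, err := db.Conn(ctx) // XA and SET SESSION are per-connection state
	if err != nil {
		return nil, err
	}
	if _, err := conn.ExecContext(ctx,
		"SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE"); err != nil {
		conn.Close()
		return nil, err
	}
	if _, err := conn.ExecContext(ctx, "XA START 'gtx1','branch1'"); err != nil {
		conn.Close()
		return nil, err
	}
	// Caller performs the branch work, then XA END / PREPARE / COMMIT.
	return conn, nil
}
```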

 

Of course, because the SERIALIZABLE isolation level performs poorly, many distributed databases implement their own distributed MVCC mechanism to provide globally consistent reads. One common approach is to use a centralized (or logically monotonically increasing) sequencer to generate a global snapshot, which each transaction (or each SQL statement) fetches once, thereby providing consistent reads under different isolation levels. Google's Spanner, for example, uses TrueTime to control access to the global snapshot.

Summary

The XA protocol is typically implemented at the database resource layer, directly on top of the resource managers. As a result, distributed transaction products built on XA, whether distributed databases or distributed transaction frameworks, impose almost no more intrusion on the business than using an ordinary database does.

The XA protocol strictly guarantees transaction ACID characteristics and can meet the functional requirements of all business domains, but it is also a double-edged sword.

Because isolation requires mutual exclusion, all resources stay locked for the duration of the transaction, which suits only short transactions with bounded execution time. And because data is held exclusively for the whole transaction, concurrency on hot data can be low; this can be improved by implementing distributed MVCC or optimistic locking.

 

To guarantee consistency, every RM must be trusted and reliable, and the failure-recovery mechanism must be dependable and fast; if a network fault isolates the RMs, the service becomes unavailable.

 

TCC model

Compared with traditional models such as XA, the distinguishing feature of the TCC (Try-Confirm-Cancel) distributed transaction model is that it does not rely on the resource managers (RMs) to support distributed transactions; instead, it implements distributed transactions by decomposing the business logic.

 

The TCC model holds that when a piece of business logic provides a service to the outside, it must accept some uncertainty: the initial invocation of the business operation is only tentative, and the main business service that invokes it reserves the right to cancel it later. If the main business service decides that the global transaction should roll back, it asks for the earlier tentative operation to be cancelled, which corresponds to the secondary service's Cancel operation. If the main business service decides that the global transaction should commit, it gives up its right to cancel, which corresponds to the secondary service's Confirm operation. Every initial (Try) operation is eventually either confirmed or cancelled.

 

Therefore, for a specific business service, the TCC distributed transaction model requires it to provide three pieces of business logic (a minimal interface sketch follows the list):

  1. Try: perform all business checks and reserve the necessary business resources.
  2. Confirm: actually execute the business logic, performing no further business checks and using only the resources reserved in the Try phase. As long as Try succeeded, Confirm must succeed. Confirm must also be idempotent, so that the distributed transaction commits exactly once.
  3. Cancel: release the business resources reserved in the Try phase. Like Confirm, Cancel must be idempotent.
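A minimal sketch of these three interfaces in Go follows; TCCService and BranchID are hypothetical names, not a real TCC framework's API.

```go
// The three operations a secondary business service exposes under TCC.
package dtx

import "context"

// BranchID identifies one branch (sub-transaction) of a global TCC
// transaction, so Confirm and Cancel can be retried idempotently.
type BranchID string

type TCCService interface {
	// Try: run all business checks and reserve the required resources.
	Try(ctx context.Context, branch BranchID) error
	// Confirm: execute the real business logic using only the resources
	// reserved by Try. Must succeed if Try succeeded, and must be idempotent.
	Confirm(ctx context.Context, branch BranchID) error
	// Cancel: release the resources reserved by Try. Must be idempotent.
	Cancel(ctx context.Context, branch BranchID) error
}
```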

  

The TCC distributed transaction model consists of three roles:

  • Main business service: the initiator and organizer of the whole business activity, responsible for starting and completing it.
  • Secondary business services: the participants in the business activity. Each provides TCC-style business operations by implementing the Try, Confirm, and Cancel interfaces, which the main business service invokes.
  • Business activity manager: manages and controls the whole business activity. It records and maintains the state of the TCC global transaction and of each secondary service's sub-transaction, invokes every secondary service's Confirm operation when the business activity commits, and invokes every secondary service's Cancel operation when it is cancelled.

 

A complete TCC distributed transaction flows as follows (an orchestration sketch follows the steps):

1. The main business service first starts a local transaction.

2. The main business service asks the business activity manager to start the distributed transaction's main business activity.

3. For each secondary service to be invoked, the main business activity registers a secondary business activity with the business activity manager and then calls that service's Try interface.

4. If all Try calls succeed, the main business service commits its local transaction; if any call fails, it rolls the local transaction back.

5. If the main business service committed its local transaction, the TCC framework invokes the Confirm interface of every secondary service; if it rolled back, the framework invokes every secondary service's Cancel interface instead.

6. The global transaction ends when all Confirm or Cancel operations of the secondary services have completed.
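The sketch below shows how a business activity manager could drive this flow against the hypothetical TCCService interface above: Try on every registered branch, then Confirm everything or Cancel everything. Persisting transaction state and retrying Confirm/Cancel, which real frameworks must do, are omitted.

```go
// A toy business activity manager driving registered TCC branches.
package dtx

import "context"

type branch struct {
	id  BranchID
	svc TCCService
}

type BusinessActivity struct {
	branches []branch
}

// Register records a secondary business activity before its Try is invoked.
func (a *BusinessActivity) Register(id BranchID, svc TCCService) {
	a.branches = append(a.branches, branch{id, svc})
}

// Complete runs the first phase (Try) on every branch, then decides the
// second phase for all of them.
func (a *BusinessActivity) Complete(ctx context.Context) error {
	for _, b := range a.branches {
		if err := b.svc.Try(ctx, b.id); err != nil {
			// Any failed Try cancels every registered branch (idempotent).
			for _, c := range a.branches {
				c.svc.Cancel(ctx, c.id)
			}
			return err
		}
	}
	for _, b := range a.branches {
		b.svc.Confirm(ctx, b.id) // must eventually succeed; retried in practice
	}
	return nil
}
```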

Atomicity

The TCC model also uses the 2PC atomic commit protocol to guarantee transaction atomicity: Try corresponds to 2PC's prepare phase, Confirm to its commit, and Cancel to its rollback. In effect, TCC is 2PC lifted to the application layer.

Isolation

The TCC distributed transaction model only provides the two-phase atomic commit protocol to guarantee atomicity; isolation is left to the business logic.

The essence of isolation is concurrency control: preventing results from being corrupted by concurrent transactions operating on the same resource.

Take funds management in the financial industry as an example. When a user initiates a payment, the system usually checks the user's balance first; if the balance is sufficient, it deducts the payment amount and credits the seller, completing the transaction. Without transaction isolation, the user could initiate two payments at the same time, both balance checks could conclude that funds are sufficient even though the balance only covers one payment, and both payments would succeed, causing a loss of funds. The sketch below shows this race.
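A minimal sketch of that race, with an illustrative account table: both concurrent calls can pass the check before either deduction is applied.

```go
// Check-then-deduct with no concurrency control: unsafe under concurrency.
package dtx

import (
	"database/sql"
	"fmt"
)

func unsafePay(db *sql.DB, user string, amount int64) error {
	var balance int64
	// Check (no lock, no isolation): two concurrent calls may both read the
	// same pre-deduction balance here.
	if err := db.QueryRow(
		"SELECT balance FROM account WHERE id = ?", user,
	).Scan(&balance); err != nil {
		return err
	}
	if balance < amount {
		return fmt.Errorf("insufficient funds")
	}
	// Deduct: executed by both calls, potentially driving the balance negative.
	_, err := db.Exec(
		"UPDATE account SET balance = balance - ? WHERE id = ?", amount, user)
	return err
}
```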

 

As we can see, concurrency control is what guarantees that business logic executes correctly. However, lock-based concurrency control techniques such as two-phase locking require database resources to remain locked until the whole transaction finishes. Within a distributed transaction, the locks must be held until the second phase completes; in other words, distributed transactions lengthen the lock-holding time, and concurrency performance degrades further.

 

Therefore, the isolation idea of the TCC model is this: after the first phase ends, transform the business so that locking moves from the underlying database resources up to the business layer. The underlying database locks can then be released early, the locking protocol of the distributed transaction is relaxed, and business concurrency improves.

Returning to the example above:

1. Phase one: check the user's funds; if sufficient, freeze the amount needed for this payment. The frozen funds are isolated at the business level and cannot be used by other concurrent transactions.

2. Phase two: deduct the funds frozen in phase one and credit the seller, completing the payment.

 

With a business-level lock, i.e. freezing the user's funds, the underlying database locks are released as soon as phase one ends. Other transactions of the buyer or the seller can then run concurrently right away instead of waiting for the whole distributed transaction to finish, yielding much higher concurrency. A sketch of the corresponding Try/Confirm/Cancel operations follows.
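The sketch below shows what the Try/Confirm/Cancel operations might look like at the SQL level for this example, assuming an illustrative account table with balance and frozen columns; the idempotence bookkeeping for Confirm/Cancel is omitted.

```go
// Funds example: Try reserves (freezes), Confirm spends, Cancel releases.
package dtx

import (
	"database/sql"
	"fmt"
)

// tryFreeze is the Try phase: the WHERE clause doubles as the business check.
func tryFreeze(db *sql.DB, buyer string, amount int64) error {
	res, err := db.Exec(
		"UPDATE account SET balance = balance - ?, frozen = frozen + ? WHERE id = ? AND balance >= ?",
		amount, amount, buyer, amount)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return fmt.Errorf("insufficient funds")
	}
	return nil // the local transaction ends here, releasing the row lock
}

// confirmPay is the Confirm phase: spend the pre-frozen funds, credit the seller.
func confirmPay(db *sql.DB, buyer, seller string, amount int64) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit
	if _, err := tx.Exec("UPDATE account SET frozen = frozen - ? WHERE id = ?", amount, buyer); err != nil {
		return err
	}
	if _, err := tx.Exec("UPDATE account SET balance = balance + ? WHERE id = ?", amount, seller); err != nil {
		return err
	}
	return tx.Commit()
}

// cancelFreeze is the Cancel phase: give the reserved funds back to the buyer.
func cancelFreeze(db *sql.DB, buyer string, amount int64) error {
	_, err := db.Exec(
		"UPDATE account SET balance = balance + ?, frozen = frozen - ? WHERE id = ?",
		amount, amount, buyer)
	return err
}
```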

Consistency

Now let's look at how consistency is achieved under the TCC distributed transaction model. As with the first layer of consistency semantics under the XA protocol, the consistent state transitions of a distributed transaction are realized through atomicity, which guarantees atomic commit, and through business-level isolation, which controls concurrent access.

As for the second layer of semantics, that the intermediate state of a transaction must not be observable, let's consider whether it is really necessary in an SOA distributed application environment.

Take the accounting scenario again. A transfer (user A -> user B) is a distributed transaction composed of a transaction service and an account service: the transaction service is the main business service and the account service is a secondary business service. The account service's Try operation pre-freezes user A's funds, its Confirm operation deducts A's pre-frozen funds and increases B's available funds, and its Cancel operation unfreezes A's pre-frozen funds.

After the account service finishes its Try phase, the main business service of the transfer can commit, and the TCC framework then invokes the account service's Confirm phase. Before that Confirm phase completes, user A can already see that the balance has been deducted, while user B's available funds have not yet increased.

From the system's point of view this is problematic: there is a time window between the end of phase one and the end of phase two during which the funds appear to belong to no one.

However, from the user's point of view, this window may not matter, or may not even be noticeable. Especially when the interval is only a few seconds, the intermediate state is hidden from the user, or is acceptable to users who understand that an asset transfer is in progress, and the eventual consistency of the result is still guaranteed.

Of course, if such a system genuinely needs to expose a particular consistent state, that can be provided through additional means.

In general, it is easier to relax consistency between services than within a single service. That is why a general-purpose distributed transaction model implemented directly at the resource layer, such as XA, emphasizes strict consistency, while at the service layer, where functions are already divided and logically decoupled across services, relaxing consistency is easier. This is the eventual consistency idea of the BASE theory under SOA architecture.

BASE stands for Basically Available (BA), Soft state (S), and Eventual consistency (E). The theory holds that, for the sake of availability, performance, and graceful degradation, consistency requirements can be appropriately relaxed: the system is "basically available, eventually consistent".

Transactions that strictly adhere to ACID are often called rigid transactions, while transactions based on BASE are called flexible transactions. Flexible transactions do not abandon ACID entirely; they merely relax the consistency requirement: consistency after a transaction completes is still strictly guaranteed, while consistency during the transaction may be relaxed.

Summary

The business-oriented design of the TCC distributed transaction model allows it to manage resources across databases and across services, coordinating access to different databases and different business operations into one atomic operation, which solves the transaction problem of distributed application architectures.

The TCC model guarantees atomicity of distributed transactions through the 2PC atomic commit protocol, and lifts isolation from the resource layer to the business layer, where it is implemented by business logic. At the resource layer, each TCC operation uses a single local transaction that ends when the operation finishes, avoiding the long resource occupation and poor performance caused by combining 2PC and 2PL at the resource layer.

At the same time, the TCC model allows custom behavior to be built in where the business needs it, such as asynchronous transaction execution for peak shaving and load leveling.

However, adopting the TCC model requires splitting the business logic into two phases and implementing the Try, Confirm, and Cancel interfaces, which means heavy customization and high development cost.

Conclusion

This article began by introducing typical distributed transaction architecture scenarios. Distributed transactions were originally born to handle a single service operating on multiple database resources. As technology evolved, and especially with the arrival of SOA-based distributed application architectures and the microservices era, the service became the basic unit of business, creating the need for distributed transactions that span services.

The article then introduced the mechanisms of the XA and TCC models and analyzed how each implements the ACID properties of distributed transactions.

In the next article, Distributed Transaction Solutions and Application Scenario Analysis, the capabilities of various distributed transaction solutions will be compared across multiple dimensions and analyzed against real business scenarios, to help readers better understand each distributed transaction model and where it applies.
