Preface

Welcome to our GitHub repository — give it a star: github.com/bin39232820… The best time to plant a tree was ten years ago; the second-best time is now.

Distributed Transactions

1 Basic Concepts

1.1 What are transactions

What is a transaction? Take an example from everyday life: when you buy something at a small shop, "pay with one hand, take the goods with the other" is a transaction. The transaction succeeds only when both handing over the money and handing over the goods succeed; if either activity fails, the transaction cancels all the activities that did succeed. With this example in mind, let's look at the definition of a transaction:

A transaction can be thought of as a large activity that consists of different smaller activities that either all succeed or all fail.

1.2 Local Transactions

In computer systems, transactions are usually controlled through a relational database, relying on the transaction features of the database itself; these are therefore called database transactions. Since the application mostly depends on the relational database to control its transactions, and the database and the application normally run on the same server, transactions based on a relational database are also known as local transactions.

To review the four properties of database transactions, ACID:

  • A (Atomicity): all operations that make up a transaction either all complete or none execute; there can be no partial success and partial failure.
  • C (Consistency): before and after a transaction executes, the consistency constraints of the database are not violated. For example, if Zhang San transfers 100 yuan to Li Si, the data before and after the transfer must both be in a correct state; that is consistency. If Zhang San's account is debited 100 yuan but Li Si's account does not gain 100 yuan, the data is wrong and consistency is not achieved.
  • I (Isolation): transactions in a database generally run concurrently. Isolation means that two concurrently executing transactions do not interfere with each other, and one transaction cannot see the intermediate states of another. The transaction isolation level can be configured to avoid problems such as dirty reads and non-repeatable reads.
  • D (Durability): once a transaction completes, the changes it made to the data are persisted to the database and are not lost; a committed transaction cannot be rolled back.

A database transaction packages all the operations involved into one indivisible execution unit. All operations in this unit either all succeed or all fail: as long as any one operation fails, the whole transaction is rolled back.
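As a minimal sketch of this all-or-nothing behavior (plain Java with in-memory accounts standing in for a real database; the account names and amounts are illustrative), a transfer either applies both balance changes or leaves everything untouched:

```java
import java.util.HashMap;
import java.util.Map;

// A toy "transaction": either both balance changes apply, or neither does.
public class LocalTransferDemo {
    public static boolean transfer(Map<String, Integer> accounts,
                                   String from, String to, int amount) {
        Integer fromBalance = accounts.get(from);
        Integer toBalance = accounts.get(to);
        // Business check: insufficient funds means the whole unit fails,
        // and no partial change is ever visible (atomicity).
        if (fromBalance == null || toBalance == null || fromBalance < amount) {
            return false; // "rollback": the map was never touched
        }
        accounts.put(from, fromBalance - amount);
        accounts.put(to, toBalance + amount);
        return true; // "commit": both changes applied together
    }

    public static Map<String, Integer> sampleAccounts() {
        Map<String, Integer> accounts = new HashMap<>();
        accounts.put("zhangsan", 200);
        accounts.put("lisi", 0);
        return accounts;
    }
}
```

A real database enforces this with write-ahead logging rather than a pre-check, but the observable contract is the same.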

1.3 Distributed Transactions

With the rapid development of the Internet, software systems have evolved from monolithic applications to distributed applications. The figure below depicts the evolution from a monolithic application to microservices: a distributed system splits an application into multiple independently deployable services, so remote cooperation between services is required to complete a transaction. A transaction completed in such a distributed environment, through remote collaboration between different services over the network, is a distributed transaction. For example, user registration plus sending reward points, order creation plus inventory reduction, and bank transfers are all distributed transactions.

We know that local transactions rely on transaction features provided by the database itself, so the following logic can control local transactions:

begin transaction;
    // 1. local database operation: deduct money from Zhang San's account
    // 2. local database operation: add money to Li Si's account
commit transaction;

But in a distributed environment, it looks like this:

begin transaction;
    // 1. local database operation: deduct money from Zhang San's account
    // 2. remote call: add money to Li Si's account
commit transaction;

Suppose the remote call to add the money succeeded, but its response was lost due to a network problem. The local transaction then fails to commit, and the money-deduction operation is rolled back. At this point the data of Zhang San and Li Si are inconsistent. Therefore, on a distributed architecture, traditional database transactions cannot be used: Zhang San's and Li Si's accounts are not in the same database, or even the same application system, so the transfer must be implemented through remote calls, and network problems then produce the distributed transaction problem.
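The failure mode above can be simulated in a few lines of plain Java (the balances and the "lost response" flag are illustrative): the remote side commits its credit, the caller never learns of it, rolls back its local debit, and the two sides now disagree.

```java
// Toy model of the failure described above: the local debit side and the
// remote credit side each keep their own balance, and a lost response
// makes the local side roll back while the remote side keeps its change.
public class BrokenDistributedTransfer {
    static int localBalance = 200;   // Zhang San, in the local database
    static int remoteBalance = 0;    // Li Si, in another service's database

    // Simulated remote call: the credit is committed remotely, but the
    // response may be lost on the network.
    static boolean remoteCredit(int amount, boolean responseLost) {
        remoteBalance += amount;     // remote side commits its own local tx
        return !responseLost;        // caller never learns it succeeded
    }

    static void transfer(int amount, boolean responseLost) {
        int before = localBalance;
        localBalance -= amount;                      // local debit (uncommitted)
        boolean ok = remoteCredit(amount, responseLost);
        if (!ok) {
            localBalance = before;                   // local rollback
        }
        // if ok: local commit
    }
}
```

After one transfer with a lost response, Zhang San still has his money while Li Si has also been credited: exactly the inconsistency a distributed transaction solution must prevent.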

1.4 Generation scenario of distributed transactions

  1. A typical scenario is a microservice architecture in which transactions are completed through remote calls between microservices. For example, when placing an order, the order microservice requests the inventory microservice to reduce inventory. In short: distributed transactions arise across JVM processes.

2. A single system accessing multiple database instances. Distributed transactions arise when a single system needs to access multiple databases (instances). For example, user information and order information are stored in two MySQL instances. When the user management system deletes a user, it must delete both the user information and the user's order information. Because the data lives in different instances, it must be manipulated through different database connections, and a distributed transaction arises. In short: distributed transactions arise across database instances.

3. Multiple services accessing the same database instance. For example, even if the order microservice and the inventory microservice access the same database, a distributed transaction still arises: the two microservices run in different JVM processes and hold different database connections for their database operations, so a distributed transaction is produced.

2.1 Basic theory of distributed transactions

We have learned the basic concepts of distributed transactions. Unlike local transactions, a distributed system is called distributed because the nodes providing its services sit on different machines and interact over the network, and the whole system must not stop serving because of a few network problems. Network factors thus become one of the criteria distributed transactions must consider, so distributed transactions need further theoretical support. We first study the CAP theory of distributed transactions.

2.1.1 CAP theory

CAP is an abbreviation of Consistency, Availability, and Partition tolerance. Let us explain each of them.

In order to facilitate the understanding of CAP theory, we combine some business scenarios in e-commerce system to understand CAP.

The following figure shows the execution process of commodity information management:

The overall execution process is as follows:

1. The commodity service writes commodity information (add, modify, delete) to the master database.
2. The master database synchronizes the written data to the slave database.
3. The commodity service reads the commodity information from the slave database.

C – Consistency:

Consistency means that a read operation after a write operation can read the latest data state; when data is distributed across multiple nodes, the data read from any node is the latest state.

In the figure above, the consistency of reading and writing of commodity information is to achieve the following objectives:

1. If the commodity service writes to the master database successfully, then querying the new data from the slave database also succeeds.
2. If the commodity service fails to write to the master database, then querying the new data from the slave database also fails.

How to achieve consistency?

1. After writing data to the master database, synchronize the data to the slave database.
2. While synchronizing, lock the slave database, and release the lock only after synchronization completes, so that old data cannot be queried from the slave.

Characteristics of distributed system consistency:

1. Because of the data synchronization process, responses to write operations are delayed.
2. To guarantee data consistency, resources are temporarily locked and released only after synchronization completes.
3. A node whose data synchronization has failed must return an error; it must never return old data.
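The three rules above can be sketched as a toy master/slave pair (plain Java; the values and method names are illustrative): reads from the slave are refused while a synchronization is in flight, so a reader never sees stale data, at the cost of availability during the sync window.

```java
// Toy master/slave pair illustrating the consistency rules above: the
// slave refuses reads while a synchronization is in flight, so a reader
// never sees stale data (but loses availability meanwhile).
public class ConsistentReplicaDemo {
    static String master = "v1";
    static String slave = "v1";
    static boolean syncing = false;

    static void writeToMaster(String value) {
        master = value;
        syncing = true;          // lock the slave for the sync window
    }

    static void finishSync() {
        slave = master;
        syncing = false;         // release the lock
    }

    // Returns null to model "return an error, never old data".
    static String readFromSlave() {
        return syncing ? null : slave;
    }
}
```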

A – Availability:

Availability means that any transaction operation gets a response result, with no response timeouts and no response errors. In the figure above, for the reading of product information to satisfy availability, the following objectives must be achieved:

1. A query request received by the slave database gets an immediate response with the query result.
2. The slave database must not time out or return errors.

How is availability achieved?

1. After writing data to the master database, synchronize it to the slave database.
2. To keep the slave database available, do not lock its resources.
3. Even if the data has not been synchronized yet, the slave database must still return query results, old data if necessary; if there is not even old data, it can return agreed-upon default information, but it must never return an error or time out.

Characteristics of distributed system availability:

1. All requests are responded to; no response times out or errors occur.

P-partition tolerance:

Usually the nodes of a distributed system are deployed in different subnets; this is network partitioning. Communication failures between nodes due to network problems are unavoidable, yet the nodes must still be able to serve externally; this is partition tolerance.

In the figure above, reading and writing of commodity information to meet partition tolerance is to achieve the following goals:

1. A failure to synchronize data from the master database to the slave database does not affect read and write operations.
2. The failure of one node does not affect the service provided by the other node.

How to achieve partition tolerance?

1. Use asynchronous rather than synchronous operations wherever possible, e.g. synchronize data from the master to the slave database asynchronously, so that the nodes stay loosely coupled.
2. Add slave database nodes, so that when one slave node goes down, another slave node keeps providing service.

Characteristics of distributed partition tolerance:

1. Partition tolerance is a basic capability of a distributed system.

2.1.2 CAP combination mode

1. Does the above commodity management example also have CAP?

No distributed transaction scenario can possess all three CAP properties at the same time, because once P holds, C and A cannot coexist.

For example, if P is satisfied in the following figure, partition tolerance is achieved:

The meanings of partition tolerance in this figure are:

1) The master database synchronizes data to the slave database over the network; the master and slave can be considered deployed in different partitions that interact through the network.
2) When the network between the master and slave databases fails, both still provide service externally.
3) The failure of one node does not affect the service of the other node.

To implement C, data consistency must be guaranteed: during data synchronization the slave database is locked so that inconsistent data cannot be queried from it, and it is unlocked after synchronization completes.

To implement A, data availability must be guaranteed: data can be queried from the slave database at any time, and no response times out or returns an error.

Through analysis, it is found that C and A are contradictory on the premise that P is satisfied.
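The contradiction can be made concrete by flipping the earlier sketch toward availability (plain Java, illustrative values): the slave always answers immediately, even mid-sync, so a reader can observe stale data. With partitions assumed, you get one behavior or the other, never both.

```java
// The same master/slave idea, but tuned for availability: the slave
// always answers immediately, even before synchronization, so a reader
// can observe stale data. Under P you pick C or A, not both.
public class AvailableReplicaDemo {
    static String master = "v1";
    static String slave = "v1";

    static void writeToMaster(String value) {
        master = value;          // sync to the slave happens later, asynchronously
    }

    static void syncLater() {
        slave = master;          // eventual consistency, once sync completes
    }

    static String readFromSlave() {
        return slave;            // always responds, possibly with old data
    }
}
```

Compare with the consistency-first variant: there the read blocked (lost A); here it answers stale (lost C).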

2. What are the combinations of CAP?

Therefore, when dealing with distributed transactions in production, it is necessary to determine which two aspects of CAP are satisfied according to the requirements.

1) AP: Abandon consistency in favor of partition tolerance and availability. This is the design choice of many distributed systems

For example, the commodity management above can implement AP, provided users can accept that queried data may not be the latest for a certain period of time. AP generally guarantees eventual consistency, and the BASE theory described later is an extension of AP. Some business scenarios fit this well, such as an order refund: the refund succeeds today and the money arrives tomorrow, as long as users accept that the money arrives within a certain period.

2) CP:

Give up availability and pursue consistency and partition tolerance. ZooKeeper in fact pursues strong consistency. Inter-bank transfer is another example.

3) CA:

Give up partition tolerance to obtain consistency and availability, that is, do not partition and do not consider problems caused by network or node failures. Such a system is not a standard distributed system; the relational databases we use most often satisfy CA.

The above commodity management, if you want to implement CA architecture is as follows:

There is no data synchronization between master and slave databases; the database responds to every query request, and through transaction isolation levels every query returns the latest data.

2.1.3 summary

Through the above, we have learned CAP theory. CAP is a proven theorem: a distributed system can satisfy at most two of Consistency, Availability, and Partition tolerance at the same time. It can serve as a criterion for our architecture design and technology selection. Most large Internet applications have many nodes, scattered deployment, and ever-larger clusters, so node failures and network failures are normal. Service availability must reach N nines (99.99..%), with good response performance to keep the user experience acceptable. The usual choice is therefore: guarantee P and A, give up strong consistency (C), and guarantee eventual consistency instead.

2.2 BASE Theory

1. Understanding strong consistency and eventual consistency

CAP theory tells us that a distributed system can satisfy at most two of Consistency, Availability and Partition tolerance at the same time, and AP is the more common choice in practice. AP means giving up consistency while keeping availability and partition tolerance, but in real production many scenarios must still achieve some form of consistency. In the earlier example, the master database synchronizes data to the slave database: even with a delay, the data must eventually be synchronized successfully to keep the data consistent. This differs from consistency in CAP. CAP consistency requires that every node returns the same data whenever it is queried; it emphasizes strong consistency. Eventual consistency allows the nodes' data to disagree for a period of time, as long as they agree after that period; it emphasizes that the data is eventually consistent.

2. Introduction to BASE theory

BASE is an abbreviation of Basically Available, Soft state, and Eventually consistent. BASE theory is an extension of AP in CAP: it achieves availability by sacrificing strong consistency. When a failure occurs, part of the data is allowed to be unavailable, but core functions remain available; data is allowed to be inconsistent for a period of time, but eventually reaches a consistent state. Transactions satisfying BASE theory are called "flexible transactions".

  • Basic availability: When a distributed system fails, it allows the loss of some available functions to ensure the availability of core functions. For example, e-commerce website transaction payment problems, goods can still be viewed normally.
  • Soft state: Since strong consistency is not required, BASE allows the existence of intermediate states (also called soft states) in the system. This state does not affect system availability, such as “payment in progress”, “data synchronization in progress”, etc. After data consistency, the state will be changed to “success”.
  • Eventual consistency: data on all nodes becomes consistent after a period of time. For example, an order's "payment in progress" status eventually changes to "payment success" or "payment failure", making the order status consistent with the actual transaction result, though with some delay and waiting.
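The soft-state idea above can be sketched as a tiny order state machine (plain Java; the state names mirror the examples in the text and are illustrative): the order sits in an intermediate "paying" state that does not block the system, and an asynchronous result eventually drives it to a final state.

```java
// Soft state sketch: an order sits in an intermediate "PAYING" state
// that does not block the system, and is eventually driven to a final
// state when the asynchronous payment result arrives.
public class SoftStateOrder {
    enum Status { PAYING, PAY_SUCCESS, PAY_FAILED }

    Status status = Status.PAYING;   // intermediate (soft) state

    // Called when the payment channel's asynchronous result arrives.
    void onPaymentResult(boolean success) {
        status = success ? Status.PAY_SUCCESS : Status.PAY_FAILED;
    }

    boolean isFinal() {
        return status != Status.PAYING;
    }

    boolean succeeded() {
        return status == Status.PAY_SUCCESS;
    }
}
```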

3 Distributed Transaction Solution: 2PC (Two-Phase Commit)

3.1 What is 2PC

2PC is the two-phase commit protocol, which divides the whole transaction process into two phases: the prepare phase and the commit phase. "2" refers to the two phases, "P" to the prepare phase, and "C" to the commit phase.

Example: Zhang San and Li Si have not seen each other for a long time, and the old friends decide to have dinner together. Both complain that times are hard and money is tight, and neither is willing to treat, so they agree to split the bill. The restaurant owner will only prepare the meal once both Zhang San and Li Si have paid. But since both of them are stubborn, an awkward situation arises:

Prepare phase: the owner asks Zhang San to pay, and Zhang San pays; the owner asks Li Si to pay, and Li Si pays.

Commit phase: the owner hands over the meal tickets, and the two take their tickets and sit down to eat.

This example forms a transaction: if either Zhang San or Li Si refuses to pay, or does not have enough money, the owner will not hand out the tickets and will return any money already received.

The whole transaction process consists of a transaction manager and participants. The owner is the transaction manager; Zhang San and Li Si are the transaction participants. The transaction manager decides the commit and rollback of the whole distributed transaction, while each participant is responsible for committing and rolling back its own local transaction.

3.2 Solutions

3.2.1 XA scheme

The traditional 2PC scheme is implemented at the database level; for example, both Oracle and MySQL support the 2PC protocol. To unify the standard and reduce unnecessary integration cost across the industry, a standardized processing model and interface standard was needed: X/Open (now part of The Open Group) defined the Distributed Transaction Processing reference model (DTP).

To make the XA scheme clearer, the following uses new-user registration with reward points as an example:

The execution process is as follows:

1. The application program (AP) holds two data sources: the user database and the points database.
2. Through the TM, the AP notifies the user database RM to add a user and the points database RM to add points for that user. At this point the RMs have not committed their transactions, and the user and points resources are locked.
3. After receiving the execution replies, if either side failed, the TM initiates a rollback on every RM; after rollback, the resource locks are released.
4. After receiving the execution replies, if both succeeded, the TM initiates a commit on every RM; after commit, the resource locks are released.

The DTP model defines the following roles:

  • Application Program (AP): an application that uses DTP distributed transactions.
  • Resource Manager (RM): a transaction participant, generally a database instance; the RM controls the database, and thereby controls branch transactions.
  • Transaction Manager (TM): coordinates and manages transactions. The TM controls the global transaction, manages the transaction life cycle, and coordinates the RMs. A global transaction is a unit of work in a distributed transaction processing environment that must operate on multiple databases to complete.
  • The DTP model defines XA as the interface specification for communication between the TM and the RMs; it can be understood simply as the 2PC interface protocol provided by the database. A 2PC implementation based on the database XA protocol is therefore also called the XA scheme.

The interaction between the three roles is as follows:

  • The TM provides an application programming interface to the AP, through which the AP commits and rolls back transactions.
  • The TM (transaction middleware) notifies the RMs of transaction begin, end, commit, and rollback through the XA interface.

In conclusion, the whole 2PC transaction process involves three roles: the AP, an application that uses 2PC distributed transactions; the RM, the resource manager, which controls branch transactions; and the TM, the transaction manager, which controls the whole global transaction.

1) In the prepare phase, each RM performs the actual business operation but does not commit its transaction, and resources are held locked.

2) In the commit phase, the TM receives the RMs' replies from the prepare phase. If any RM failed, the TM notifies all RMs to roll back; otherwise it notifies all RMs to commit. When the commit phase ends, all resource locks are released.
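The two phases above can be sketched as a tiny coordinator in plain Java (the `Participant` interface and `Account` class are illustrative stand-ins for RMs, not any real XA API): the TM asks every participant to prepare, and only if all vote yes does it tell them all to commit; otherwise all roll back.

```java
import java.util.Arrays;
import java.util.List;

// Minimal 2PC sketch: the TM asks every RM to prepare; only if all vote
// yes does it tell them all to commit, otherwise all roll back.
public class TwoPhaseCommitDemo {
    interface Participant {
        boolean prepare();   // phase 1: do the work, hold locks, don't commit
        void commit();       // phase 2a
        void rollback();     // phase 2b
    }

    static class Account implements Participant {
        int balance, pending;
        Account(int balance) { this.balance = balance; }
        void stage(int delta) { pending = delta; }
        public boolean prepare() { return balance + pending >= 0; }
        public void commit()   { balance += pending; pending = 0; }
        public void rollback() { pending = 0; }
    }

    // The transaction manager's commit protocol.
    static boolean run(List<Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {                 // any "no" vote...
                for (Participant q : participants) q.rollback();
                return false;                   // ...aborts everyone
            }
        }
        for (Participant p : participants) p.commit();
        return true;
    }

    static boolean demoTransfer(int amount) {
        Account zhangsan = new Account(200);
        Account lisi = new Account(0);
        zhangsan.stage(-amount);
        lisi.stage(amount);
        return run(Arrays.asList(zhangsan, lisi));
    }
}
```

Note what the sketch leaves out: a real implementation must also survive a coordinator crash between the two phases, which is exactly where the locking and blocking problems of 2PC come from.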

Problems with XA schemes:

1. The local database must support the XA protocol.
2. Resource locks are held until the end of phase two, so performance is poor.

3.2.2 Seata scheme

Seata is an open source distributed transaction framework, originally launched by the Alibaba middleware team as the open source project Fescar and later renamed Seata.

Seata solves the problems of traditional 2PC by coordinating the branch transactions of local relational databases to drive the completion of a global transaction. Seata is middleware that works at the application layer. Its main advantages are good performance and not holding connection resources for long periods; it solves distributed transactions in microservice scenarios in an efficient, zero-intrusion way. Currently it provides distributed transaction solutions in AT mode (an evolution of 2PC) and TCC mode.

Seata’s design philosophy is as follows

One of Seata’s design goals is to be non-intrusive, so it starts with a non-intrusive 2PC solution, evolves on the basis of the traditional 2PC solution, and solves the problems faced by the 2PC solution.

Seata understands a distributed transaction as a global transaction that contains several branch transactions. The responsibility of a global transaction is to coordinate the branch transactions under its jurisdiction to agree on either a successful commit together or a failed rollback together. In addition, the branch transaction itself is usually a local transaction of a relational database. The following diagram shows the relationship between global transactions and branch transactions:

Similar to the traditional 2PC model, Seata defines three components to coordinate the processing of distributed transactions:

  1. A Transaction Coordinator (TC) is an independent middleware that needs to be deployed and run independently. It maintains the running status of global transactions, receives TM instructions to initiate the submission and rollback of global transactions, and communicates with RM to coordinate the submission or rollback of branch transactions.
  2. Transaction Manager (TM) : A Transaction Manager that needs to be embedded in the application to work, which is responsible for starting a global Transaction and ultimately issuing a global commit or global rollback instruction to the TC.
  3. Resource Manager (RM) : Controls branch transactions, is responsible for branch registration and status reporting, receives instructions from the transaction coordinator TC, and drives the submission and rollback of branch (local) transactions.

Seata's distributed transaction process is also illustrated with the new-user registration with reward points example.

The specific execution process is as follows:

1. The TM in the user service asks the TC to start a global transaction; the global transaction is created successfully and a globally unique XID is generated.
2. The RM of the user service registers a branch transaction with the TC; the branch executes the add-user logic in the user service and is brought under the jurisdiction of the global transaction identified by the XID.
3. The user service executes the branch transaction and inserts a record into the user table.
4. Execution proceeds until the points service is called remotely (the XID is propagated in the context of the microservice call chain). The RM of the points service registers a branch transaction with the TC; the branch executes the add-points logic and is brought under the jurisdiction of the XID's global transaction.
5. The points service executes its branch transaction, inserts a record into the points table, and returns to the user service.
6. The user service's branch transaction completes.
7. The TM issues a global commit or rollback resolution for the XID to the TC.
8. The TC coordinates all branch transactions under the XID to complete the commit or rollback.

Differences between Seata's 2PC implementation and traditional 2PC:

At the architecture level, the RM of the traditional 2PC scheme sits in the database layer: the RM is essentially the database itself, implemented through the XA protocol. Seata's RM, by contrast, is deployed on the application side as a JAR package, forming a middleware layer.

In terms of two-phase commit, traditional 2PC holds locks on transactional resources until phase two completes, regardless of whether the phase-two resolution is commit or rollback. Seata instead commits each local transaction in phase one, which spares phase two from holding locks and improves overall efficiency.
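The commit-in-phase-one idea can be sketched with an undo log in plain Java (this is only a conceptual sketch of the AT-mode idea; the table and key names are illustrative, not Seata's actual undo_log schema): each branch commits its local change immediately but records a before-image, and a global rollback replays the undo log in reverse.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the AT-mode idea: each branch commits its local change
// immediately but records a before-image in an undo log; a global
// rollback replays the undo log in reverse order.
public class UndoLogDemo {
    static Map<String, Integer> table = new HashMap<>();
    static Deque<Runnable> undoLog = new ArrayDeque<>();

    static void branchUpdate(String key, int newValue) {
        Integer before = table.get(key);           // capture the before-image
        table.put(key, newValue);                  // local commit, lock released
        undoLog.push(() -> {                       // compensating action
            if (before == null) table.remove(key);
            else table.put(key, before);
        });
    }

    static void globalCommit()   { undoLog.clear(); }
    static void globalRollback() { while (!undoLog.isEmpty()) undoLog.pop().run(); }
}
```

Because the local lock is released at phase one, other transactions can proceed immediately; the price is that rollback is now compensation, not a database-level undo.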

3.3 summary

This section explained two 2PC solutions: traditional 2PC (based on the database XA protocol) and Seata. Seata is the recommended way to implement 2PC, because it is non-intrusive and solves the long-term resource locking problem of traditional 2PC.

Key points for implementing 2PC with Seata:

1. Mark the beginning of a global transaction with the @GlobalTransactional annotation.
2. Each local transaction still uses the @Transactional annotation.
3. An undo_log table must be created in every database involved; this table is the key to how Seata guarantees local transaction consistency.

4 Distributed Transaction Solution: TCC

4.1. What are TCC transactions

TCC stands for Try, Confirm, and Cancel. TCC requires each branch transaction to implement three operations: Try (preprocessing), Confirm (confirmation), and Cancel (cancellation). Try performs the business check and reserves resources; Confirm confirms and executes the business; Cancel rolls back, reversing the Try operation. The TM first initiates the Try on all branch transactions; if any Try fails, the TM initiates Cancel on all branches, and if all Trys succeed, the TM initiates Confirm on all branches. If a Confirm or Cancel fails, the TM retries it.

TCC is divided into three phases:

  1. In the Try phase, services are checked (consistency) and resources are reserved (isolation). This phase is only a preliminary operation, which, together with Confirm, constitutes a complete service logic.
  2. In the Confirm phase, the business is confirmed and committed. Confirm executes only after the Try phases of all branch transactions succeed. Conventionally, TCC assumes the Confirm phase cannot fail: if Try succeeds, Confirm must succeed. If an error does occur in the Confirm phase, a retry mechanism or manual intervention is required.
  3. In the Cancel phase, when a business execution error requires rollback, the branch transaction is cancelled and the reserved resources are released. Conventionally, TCC assumes the Cancel phase must also succeed; if Cancel does fail, a retry mechanism or manual intervention is required.
  4. TM (Transaction Manager): a TM can be implemented as an independent service, or the global transaction initiator can act as the TM. Factoring the TM out as a common component benefits both system architecture and software reuse.

When a global transaction starts, the TM generates a global transaction record. The global transaction ID runs through the entire distributed transaction call chain and is used to record the transaction context and to track and record status. Because Confirm and Cancel must be retried on failure, both must be idempotent: no matter how many times the same operation is requested, the result is the same.
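A minimal idempotency sketch in plain Java (the XID strings and balance field are illustrative): Confirm is keyed by the global transaction ID, so retries of the same XID change nothing after the first successful execution.

```java
import java.util.HashSet;
import java.util.Set;

// Idempotency sketch: Confirm is keyed by the global transaction ID, so
// retries of the same XID are no-ops after the first success.
public class IdempotentConfirmDemo {
    static Set<String> confirmed = new HashSet<>();
    static int balance = 0;

    static void confirm(String xid, int amount) {
        if (!confirmed.add(xid)) {
            return;              // already executed for this xid: no-op
        }
        balance += amount;       // the real business effect, applied once
    }
}
```

In production the "already executed" check and the business effect must sit in one database transaction, otherwise a crash between them reintroduces the problem.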

4.2 TCC Solution

Currently there are many TCC frameworks on the market, for example the following (data collected: November 23, 2019):

Framework name     GitHub address                 Stars
tcc-transaction    Github.com/changmingxi…        3850
Hmily              Github.com/yu199195/hm…        2407
ByteTCC            Github.com/liuyangming…        1947
EasyTransaction    Github.com/QNJR-GROUP/…        1690

Seata also supports TCC, but Seata's TCC mode does not support Spring Cloud. Our goal is to understand TCC's rationale and how transaction coordination works, so we prefer a lightweight, easy-to-understand framework; hence Hmily.

Hmily is a high-performance open source TCC distributed transaction framework developed in Java (JDK 1.8), supporting distributed transactions over RPC frameworks such as Dubbo and Spring Cloud. It currently supports the following features:

- Uses the Disruptor framework for asynchronous reading and writing of transaction logs, with performance comparable to that of the RPC framework itself.
- Supports SpringBoot-starter project startup; simple to use.
- RPC framework support: Dubbo, Motan, Spring Cloud.
- Transaction log storage support: redis, mongodb, zookeeper, file, mysql.
- Transaction log serialization support: java, hessian, kryo, protostuff.
- Seamless Spring integration via AOP aspects, with natural cluster support.
- RPC transaction recovery, timeout exception recovery, etc.

Hmily uses AOP aspects to intercept the local and remote methods involved in a distributed transaction. Through this interception, transaction participants can transparently invoke each other's Try, Confirm, and Cancel methods, pass the transaction context, record transaction logs, and perform compensation and retries.

Hmily does not need a separate transaction coordination service, but it does need a data store (mysql/mongo/zookeeper/redis/file) for saving transaction logs.

A TCC service implemented with Hmily, like an ordinary service, only needs to expose one interface: its Try business. The Confirm/Cancel business logic exists only for global transaction commit/rollback, so it only needs to be discoverable by the Hmily TCC framework, not by the business services that call it.

The official website: dromara.org/website/zh-…

TCC must handle three kinds of exceptional situations: empty rollback, idempotency, and suspension.

Empty rollback:

The phase-two Cancel method is called even though the phase-one Try method of the TCC resource was never called; Cancel must recognize this as an empty rollback and directly return success.

The cause: when the service hosting a branch transaction crashes or the network fails, the branch transaction call is recorded as a failure even though the Try phase was never actually executed. After the fault is recovered, the distributed transaction is rolled back by calling the two-phase Cancel method, which produces an empty rollback.

The key is to recognize the empty rollback. The idea is simple: Cancel needs to know whether phase one executed. If it did, this is a normal rollback; if not, it is an empty rollback. As mentioned earlier, TM generates a global transaction record when it starts a global transaction, and the global transaction ID runs through the entire distributed call chain. Add a branch transaction record table keyed by global transaction ID and branch transaction ID, and insert a record in the phase-one Try method to mark that phase one has executed. The Cancel interface reads this record: if it exists, roll back normally; if it does not, treat the call as an empty rollback.
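The branch transaction record idea can be sketched as follows. This is a simplified in-memory model, not Hmily's or any framework's API; a real implementation would insert and query rows in a database table keyed by global and branch transaction IDs, and all names here are illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified in-memory stand-in for the branch transaction record table.
class EmptyRollbackGuard {
    private final Set<String> tryExecuted = new HashSet<>();

    // Try phase: record that phase one ran for this branch.
    public void tryPhase(String globalTxId, String branchTxId) {
        tryExecuted.add(globalTxId + ":" + branchTxId);
    }

    // Cancel phase: if no Try record exists, this is an empty rollback --
    // do nothing and report success instead of releasing resources.
    public String cancel(String globalTxId, String branchTxId) {
        if (!tryExecuted.contains(globalTxId + ":" + branchTxId)) {
            return "EMPTY_ROLLBACK"; // Try never ran; return success directly
        }
        // ... undo the business resources reserved by Try here ...
        return "ROLLED_BACK";
    }
}
```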

Idempotent:

As the earlier introduction showed, to ensure that TCC's two-phase retry mechanism does not cause data inconsistency, TCC's two-phase Try, Confirm, and Cancel interfaces must all be idempotent, so that business resources are not reserved or released more than once. If idempotency is not handled well, serious problems such as data inconsistency can result.

The solution: add an execution status to the branch transaction record and check that status before each execution.
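A minimal sketch of that status check, using an in-memory stand-in for the branch transaction record's status column (the class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Idempotent Confirm: the execution status is checked before doing the
// real work, so a retried call becomes a no-op.
class IdempotentConfirm {
    private final Map<String, String> status = new HashMap<>();
    private int commits = 0; // counts how many times the real work actually ran

    public void confirm(String branchTxId) {
        // Already confirmed? A redelivered/retried call returns immediately.
        if ("CONFIRMED".equals(status.get(branchTxId))) {
            return;
        }
        commits++; // ... the actual commit of the business resource ...
        status.put(branchTxId, "CONFIRMED");
    }

    public int commitCount() { return commits; }
}
```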

Suspension:

Suspension means that, for a distributed transaction, the two-phase Cancel interface executes before the one-phase Try interface.

The cause: when making the RPC call for a branch transaction's Try, the branch transaction is registered first and then the RPC call is made. RPC calls usually have a timeout; if the network is congested and the call times out, TM tells RM to roll back the distributed transaction. The rollback completes, but afterwards the delayed RPC request finally reaches the participant and the Try method actually executes, reserving business resources. Those reserved resources can only be used by this distributed transaction, which has already finished its second phase, so they can never be processed. We call this suspension: business resources are reserved but can no longer be handled.

The solution: once phase two has executed for a transaction, phase one must not be allowed to run. When executing a phase-one Try, check whether the branch transaction record table already contains a phase-two record for this global transaction; if it does, do not execute the Try.
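The guard described above might look like this. Again an in-memory sketch with illustrative names; a real system would query the branch transaction record table inside the Try method's local transaction:

```java
import java.util.HashSet;
import java.util.Set;

// Suspension prevention: before executing Try, check whether a phase-two
// (Cancel) record already exists for the global transaction. If it does,
// the rollback already happened and Try must not reserve resources.
class SuspensionGuard {
    private final Set<String> cancelled = new HashSet<>();

    public void cancel(String globalTxId) {
        // Record that phase two ran (this may itself be an empty rollback).
        cancelled.add(globalTxId);
    }

    // Returns false when the Try is suppressed because Cancel already executed.
    public boolean tryPhase(String globalTxId) {
        if (cancelled.contains(globalTxId)) {
            return false; // suspended call: skip the Try entirely
        }
        // ... reserve business resources here ...
        return true;
    }
}
```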

Example scenario: A transfers 30 yuan to B, and the A and B accounts live in different services.

Plan 1:

Account A

Try: check that the balance covers 30 yuan, then deduct 30 yuan
Confirm: empty
Cancel: add the 30 yuan back

Account B

Try: add 30 yuan
Confirm: empty
Cancel: subtract 30 yuan

Description of Option 1:

1) Account A: the balance here is the so-called business resource. Following the principle above, the business resource must be checked and reserved in phase one, so the Try interface of the TCC resource first checks whether account A's balance is sufficient and, if so, deducts 30 yuan. The Confirm interface performs the formal commit; since the resource was already deducted in Try, Confirm has nothing to do in phase two. Rolling back the transaction executes the Cancel interface, which returns to the account the 30 yuan deducted by Try.

2) Account B: the 30 yuan is added to account B in the phase-one Try interface. Executing the Cancel interface means the whole transaction is being rolled back, and rolling back account B requires subtracting the 30 yuan added by Try.

Problem analysis of Scheme 1:

1) If account A's Try was never executed and Cancel runs, 30 yuan is incorrectly added (an empty rollback).
2) Try, Cancel, and Confirm are invoked by separate threads and repeated calls can occur, so all of them need to be idempotent.
3) Account B adds 30 yuan in Try; after Try completes, that money may already be consumed by other threads.
4) If account B's Try was never executed and Cancel runs, 30 yuan is incorrectly deducted.

Problem solving:

1) Account A's Cancel method must check whether Try was executed; Cancel may run only after Try has executed normally.
2) The Try, Cancel, and Confirm methods must all be idempotent.
3) Account B must not update the account amount in Try; it updates the amount only in Confirm.
4) Account B's Cancel method must check whether Try was executed; Cancel may run only after Try has executed normally.

Optimization scheme:

Account A

Try: idempotency check, suspension handling, check that the balance covers 30 yuan, deduct 30 yuan
Confirm: empty
Cancel: idempotency check, empty rollback handling, add the 30 yuan back to the available balance

Account B

Try: empty
Confirm: idempotency check, officially add 30 yuan
Cancel: empty
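Putting the optimized scheme together, a self-contained sketch could look like the following. All names are illustrative, and persistence, RPC, and the framework that drives the Try/Confirm/Cancel calls are omitted:

```java
import java.util.HashSet;
import java.util.Set;

// Optimized scheme: account A reserves funds in Try and releases them in
// Cancel; account B only adds funds in Confirm.
class TransferTcc {
    int balanceA = 100, balanceB = 0;
    private final Set<String> aTried = new HashSet<>();
    private final Set<String> aCancelled = new HashSet<>();
    private final Set<String> bConfirmed = new HashSet<>();

    // Account A -- Try: idempotency + suspension checks, then reserve 30 yuan.
    public boolean tryA(String txId) {
        if (aTried.contains(txId)) return true;      // idempotent retry
        if (aCancelled.contains(txId)) return false; // suspension: Cancel ran first
        if (balanceA < 30) return false;             // check the business resource
        balanceA -= 30;
        aTried.add(txId);
        return true;
    }

    // Account A -- Cancel: idempotency + empty-rollback checks, then restore 30 yuan.
    public void cancelA(String txId) {
        if (aCancelled.contains(txId)) return;       // idempotent retry
        aCancelled.add(txId);
        if (!aTried.contains(txId)) return;          // empty rollback: nothing reserved
        balanceA += 30;
    }

    // Account B -- Confirm: idempotency check, then officially add 30 yuan.
    public void confirmB(String txId) {
        if (bConfirmed.contains(txId)) return;       // idempotent retry
        balanceB += 30;
        bConfirmed.add(txId);
    }
}
```

Note how Try's guard set doubles as the empty-rollback record that Cancel consults, which is exactly the branch transaction record table from the earlier discussion.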

4.3 Summary

Comparing the TCC processing flow with 2PC two-phase commit: 2PC is usually handled across databases at the DB level, while TCC is handled at the application level and must be implemented by business logic. The advantage of this style of distributed transaction is that the application can define the granularity of its own data operations, which makes it possible to reduce lock conflicts and improve throughput.

The disadvantage is that it is highly intrusive to the application: every branch of the business logic must implement Try, Confirm, and Cancel. It is also difficult to implement, because different rollback strategies are needed for different failure causes, such as network state and system faults.

5 Reliable message final consistency for distributed transaction solutions

5.1 What is reliable message final consistency

The reliable message final consistency scheme means that after the transaction initiator finishes its local transaction it sends a message, and the transaction participants (the message consumers) are guaranteed to receive that message and process their transactions successfully. The scheme emphasizes that as long as the message reaches the transaction participants, the overall transaction will eventually be consistent.

This scheme is accomplished by using message-oriented middleware, as shown below:

The transaction initiator (producer) sends a message to the message middleware, and the transaction participants (consumers) receive the message from it. Both the initiator and the participants communicate with the middleware over the network, and the uncertainty of network communication introduces distributed transaction problems.

Therefore, the final consistency scheme of reliable message should solve the following problems:

1. Atomicity of the local transaction and message sending. The transaction initiator must send the message after (and only if) its local transaction executes successfully; otherwise the message is discarded. In other words, the local transaction and the message send must either both succeed or both fail. This atomicity is the key problem in implementing reliable message final consistency.

Try this first: send the message, then operate on the database:

begin transaction;
  // 1. send the MQ message
  // 2. perform the database operation
commit transaction;Copy the code

In this case, the consistency between the database operation and the sending message cannot be guaranteed, because the sending message may succeed and the database operation may fail.

You immediately think of the second option, which is to do the database operation first and then send the message:

begin transaction;
  // 1. perform the database operation
  // 2. send the MQ message
commit transaction;Copy the code

This seems fine: if sending the MQ message fails, an exception is thrown and the database transaction rolls back. But if the send merely times out, the database rolls back while the message was in fact delivered normally, which again results in inconsistency.

2 Reliability of transaction participants receiving messages

Transaction participants must be able to receive messages from the message queue and can repeat receiving messages if they fail to receive messages.

The problem of repeated message consumption

Because of network uncertainty, a consuming node may time out yet still have consumed the message successfully; the message middleware will then redeliver the message, causing repeated consumption.

To solve the problem of repeated message consumption, the idempotency of transaction participants’ methods must be realized.

5.2 Solution

5.2.1 Local message table scheme

The local message table scheme was originally proposed by eBay. Its core is to use a local transaction to keep the business data and the message consistent, then send messages to the message middleware via a scheduled task, deleting each message only after it has been confirmed as successfully delivered to the consumer.

The following is an example of registering and sending points:


In the following example there are two interacting microservices: the user service, responsible for adding users, and the points service, responsible for adding points.

The interaction process is as follows. 1. User registration: inside one local transaction, the user service adds the user and inserts a "points message" log. The local transaction guarantees that the user table and the message table stay consistent.

begin transaction;
  // 1. add the user
  // 2. insert the "add points" record into the message log table
commit transaction;Copy the code

Here the local database operation and the stored points message log are in the same transaction, so the database operation and the log write are atomic.

2. A scheduled task scans the log

How do you ensure that messages are sent to message queues?

After step one, the message has been written to the message log table. An independent thread can periodically scan the message log table and send each message to the message middleware; once the middleware reports a successful send, the log entry is deleted.
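A compact sketch of steps 1 and 2, using in-memory collections as stand-ins for the user table, the message log table, and the broker (all names are illustrative; in a real system the register step would be one database transaction and the scan step a scheduled job):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Local message table: user row and message log are written together
// (standing in for one DB transaction), and a scan step forwards pending
// messages to the broker, deleting each log entry only after the send.
class LocalMessageTable {
    final List<String> users = new ArrayList<>();        // user table
    final Deque<String> messageLog = new ArrayDeque<>(); // local message table
    final Deque<String> broker = new ArrayDeque<>();     // stand-in for MQ

    // Step 1: add the user and the "add points" log in one local transaction.
    public void register(String user) {
        users.add(user);
        messageLog.add("addPoints:" + user);
    }

    // Step 2: the scheduled task scans the log and sends each entry to MQ;
    // the log entry is deleted only after the send is acknowledged.
    public void scanAndSend() {
        while (!messageLog.isEmpty()) {
            String msg = messageLog.peek();
            broker.add(msg);   // send to the message middleware
            messageLog.poll(); // delete the log once the send succeeds
        }
    }
}
```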

3. Message consumption

How do you ensure that consumers actually consume the message?

Here MQ's ack (acknowledgement) mechanism can be used. The consumer listens to MQ; when it receives the message and completes its business processing, it sends an ack to MQ, the message counts as normally consumed, and MQ stops pushing it. Otherwise, MQ keeps redelivering the message to the consumer.

The points service receives the "add points" message and starts adding points; when the points are added successfully, it responds to the message middleware with an ack, otherwise the middleware redelivers the message.

Because messages may be delivered repeatedly, the "add points" operation of the points service must be idempotent.

5.2.2 RocketMQ transaction message scheme

RocketMQ is a distributed messaging middleware from Alibaba, open-sourced in 2012 and accepted as an Apache top-level project in 2017. It carries the messaging for all of Alibaba Group's products, including Alibaba Cloud's messaging products and those of acquired subsidiaries, and has featured prominently in recent Double 11 (Singles' Day) events. Apache RocketMQ 4.3 and later officially support transaction messages, making distributed transaction implementations more convenient.

RocketMQ's transaction message design mainly solves the atomicity of message sending and local transaction execution on the Producer side. RocketMQ's design lets the broker and the producer communicate in both directions, so the broker naturally acts as a transaction coordinator; RocketMQ's own storage mechanism persists transaction messages; and its high availability and reliable-message design ensure that transaction messages reach final consistency even when the system fails.

RocketMQ 4.3 and later implement a complete transaction message, which is essentially an encapsulation of the local message table: the message table is moved inside MQ, solving the atomicity of message sending and local transaction execution on the Producer side.

The execution process is as follows: For the convenience of understanding, we also describe the whole process with the example of registering and sending points.

Producer is the MQ sender, in this case the user service, which is responsible for adding users. The MQ subscriber is the message consumer, in this case the points service, which is responsible for adding points.

1. Producer sends transaction messages

The Producer sends the transaction message to the MQ Server, which marks the state of the message as Prepared, indicating that the message cannot be consumed by the MQ subscriber. In this example, Producer sends an “increase points message” to the MQ Server.

2. The MQ Server acknowledges the send: upon receiving the Producer's message, it responds that the message was sent successfully.

3. Execute the local transaction. The Producer runs its business code, controlling the transaction through the local database. In this example, the Producer adds the user.

4. Message delivery If the Producer’s local transaction is successfully executed, the Producer will automatically send a COMMIT message to the MQServer. After receiving the COMMIT message, the MQServer will mark the status of “add points message” as consumable, and the MQ subscriber (points service) will consume the message normally. If Producer fails to execute the local transaction, a ROLLBACK message is automatically sent to the MQServer. After receiving the rollback message, the MQServer deletes the Point Increment message.

The MQ subscriber (points service) consumes the message and responds to MQ with an ACK on success, otherwise the message will be received repeatedly. In this case, the ACK automatically responds by default, that is, it automatically responds to the ACK if the program is running properly.

If the Producer fails or times out while executing the local transaction, the MQ Server keeps querying other producers in the same producer group for the transaction's execution status; this process is called the transaction check-back. Based on the check-back result, the MQ Server decides whether to deliver the message.

RocketMQ implements the main flow described above. On the user side, you only need to implement two methods, one that executes the local transaction and one that answers the transaction check-back, so all you care about is the execution status of the local transaction.
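Before looking at the API, the broker-side half-message state machine can be simulated in a few lines. This is a library-free sketch for intuition only; the names are made up and do not match RocketMQ's internal types:

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of the transaction-message flow: a half (prepared) message is
// invisible to consumers until the producer reports COMMIT; ROLLBACK deletes
// it; a message left PREPARED is settled by the check-back.
class HalfMessageBroker {
    enum State { PREPARED, COMMITTED, DELETED }
    private final Map<String, State> messages = new HashMap<>();

    // Step 1: the producer sends the half message; consumers cannot see it yet.
    public void sendHalfMessage(String msgId) { messages.put(msgId, State.PREPARED); }

    // Step 4: the producer reports the outcome of its local transaction.
    public void endTransaction(String msgId, boolean commit) {
        messages.put(msgId, commit ? State.COMMITTED : State.DELETED);
    }

    // Only committed messages can be consumed by subscribers.
    public boolean consumable(String msgId) {
        return messages.get(msgId) == State.COMMITTED;
    }

    // Check-back: for a message still PREPARED, ask the producer side for
    // the local transaction state and settle the message accordingly.
    public void checkBack(String msgId, boolean localTxCommitted) {
        if (messages.get(msgId) == State.PREPARED) {
            endTransaction(msgId, localTxCommitted);
        }
    }
}
```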

RocketMQ provides the RocketMQLocalTransactionListener interface:

public interface RocketMQLocalTransactionListener {
    /**
     * Called back after the half (prepared) message is sent successfully;
     * used to execute the local transaction.
     * @param msg the half message that was sent
     * @param arg an extra argument passed at send time
     * @return the transaction state: COMMIT to commit, ROLLBACK to roll back,
     *         UNKNOWN to trigger the check-back
     */
    RocketMQLocalTransactionState executeLocalTransaction(Message msg, Object arg);

    /**
     * Transaction check-back, invoked when the broker cannot determine the state.
     * @param msg the message being checked
     * @return the transaction state: COMMIT, ROLLBACK, or UNKNOWN
     */
    RocketMQLocalTransactionState checkLocalTransaction(Message msg);
}Copy the code

Send transaction message:

Here are the APIS provided by RocketMQ for sending transaction messages:

TransactionMQProducer producer = new TransactionMQProducer("ProducerGroup");
producer.setNamesrvAddr("127.0.0.1:9876");
producer.start();
// set the TransactionListener
producer.setTransactionListener(transactionListener);
// send a transaction message
SendResult sendResult = producer.sendMessageInTransaction(msg, null);Copy the code

5.3 Summary

Reliable message final consistency guarantees the consistency of message delivery from producer to consumer through the message middleware.

Using RocketMQ as the message middleware solves two main problems:

1. Atomicity of the local transaction and message sending.
2. Reliability of message receipt by transaction participants.

The reliable message final consistency transaction suits scenarios with long execution cycles and low real-time requirements. By introducing a message mechanism, synchronous transaction operations become message-driven asynchronous execution, which avoids the impact of synchronous blocking in distributed transactions and decouples the two services.

6 Maximum effort notification for distributed transaction solutions

6.1 What is maximum Effort notification

Maximum effort notification is another solution to distributed transactions. Here is a recharge (top-up) example:

Interactive process:

1. The account system invokes the recharge system's interface.
2. After payment processing, the recharge system sends the recharge result notification to the account system.
3. If the account system does not receive the notification, it actively calls the recharge system's interface to query the recharge result.

From the above example we summarize the objectives of the maximum effort notification scheme:

Objective: The originator of the notification will make its best efforts to notify the recipient of the result of the business process through some mechanism.

Specifically include:

1. A mechanism for repeating the notification. Because the receiver may not have received the notification, there must be some mechanism that repeats it.
2. A message reconciliation (proofreading) mechanism. If the notifier's best efforts still fail to reach the receiver, or if the receiver wants to consume a message again after already consuming it, the receiver can proactively query the notifier for the message information.
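Those two mechanisms can be sketched as follows, with a hard-coded retry schedule and an in-memory query interface (both illustrative; a real notifier would persist its state and actually wait between attempts):

```java
import java.util.function.Predicate;

// Best-effort notification: retry on a widening schedule until the receiver
// acknowledges or attempts run out; the receiver can always fall back to the
// query interface for reconciliation.
class BestEffortNotifier {
    // Notification intervals (e.g. seconds between retries) -- illustrative.
    static final int[] SCHEDULE = {1, 2, 5, 10, 30};
    private String lastResult = "UNKNOWN";

    // Returns the number of attempts made before an ack (or schedule exhaustion).
    public int notifyReceiver(String result, Predicate<String> receiver) {
        lastResult = result;
        for (int i = 0; i < SCHEDULE.length; i++) {
            // ... in a real system, wait SCHEDULE[i] seconds before attempt i ...
            if (receiver.test(result)) {
                return i + 1; // receiver acknowledged
            }
        }
        return SCHEDULE.length; // gave up; receiver must use the query interface
    }

    // Message reconciliation ("proofreading") interface for the receiver.
    public String query() { return lastResult; }
}
```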

What is the difference between best effort notification and reliable message consistency

1. Different solution ideas

Reliable message consistency: the notifying party must guarantee both that the message is sent out and that it is delivered to the receiver; message reliability is guaranteed by the notifying party.

Maximum effort notification: the notifying party does its best to notify the receiver of the business processing result, but the receiver may still fail to receive the message.

2. Different application scenarios

Reliable message consistency is concerned with transaction consistency in the transaction process, completing the transaction in an asynchronous manner.

Best effort notification is concerned with notification after the transaction, i.e., reliably notifying the transaction result.

3. Different technical directions. Reliable message consistency solves the consistency of a message from sending to receiving, i.e., the message is both sent and received.

Best effort notification cannot guarantee message consistency from sending to receiving; it only provides a reliability mechanism for message receipt. That mechanism is: the notifier does its best to notify the receiver of the message, and when the receiver still cannot receive it, the receiver proactively queries for the message (the business processing result).

6.2 Solution

With an understanding of maximum effort notification, the ACK mechanism of MQ can be used to achieve maximum effort notification

Plan 1:

This scheme uses MQ ACK mechanism to send notification to the receiving party by MQ. The process is as follows:

1. The notification originator sends the notification to MQ using an ordinary messaging mechanism. Note: if the message fails to send, the receiving party can actively query the originator for the business execution result.
2. The receiving party listens to MQ.
3. The receiving party receives the message and responds with an ack after completing its business processing.
4. If the receiving party does not respond with an ack, MQ repeats the notification.
5. The receiving party can verify message consistency through the message reconciliation interface.

Scheme 2:

This scheme also uses the ACK mechanism of MQ. Different from Scheme 1, the application sends the notification to the receiving party, as shown below:

The interaction process is as follows:

1. The notification originator sends the notification to MQ. Transaction messages from the reliable message consistency scheme are used to guarantee the atomicity of the local transaction and the message, i.e., the notification is first reliably delivered to MQ.
2. The notification program listens to MQ and receives messages from it. In scheme 1 the receiving party listens to MQ directly; in scheme 2 the notification program listens to MQ. If the notification program does not respond with an ack, MQ repeats the notification.
3. The notification program calls the receiving party's notification interface over an Internet interface protocol (such as HTTP or Web Service) to deliver the notification. When the call succeeds, the notification is done and the MQ message is consumed successfully, so MQ stops delivering the notification to the notification program.
4. The receiving party can verify message consistency through the message reconciliation interface.

Differences between Plan 1 and Plan 2:

1. In scheme 1, the receiving party interfaces with MQ directly, i.e., the receiver itself listens to MQ. This scheme mainly applies to notifications between internal applications.

2. In scheme 2, the notification program interfaces with MQ: it listens to MQ and, after receiving a message, calls the receiving party over an Internet interface protocol. This scheme mainly applies to notifications between external applications, such as Alipay and WeChat payment result notifications.

7 Comparative analysis of distributed transactions:

After looking at the various solutions for distributed transactions, we learned the pros and cons of each solution:

The biggest problem with 2PC is that it is a blocking protocol: after executing its branch transaction, an RM waits for the TM's decision, during which the service blocks and locks resources. Because of this blocking behavior and its poor worst-case timing, the design cannot scale as the number of services in a transaction grows, and it is hard to use in high-concurrency distributed services with long-running transactions.

Comparing the TCC processing flow with 2PC two-phase commit: 2PC is usually handled across databases at the DB level, while TCC is handled at the application level and must be implemented by business logic. The advantage is that the application can define the granularity of its own data operations, which makes it possible to reduce lock conflicts and improve throughput. The disadvantage is that it is highly intrusive: every branch of the business logic must implement Try, Confirm, and Cancel, and different rollback strategies are needed for different failure causes such as network state and system faults, making it difficult to implement. Typical usage scenarios: spend-threshold discounts, sending coupons on login, and so on.

The reliable message final consistency transaction suits scenarios with long execution cycles and low real-time requirements. By introducing a message mechanism, synchronous transaction operations become message-driven asynchronous execution, which avoids the impact of synchronous blocking in distributed transactions and decouples the two services. Typical usage scenarios: sending points on registration, sending coupons on login, and so on.

Maximum effort notification has the lowest requirements among distributed transaction solutions and suits businesses with low sensitivity to final-consistency timing. The receiving party is allowed to fail when processing a notification and to handle that failure itself afterwards; however the receiver handles it, the notifier's subsequent processing is not affected. The notifying party must provide an interface for querying execution status so the receiver can verify the results. Typical usage scenarios: bank notifications, payment result notifications, and so on.


| | 2PC | TCC | Reliable message | Maximum effort notification |
|---|---|---|---|---|
| Consistency | Strong | Eventual | Eventual | Eventual |
| Throughput | Low | Medium | High | High |
| Implementation complexity | Easy | Hard | Medium | Easy |

Conclusion: prefer a local transaction on a single data source whenever possible, because it avoids the performance loss of network interaction and the many problems caused by weak data consistency. If a system uses distributed transactions frequently and unreasonably, first re-examine the overall design: is the service split reasonable? Is it high-cohesion and low-coupling, or is the granularity too fine? Distributed transactions remain a challenge in the industry because of network uncertainty and because we habitually compare them with single-machine ACID transactions.

Neither database-level XA nor application-level TCC, reliable messages, maximum effort notification, and the rest can perfectly solve the distributed transaction problem. They are all trade-offs among performance, consistency, and availability, seeking a balance under the preferences of particular scenarios.

Daily for praise

All right, everybody, that’s all for this article. All the people here are talented.

Creation is not easy, your support and recognition, is the biggest motivation for my creation, we will see in the next article

"Six Meridians Divine Sword" | original article. If there are any errors in this blog, please point them out — much obliged!