
With the rise of microservices, more and more scenarios require distributed transactions. Distributed transaction solutions can be divided into compensation-based schemes and message-notification-based schemes.

Compensation-based schemes include 2PC, TCC, Saga, and Seata AT, all of which can be regarded as XA-compliant or as variants of XA. This article covers only 2PC and TCC; the other modes are left for a future article.

Distributed transaction definition

Distributed transactions are used to ensure data consistency between different nodes in a distributed system.

A distributed transaction is defined in contrast to a single-machine (local) transaction. In a distributed scenario, a system consists of multiple subsystems, each with its own independent data source. In a microservice architecture, each subsystem is a microservice that maintains its own data store and stays independent; more complex business logic is built by having the services call one another.

As a simple example, an e-commerce system has an inventory service, an order service, and a shopping service at the business level. When an order is placed, the inventory service is called to deduct stock and the order service is called to insert the order record. The two operations must either both succeed or both fail. This is the problem that distributed transactions solve.

The XA protocol


Before discussing distributed transactions, it is necessary to understand XA. XA is a set of standard interfaces for distributed transactions defined by the X/Open organization. Implementing these interfaces means supporting the XA protocol.

The XA protocol makes two important contributions: it defines two roles, and it defines the interfaces between them (definitions only, no implementation).

Roles

The XA protocol defines two roles for distributed transaction participants:

  • Transaction Coordinator (TM=Transaction Manager, corresponding to the shopping service in the example)

  • Resource Manager/transaction participant (RM=Resource Manager, corresponding to the inventory and order services in the example)

Interfaces

The XA specification mainly defines the interface between the transaction manager and the local resource manager.

The following functions let the transaction manager operate on a resource manager:

1) xa_open, xa_close: open and close a connection to the resource manager.

2) xa_start, xa_end: start and end the work of a local transaction.

3) xa_prepare, xa_commit, xa_rollback: pre-commit (prepare), commit, and roll back a local transaction.

4) xa_recover: returns the list of transactions that have been prepared but not yet committed or rolled back, so that the transaction manager can finish them after a failure.

5) Functions starting with ax_ let resource managers register dynamically with the transaction manager and operate on XIDs (transaction IDs).

6) ax_reg, ax_unreg: let a resource manager dynamically register with or unregister from a TMS (transaction manager server).
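In Java, the transaction-manager-facing functions above surface as methods on the standard javax.transaction.xa.XAResource interface (start, end, prepare, commit, rollback, recover), and an XID is represented by javax.transaction.xa.Xid. As a minimal sketch of what an XID carries, the class below implements Xid with a format ID, a global transaction ID shared by all branches, and a per-branch qualifier. The names XidSketch, SimpleXid, and sameGlobalTransaction are hypothetical and not part of any driver.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.transaction.xa.Xid;

final class XidSketch {
    /** Hypothetical Xid implementation, for illustration only. */
    static final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] globalTransactionId;
        private final byte[] branchQualifier;

        SimpleXid(int formatId, String globalTransactionId, String branchQualifier) {
            this.formatId = formatId;
            this.globalTransactionId = globalTransactionId.getBytes(StandardCharsets.UTF_8);
            this.branchQualifier = branchQualifier.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return globalTransactionId.clone(); }
        @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }

        /** Two branches belong to the same global transaction if format ID and global ID match. */
        boolean sameGlobalTransaction(Xid other) {
            return formatId == other.getFormatId()
                    && Arrays.equals(globalTransactionId, other.getGlobalTransactionId());
        }
    }

    public static void main(String[] args) {
        // One global transaction with two branches, one per resource manager.
        SimpleXid inventoryBranch = new SimpleXid(1, "order-42", "inventory");
        SimpleXid orderBranch = new SimpleXid(1, "order-42", "order");
        System.out.println("same global transaction: "
                + inventoryBranch.sameGlobalTransaction(orderBranch));
    }
}
```

The global ID ties the branches together, while the branch qualifier lets each resource manager track its own part of the work.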

Two-Phase Commit (2PC)

Two-phase commit (2PC) is an implementation of the XA distributed transaction protocol. The xa_prepare and xa_commit functions defined by XA already show that a complete XA commit is split into two phases: prepare and commit.

2PC is mostly used at the database level; at the business level it is rarely used because it leaves many problems unsolved. This article mainly discusses 2PC at the database level. Let's look at the two-phase commit process:

Process

Preparation stage

The preparation phase consists of the following three steps:

  • The coordinator sends the transaction content to all participants, asks whether the transaction can be committed, and waits for every participant to respond.

  • Each participant executes the transaction and writes undo and redo information to its transaction log, but does not commit.

  • A participant that executed successfully replies yes to the coordinator, meaning it can commit; a participant that failed replies no, meaning it cannot.

Commit stage

The coordinator decides between Commit() and Rollback() based on the votes returned by the participants in the preparation phase.

If the coordinator receives a failure message from any participant, or times out waiting for one, it sends a rollback message to every participant. Otherwise, it sends a commit message.

Participants commit or roll back according to the coordinator's instruction and then release the locks held during the transaction. (Note: locks must be released only at the very end.)

Examples

The commit phase is discussed in two separate cases.

Commit the transaction

Case 1: when all participants respond yes, the transaction is committed, as follows:

  • The coordinator issues a formal commit request to all participants.

  • Each participant executes the commit and releases the resources held during the transaction.

  • Each participant sends an ACK to the coordinator when it has finished.

  • Once the coordinator has received an ACK from all participants, the transaction is complete.

Abort the transaction

Case 2: when any participant replies no during the preparation phase, the transaction is aborted, as follows:

  • The coordinator issues a rollback request to all participants.

  • Each participant uses the undo information recorded in phase 1 to roll back and releases the resources held during the transaction.

  • Each participant sends an ACK to the coordinator when it has finished.

  • Once the coordinator has received an ACK from all participants, the abort is complete.
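The two cases above can be sketched as a small in-memory simulation. This is an illustration only, not a real XA implementation: it has no logging, timeouts, or networking, and the names TwoPhaseCommitSketch, Participant, InMemoryParticipant, and run are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class TwoPhaseCommitSketch {
    /** Hypothetical participant: votes in phase 1, then obeys the coordinator in phase 2. */
    interface Participant {
        boolean prepare();   // true = "yes", false = "no"
        void commit();
        void rollback();
    }

    /** In-memory participant that records the last thing that happened to it. */
    static final class InMemoryParticipant implements Participant {
        final String name;
        final boolean voteYes;
        String state = "INIT";

        InMemoryParticipant(String name, boolean voteYes) {
            this.name = name;
            this.voteYes = voteYes;
        }

        @Override public boolean prepare() {
            state = voteYes ? "PREPARED" : "PREPARE_FAILED";
            return voteYes;
        }
        @Override public void commit() { state = "COMMITTED"; }
        @Override public void rollback() { state = "ROLLED_BACK"; }
    }

    /** Runs both phases and returns true if the transaction was committed. */
    static boolean run(List<? extends Participant> participants) {
        // Phase 1: ask every participant and collect the votes.
        boolean allYes = true;
        for (Participant p : participants) {
            boolean vote;
            try {
                vote = p.prepare();
            } catch (RuntimeException e) {
                vote = false;   // an error counts as a "no" vote
            }
            allYes = allYes && vote;
        }
        // Phase 2: commit only if every vote was yes, otherwise roll everyone back.
        for (Participant p : participants) {
            if (allYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allYes;
    }

    public static void main(String[] args) {
        InMemoryParticipant inventory = new InMemoryParticipant("inventory", true);
        InMemoryParticipant order = new InMemoryParticipant("order", false);
        boolean committed = run(Arrays.asList(inventory, order));
        System.out.println("committed=" + committed
                + ", inventory=" + inventory.state + ", order=" + order.state);
    }
}
```

A single "no" vote is enough to roll back every participant, including those that voted yes.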

Pseudo code

Database level

XA transaction states: starting an XA transaction puts it in the ACTIVE state. A transaction in the ACTIVE state can execute SQL statements, and the END method moves it to the IDLE state. In the IDLE state you can either PREPARE, which is the first phase of two-phase commit, or COMMIT directly as a one-phase commit. An XA transaction in the PREPARED state can then be committed or rolled back, which is the second phase of two-phase commit.
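The state transitions described above can be sketched as a small state machine. This models only the transitions named in the paragraph; a real resource manager may allow more. The names XaStateSketch, State, Op, and apply are hypothetical.

```java
final class XaStateSketch {
    enum State { NONE, ACTIVE, IDLE, PREPARED, COMMITTED, ROLLED_BACK }
    enum Op { START, END, PREPARE, COMMIT_ONE_PHASE, COMMIT, ROLLBACK }

    /** Returns the next state, or throws if the operation is not allowed in the current state. */
    static State apply(State current, Op op) {
        switch (current) {
            case NONE:
                if (op == Op.START) return State.ACTIVE;
                break;
            case ACTIVE:
                if (op == Op.END) return State.IDLE;
                break;
            case IDLE:
                if (op == Op.PREPARE) return State.PREPARED;            // phase 1
                if (op == Op.COMMIT_ONE_PHASE) return State.COMMITTED;  // one-phase shortcut
                break;
            case PREPARED:
                if (op == Op.COMMIT) return State.COMMITTED;            // phase 2
                if (op == Op.ROLLBACK) return State.ROLLED_BACK;        // phase 2
                break;
            default:
                break;
        }
        throw new IllegalStateException(op + " is not allowed in state " + current);
    }

    public static void main(String[] args) {
        State s = State.NONE;
        for (Op op : new Op[] { Op.START, Op.END, Op.PREPARE, Op.COMMIT }) {
            s = apply(s, op);
            System.out.println(op + " -> " + s);
        }
    }
}
```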

Scenario: simulate a combined cash and red envelope payment. Suppose we buy something for 100 yuan, paying 90 yuan in cash and 10 yuan with a red envelope, and the cash and red envelope balances live in different databases.

There are two databases: xa_account (cash account) and xa_red_account (red envelope). Each has an account table with simple fields such as id, user_id, and balance_amount.

import com.mysql.cj.jdbc.JdbcConnection;
import com.mysql.cj.jdbc.MysqlXAConnection;
import com.mysql.cj.jdbc.MysqlXid;

import javax.sql.XAConnection;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.UUID;

// Note: the com.mysql.cj.jdbc imports assume MySQL Connector/J 8.x.
public class XaDemo {
    public static void main(String[] args) throws Exception {

        // Whether to log the XA commands
        boolean logXaCommands = true;

        // Get the RM of the account database (this is done by the AP)
        Connection accountConn = DriverManager.getConnection("jdbc:mysql://106.12.12.XXXX:3306/xa_account?useUnicode=true&characterEncoding=utf8", "root", "xxxxx");
        XAConnection accConn = new MysqlXAConnection((JdbcConnection) accountConn, logXaCommands);
        XAResource accountRm = accConn.getXAResource();
        // Get the RM of the red envelope database
        Connection redConn = DriverManager.getConnection("jdbc:mysql://106.12.12.XXXX:3306/xa_red_account?useUnicode=true&characterEncoding=utf8", "root", "xxxxxx");
        XAConnection conn2 = new MysqlXAConnection((JdbcConnection) redConn, logXaCommands);
        XAResource redRm = conn2.getXAResource();

        // The XA transaction starts here
        // Global transaction ID, shared by both branches
        byte[] globalId = UUID.randomUUID().toString().getBytes();
        // Just an identifier
        int formatId = 1;

        // Branch transaction of the account database
        byte[] accBqual = UUID.randomUUID().toString().getBytes();
        Xid xid = new MysqlXid(globalId, accBqual, formatId);

        // Branch transaction of the red envelope database
        byte[] redBqual = UUID.randomUUID().toString().getBytes();
        Xid xid1 = new MysqlXid(globalId, redBqual, formatId);
        try {
            // Start the account branch; its state becomes ACTIVE
            accountRm.start(xid, XAResource.TMNOFLAGS);
            // Simulate the business operation: pay 90 yuan in cash
            String sql = "update account set balance_amount=balance_amount-90 where user_id=1";
            PreparedStatement ps1 = accountConn.prepareStatement(sql);
            ps1.execute();
            // End the account branch; its state becomes IDLE
            accountRm.end(xid, XAResource.TMSUCCESS);

            // Start the red envelope branch
            redRm.start(xid1, XAResource.TMNOFLAGS);
            // Simulate the business operation: pay 10 yuan from the red envelope
            String sql1 = "update account set balance_amount=balance_amount-10 where user_id=1";
            PreparedStatement ps2 = redConn.prepareStatement(sql1);
            ps2.execute();
            redRm.end(xid1, XAResource.TMSUCCESS);

            // Phase 1: prepare; both branches move to the PREPARED state
            int rm1_prepare = accountRm.prepare(xid);
            int rm2_prepare = redRm.prepare(xid1);

            // Phase 2: the TM decides whether to commit or roll back based on phase 1
            // There are two transaction branches, so the one-phase commit optimization cannot be used
            boolean onePhase = false;
            if (rm1_prepare == XAResource.XA_OK && rm2_prepare == XAResource.XA_OK) {
                accountRm.commit(xid, onePhase);
                redRm.commit(xid1, onePhase);
            } else {
                accountRm.rollback(xid);
                redRm.rollback(xid1);
            }
        } catch (Exception e) {
            // An exception occurred: roll back both branches
            accountRm.rollback(xid);
            redRm.rollback(xid1);
            e.printStackTrace();
        }
    }
}

Business level

There is generally one coordinator and several transaction participants:

  1. First, the coordinator writes the Prepare() message to its local log and then sends Prepare() to all participants.

  2. After receiving Prepare(), each participant executes the transaction and returns Yes on success or No on failure, writing that message to its log before returning it.

  3. Once the coordinator has collected the replies from all participants (or a timeout has elapsed), it decides: if every participant returned Yes, the transaction succeeds and the coordinator sends Commit() to all participants; otherwise the transaction fails and it sends Rollback().

  4. As before, the coordinator writes the message to its log before sending it.

  5. After receiving Commit() or Rollback() from the coordinator, each participant writes the message to its log and then commits or rolls back accordingly.

Note: The coordinator and the participants write every message to the log before sending it or acting on it, mainly so that they can recover after a failure. For example, when a participant recovers from a failure, it checks its local log: if it has received Commit(), it commits; if it has received Rollback(), it rolls back. If the log shows only its own Yes vote, it asks the coordinator what to do next. If the log contains nothing, the participant most likely crashed during the Prepare phase and should roll back.
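The recovery rule in the note above can be sketched as a pure function over the participant's log. The log format here is invented for illustration; the names ParticipantRecoverySketch, LogRecord, Action, and recover are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ParticipantRecoverySketch {
    /** Hypothetical records a participant writes to its local log. */
    enum LogRecord { VOTED_YES, VOTED_NO, COMMIT_RECEIVED, ROLLBACK_RECEIVED }

    /** What the participant should do after restarting. */
    enum Action { COMMIT, ROLLBACK, ASK_COORDINATOR }

    static Action recover(List<LogRecord> log) {
        if (log.contains(LogRecord.COMMIT_RECEIVED)) {
            return Action.COMMIT;            // the decision is known: finish the commit
        }
        if (log.contains(LogRecord.ROLLBACK_RECEIVED)) {
            return Action.ROLLBACK;          // the decision is known: finish the rollback
        }
        if (log.contains(LogRecord.VOTED_YES)) {
            return Action.ASK_COORDINATOR;   // voted yes but never saw the decision
        }
        // Empty log (crashed during Prepare) or a "no" vote: rolling back is safe.
        return Action.ROLLBACK;
    }

    public static void main(String[] args) {
        System.out.println(recover(Collections.<LogRecord>emptyList()));
        System.out.println(recover(Arrays.asList(LogRecord.VOTED_YES)));
        System.out.println(recover(Arrays.asList(LogRecord.VOTED_YES, LogRecord.COMMIT_RECEIVED)));
    }
}
```

The ASK_COORDINATOR branch is where blocking comes from: a participant that voted yes cannot decide on its own.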

The drawback of two-phase commit is that if the coordinator crashes, all participants may have to wait for it, which causes blocking.

Under this design, the only theoretical case where two-phase commit goes wrong is when the coordinator crashes with a permanent error, such as a disk failure, right after issuing the commit instruction, leaving the transaction untraceable and unrecoverable.

Abnormal situations

The processes above describe the ideal case, but networks are often far from ideal, which produces many intermediate states. Here are several failure cases:

  1. In the preparation phase, the coordinator fails after sending Prepare.
  • No solution: the participants stay locked for a long time.
  2. In the commit phase, some participants do not receive the decision because of a timeout or network problem.
  • No solution: nobody knows what state those participants are in.
  3. In the commit phase, the coordinator fails before, while, or after sending the decision.
  • No solution if it fails before sending: the participants stay locked.

  • No solution if it fails while sending: the decision that was made is not known.

  • No solution if it fails after sending: nobody knows whether the transaction has fully finished.

Pure 2PC cannot handle many of these failures. Solving them requires adding new mechanisms, at which point the protocol is no longer plain 2PC.

2PC summary

The purpose of splitting the commit into two phases is clear: commit as late as possible, so that all the work that can be done before committing is done first. The final commit step then becomes a small, very short operation with a very low probability of failure in a distributed system; in other words, the window in which network communication is at risk is very short. This is the key to how two-phase commit ensures the atomicity of distributed transactions.

2PC is simple to implement but too bare-bones, so it is seldom used in real projects. The reasons are:

  • Performance: phase 2 cannot start until every participant has reported its status in phase 1, and the resources involved stay locked the whole time. This synchronous blocking hurts the concurrency of the overall system.

  • Protocol: 2PC requires each RM to implement the XA protocol. Strictly speaking, XA is a specification that only defines a set of interfaces, and at present most XA implementations are databases or message queues. In a microservice architecture an RM may be of any type, such as a microservice or a key-value store.

  • Reliability: the coordinator is a single point of failure; if it fails, the participants remain locked.

  • Data inconsistency: in the second phase, if a local network failure occurs after the coordinator starts sending commit requests, or the coordinator fails while sending them, only some of the participants receive the commit request. Those participants commit, while the others, which never received the request, cannot. The distributed system as a whole then ends up with inconsistent data.

TCC

The concept of Try-Confirm-Cancel (TCC) was first proposed by Pat Helland in the 2007 paper Life beyond Distributed Transactions: an Apostate's Opinion.

TCC is essentially 2PC at the business level: a service that uses TCC must implement three interfaces, Try(), Confirm(), and Cancel(). We said above that 2PC cannot handle crashes, so how does TCC deal with them? The answer is retry.

TCC is a service-oriented two-phase programming model, and its three methods, Try, Confirm, and Cancel, are all implemented in business code:

  • Try: the first phase; checks and reserves resources.

  • Confirm: the second-phase commit; executes the real business operation.

  • Cancel: the second-phase rollback; releases the reserved resources.

Process

We use the e-commerce order above as an example.

Try stage

The Try operation is only a preliminary step; together with the subsequent Confirm it forms the complete business logic. This stage:

  • Completes all business checks (consistency).

  • Reserves the required business resources (quasi-isolation).

  • Attempts to execute the business operation.

Assume the product has 100 units in stock and the purchase quantity is 2. When checking and updating the inventory, the Try step freezes the 2 units the user is buying and creates an order whose status is "to be confirmed".

Confirm/Cancel stage

Confirm or Cancel is executed depending on whether every service succeeded in the Try phase.

The Confirm and Cancel operations must be idempotent. If Confirm or Cancel fails, it is retried until it completes.

Confirm: executes the business logic once every service has succeeded in the Try phase

Confirm uses only the resources reserved in the Try phase. The TCC mechanism assumes that if resources were reserved successfully in the Try phase, Confirm will be able to commit completely and correctly.

The Confirm phase complements the Try phase; together they form the complete business logic.

Cancel: entered when a service fails in the Try phase

Cancel releases the business resources reserved in the Try phase. In the example above, Cancel releases the frozen inventory and updates the order status to cancelled.
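The inventory side of this example can be sketched in memory as follows. This is a simplified illustration that ignores persistence and concurrency; the names InventoryTccSketch, Inventory, tryReserve, confirm, and cancel are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class InventoryTccSketch {
    /** Hypothetical in-memory inventory for a single product. */
    static final class Inventory {
        private int available;
        // Transaction ID -> quantity frozen by that transaction's Try step.
        private final Map<String, Integer> frozen = new HashMap<>();

        Inventory(int available) { this.available = available; }

        /** Try: check the stock and freeze the requested quantity. */
        boolean tryReserve(String txId, int quantity) {
            if (frozen.containsKey(txId)) return true;   // repeated Try: already reserved
            if (quantity <= 0 || available < quantity) return false;
            available -= quantity;
            frozen.put(txId, quantity);
            return true;
        }

        /** Confirm: the frozen units are really sold; repeating it is a no-op. */
        void confirm(String txId) { frozen.remove(txId); }

        /** Cancel: return the frozen units to the available stock; repeating it is a no-op. */
        void cancel(String txId) {
            Integer quantity = frozen.remove(txId);
            if (quantity != null) available += quantity;
        }

        int available() { return available; }

        int frozenTotal() {
            int sum = 0;
            for (int q : frozen.values()) sum += q;
            return sum;
        }
    }

    public static void main(String[] args) {
        Inventory inventory = new Inventory(100);
        inventory.tryReserve("order-1", 2);
        System.out.println("after try: available=" + inventory.available()
                + ", frozen=" + inventory.frozenTotal());
        inventory.confirm("order-1");
        System.out.println("after confirm: available=" + inventory.available()
                + ", frozen=" + inventory.frozenTotal());
    }
}
```

Keeping the frozen quantity per transaction ID is what makes Confirm and Cancel safe to repeat: the second call finds nothing left to consume or release.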

Design points

Empty rollback

If the coordinator's Try() request is lost or times out because of a network problem, the coordinator sends Cancel() in phase two. The participant then receives a Cancel() request without ever having executed Try().

To handle this, the TCC pattern requires Cancel() to return success directly in this case; this is called an "empty rollback".

Suspension

Continuing from the case above: Try() timed out, the participant received Cancel() and performed an empty rollback, and then the network recovered and the delayed Try() request finally arrived. Cancel() has thus run before Try(), and the resources reserved by the late Try() would never be confirmed or cancelled; they are left suspended.

To handle this, the TCC pattern requires the participant to record the transaction ID when it handles Cancel(); when a Try() arrives whose transaction ID has already been rolled back, the request is simply ignored.

Idempotence

The implementations of Confirm() and Cancel() must be idempotent, because the coordinator retries either operation when it fails.
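The three design points can be combined in one participant that keeps a status per transaction ID. This is a minimal in-memory sketch, assuming a single-threaded participant and no persistence; the names TccParticipantSketch, Participant, Status, and tryReserve are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class TccParticipantSketch {
    enum Status { TRIED, CONFIRMED, CANCELLED }

    /** Hypothetical participant that tracks one status per transaction ID. */
    static final class Participant {
        private final Map<String, Status> txLog = new HashMap<>();
        private int reserved = 0;   // reservations currently held

        boolean tryReserve(String txId) {
            Status status = txLog.get(txId);
            if (status == Status.CANCELLED) {
                return false;                 // anti-suspension: ignore a Try that arrives after Cancel
            }
            if (status == null) {
                txLog.put(txId, Status.TRIED);
                reserved++;
            }
            return true;                      // a repeated Try changes nothing
        }

        boolean confirm(String txId) {
            Status status = txLog.get(txId);
            if (status == Status.CONFIRMED) return true;   // idempotent retry
            if (status != Status.TRIED) return false;      // nothing to confirm
            txLog.put(txId, Status.CONFIRMED);
            reserved--;
            return true;
        }

        boolean cancel(String txId) {
            Status status = txLog.get(txId);
            if (status == Status.CANCELLED) return true;   // idempotent retry
            if (status == Status.CONFIRMED) return false;  // too late to cancel
            if (status == Status.TRIED) reserved--;        // release the reservation
            // status == null is the empty rollback: record it and report success
            txLog.put(txId, Status.CANCELLED);
            return true;
        }

        int reserved() { return reserved; }
    }

    public static void main(String[] args) {
        Participant participant = new Participant();
        System.out.println("empty rollback ok: " + participant.cancel("tx-1"));
        System.out.println("late try accepted: " + participant.tryReserve("tx-1"));
        System.out.println("reserved: " + participant.reserved());
    }
}
```

Recording CANCELLED even when nothing was reserved is what lets the participant recognise and reject the late Try.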

Pseudo code

  1. Initialization: register a new transaction with the transaction manager, which generates a globally unique transaction ID.

  2. Try phase: run the try-related code, registering a call record for each call made. The result of the Try phase is reported to the transaction manager, which then performs the Confirm or Cancel step accordingly.

  3. Confirm phase: when the transaction manager receives a message that Try succeeded, it enters the Confirm phase for that transaction ID. If Confirm fails, the transaction manager enters the Cancel phase.

  4. Cancel phase: when the transaction manager receives a Try failure or a Confirm failure, it enters the Cancel phase for that transaction ID. If Cancel fails, the transaction manager logs the failure or raises an alarm for manual handling, or records the failure and lets the system keep retrying Cancel.
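The four steps above can be sketched as a single-process transaction manager. This is an illustration under several assumptions: retries are bounded and immediate, nothing is persisted, and what Cancel means for a service that has already confirmed is left to that service. The names TccManagerSketch, TccService, FakeService, Outcome, retry, and execute are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.function.BooleanSupplier;

final class TccManagerSketch {
    interface TccService {
        boolean tryReserve(String txId);
        boolean confirm(String txId);
        boolean cancel(String txId);
    }

    enum Outcome { CONFIRMED, CANCELLED, NEEDS_MANUAL_HANDLING }

    /** Test double whose Try and Confirm results are fixed; Cancel always succeeds. */
    static final class FakeService implements TccService {
        final boolean tryResult;
        final boolean confirmResult;
        int confirmCalls = 0;
        int cancelCalls = 0;

        FakeService(boolean tryResult, boolean confirmResult) {
            this.tryResult = tryResult;
            this.confirmResult = confirmResult;
        }
        @Override public boolean tryReserve(String txId) { return tryResult; }
        @Override public boolean confirm(String txId) { confirmCalls++; return confirmResult; }
        @Override public boolean cancel(String txId) { cancelCalls++; return true; }
    }

    /** Calls op up to maxAttempts times; an exception counts as a failed attempt. */
    static boolean retry(BooleanSupplier op, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            try {
                if (op.getAsBoolean()) return true;
            } catch (RuntimeException e) {
                // fall through and try again
            }
        }
        return false;
    }

    static Outcome execute(List<? extends TccService> services, int maxAttempts) {
        String txId = UUID.randomUUID().toString();        // step 1: global transaction ID
        List<TccService> called = new ArrayList<>();       // step 2: call records
        boolean tryOk = true;
        for (TccService s : services) {
            called.add(s);   // record before calling, so a failed Try still gets a Cancel
            boolean ok;
            try {
                ok = s.tryReserve(txId);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) { tryOk = false; break; }
        }
        if (tryOk) {                                       // step 3: Confirm phase
            boolean confirmOk = true;
            for (TccService s : called) {
                if (!retry(() -> s.confirm(txId), maxAttempts)) { confirmOk = false; break; }
            }
            if (confirmOk) return Outcome.CONFIRMED;
        }
        boolean cancelOk = true;                           // step 4: Cancel phase
        for (TccService s : called) {
            if (!retry(() -> s.cancel(txId), maxAttempts)) cancelOk = false;
        }
        return cancelOk ? Outcome.CANCELLED : Outcome.NEEDS_MANUAL_HANDLING;
    }

    public static void main(String[] args) {
        FakeService inventory = new FakeService(true, true);
        FakeService order = new FakeService(false, true);   // its Try fails
        System.out.println(execute(Arrays.asList(inventory, order), 3));
    }
}
```

Because a service is recorded before its Try is called, a Try that fails or times out still receives a Cancel, which is exactly the situation the empty rollback rule exists for.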

TCC summary

TCC and 2PC look similar. The biggest difference is that 2PC is database-oriented, while TCC lives purely at the business level.

Compared with the traditional X/Open XA transaction mechanism introduced above, TCC has the following advantages:

  • Performance: resource locking is controlled by the business at a finer granularity, so the entire resource is not locked.

  • Eventual consistency: because Confirm and Cancel are idempotent, the transaction is always eventually confirmed or cancelled, which guarantees data consistency.

  • Reliability: the coordinator single point of failure in XA is removed. The main business party initiates and controls the whole business activity, and the business activity manager can be deployed on multiple nodes as a cluster.

  • Applicability: the pattern can be used widely, whether or not local transaction control is available.

Disadvantage: the Try, Confirm, and Cancel operations must be implemented for each specific business, so coupling with the business is high and development cost increases.

Conclusion

While studying distributed transactions, I looked at my company's order code and found that it uses neither 2PC nor TCC. The scheme is rather crude: perform the forward operation; if it fails, call the reverse operation; if the reverse operation fails, retry it several times; and if it still fails, record it and let another system retry.

The reason is that the system was designed early, when no such complex scenarios existed. Once a system grows complex, its core functions are hard to change. The system works fine for now, but it will eventually need to be optimized.

If it is redesigned, I would prefer TCC: with every system designed to the specification and implementing the design points above, adding intermediate states and a retry system makes the whole much easier to modify and extend. I hope to find time to implement a version of TCC, since understanding without practice is easily wasted. But first I need to finish covering Saga, Seata AT, and notification-based solutions.


Finally

If you like my article, you can follow my public account (Programmer Malatang)

My personal blog is shidawuhen.github.io/
