Preface

This article will give you some insights into distributed transactions, and explain the implementation principle of the distributed transaction processing framework TX-LCN.

1. When are distributed transactions needed?

There are many scenarios, but one is especially common: in a microservice system, a single business operation spans several microservices, and each microservice has its own database.

For example, an e-commerce platform has the business logic of a customer placing an order. This logic involves two microservices: an inventory service (decrement the stock by one) and an order service (create one new order). The schematic diagram is as follows:

If this business logic is executed without a distributed transaction and either the inventory update or the order creation fails, the inventory value may decrease by 1 while the order database stays unchanged, or the inventory may stay unchanged while an extra order appears. Either way, the data becomes inconsistent.

So we use distributed transactions in similar situations to ensure data consistency.
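
To make the failure mode concrete, here is a minimal sketch (hypothetical, with the two databases reduced to plain counters) of what can happen when the two updates are not covered by a distributed transaction:

```java
// Minimal sketch: two independent "databases" (plain counters) updated
// without a distributed transaction. The inventory change is durable even
// when the order step fails, leaving the two stores inconsistent.
class NaiveOrderFlow {
    // Returns the final state of both stores after one order attempt.
    static String placeOrder(boolean orderServiceUp) {
        int stock = 10, orders = 0;   // initial state of the two databases
        stock--;                      // local commit #1: inventory minus one
        if (orderServiceUp) {
            orders++;                 // local commit #2: order plus one
        }                             // if the service is down, commit #2 never happens
        return "stock=" + stock + ",orders=" + orders;
    }
}
```

When `orderServiceUp` is false, the result is `stock=9,orders=0`: the stock has been deducted but no order exists, which is exactly the inconsistency described above.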

2. Solution idea of distributed transaction

2.1 Introduction: MySQL two-phase commit policy

Before we talk about distributed transaction solutions, let’s look at how a single data source does transactions to get some ideas.

Take the MySQL InnoDB engine as an example. MySQL has two log mechanisms: the redo log at the storage-engine layer and the binlog at the server layer, and every data update writes both logs. To avoid writing one log without the other, MySQL uses a method called two-phase commit to ensure transaction consistency. Consider the following statements:

mysql> create table T(ID int primary key, c int);
mysql> update T set c=c+1 where ID=2;

The execution flow of the update statement looks like this:

  1. The executor first asks the engine to fetch the row where ID=2.
  2. After receiving the row, the executor performs the +1 operation on it and calls the engine interface to write the new data.
  3. The engine updates the data in memory and records the change in the redo log, which is now in the prepare state. It does not commit the transaction; it only notifies the executor that its part is done and the transaction can be committed at any time.
  4. The executor generates a binlog of this operation and writes the binlog to disk.
  5. Finally, the executor calls the engine's transaction interface and changes the redo log to the commit state.

In the preceding procedure, the redo log is not committed immediately after it is written. Instead, it stays in the prepare state and is committed only after the executor reports that the binlog has been written. This process is called two-phase commit, and it is a neat design.
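
The five steps above can be simulated in a few lines. This is a toy model (not MySQL code); the point is the recovery rule that makes two-phase commit safe: a prepared redo record is finalized only if the matching binlog record exists.

```java
// Toy model of MySQL's internal two-phase commit between redo log and binlog.
class RedoBinlogDemo {
    enum RedoState { NONE, PREPARED, COMMITTED }
    RedoState redo = RedoState.NONE;
    boolean binlogWritten = false;

    // crashAfter: 0 = no crash, 1 = crash after redo prepare, 2 = crash after binlog write
    String update(int crashAfter) {
        redo = RedoState.PREPARED;        // step 3: redo log written in prepare state
        if (crashAfter == 1) return recover();
        binlogWritten = true;             // step 4: binlog written to disk
        if (crashAfter == 2) return recover();
        redo = RedoState.COMMITTED;       // step 5: redo log switched to commit state
        return "committed";
    }

    // Crash-recovery rule: commit a prepared transaction only if its binlog exists.
    String recover() {
        if (redo == RedoState.PREPARED && binlogWritten) {
            redo = RedoState.COMMITTED;
            return "committed-on-recovery";
        }
        redo = RedoState.NONE;            // no binlog record: safe to roll back
        return "rolled-back";
    }
}
```

A crash between the redo prepare and the binlog write rolls back; a crash after the binlog write still commits on recovery, so the two logs never disagree.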

You might ask why two-phase commit is needed. Without it, if the system fails right after the redo log is written, the operation exists in the redo log but not in the binlog, causing data inconsistency. With two-phase commit, if the system fails after the redo log is written (but before the binlog is), the transaction has not been committed and can be rolled back safely.

What other benefit does the two-phase commit design have? First, establish the intuition that the longer an operation takes, the more likely it is to fail partway through. For example, if eating takes you 20 minutes and going to the bathroom takes 1 minute, you are more likely to receive WeChat messages while eating than while in the bathroom. Since the update operation takes much longer than the commit itself, the time-consuming work should be completed first; committing only after all slow operations have finished maximizes the chance that the commit succeeds.

2.2 Two-phase commit strategy for distributed transactions

Distributed transactions can take a similar approach to complete transactions based on the two-phase commit strategy described above.

In the first phase, we add a transaction manager role to coordinate the data sources. Take the order example from the beginning: while executing the order logic, each database performs its own local work, such as subtracting 1 from the inventory and adding 1 to the orders, but does not commit afterwards; it only notifies the transaction manager that it has completed its task.

In the second phase, the transaction manager has already collected whether each data source is ready. If even one data source is not ready, all data sources are told to roll back; if all of them are ready, all are told to commit the transaction.

To summarize the two-phase commit process: the transaction manager first tells each data source to perform its operation and report whether it is ready. Once all data sources are ready, it sends a commit (or rollback) notification to all of them. Because the final commit step is very short, the probability of failure at that point is very low.
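
The whole flow can be sketched as a small coordinator. This is a simplified model, not any particular framework's API: phase one collects a prepare vote from every participant, and phase two sends the same commit-or-rollback decision to all of them.

```java
import java.util.List;

// Sketch of a two-phase commit coordinator over in-memory participants.
class TwoPhaseCommit {
    interface Participant {
        boolean prepare();   // do the work, keep the local transaction open
        void commit();
        void rollback();
    }

    // Returns true if the global transaction committed.
    static boolean run(List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {
            allPrepared &= p.prepare();                      // phase 1: collect every vote
        }
        for (Participant p : participants) {
            if (allPrepared) p.commit(); else p.rollback();  // phase 2: one decision for all
        }
        return allPrepared;
    }

    // Trivial participant whose prepare vote is configurable, for demonstration.
    static class Demo implements Participant {
        final boolean ok;
        String state = "idle";
        Demo(boolean ok) { this.ok = ok; }
        public boolean prepare() { state = "prepared"; return ok; }
        public void commit() { state = "committed"; }
        public void rollback() { state = "rolled-back"; }
    }
}
```

One failed vote rolls back every participant, which is exactly how the inventory change is undone when the order step is not ready.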

So what are the drawbacks of this two-phase commit protocol? The main one is blocking: if one data source hangs and can return neither success nor failure, the entire transaction blocks. Typical mitigations are adding timeouts or retrying (resending) the notification.
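
The timeout idea can be sketched like this: a prepare call that neither succeeds nor fails within the deadline is simply counted as a "no" vote, so the coordinator can roll everyone back instead of blocking forever. (A sketch of the general technique, not any framework's mechanism.)

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A prepare vote with a deadline: timeout or error counts as "not ready".
class TimeoutVote {
    static boolean prepareWithTimeout(Callable<Boolean> prepare, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(prepare).get(millis, TimeUnit.MILLISECONDS);
        } catch (Exception e) {   // TimeoutException or any failure: vote "no"
            return false;
        } finally {
            pool.shutdownNow();   // interrupt a still-blocked participant
        }
    }
}
```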

3. Distributed transaction framework TX-LCN

After so much theory, let's look at the operating principle of TX-LCN, a distributed transaction framework used in real production systems. (TX-LCN is not the only such framework; Alibaba's GTS is another typical one, but GTS is a paid product while TX-LCN is open source.)

Let’s take a look at a schematic of how it works in the official documentation:

The idea is similar to the two-phase distributed transaction process we discussed above (with minor differences). The core flow has 3 steps:

  1. Create a transaction group: before the transaction initiator starts executing the business code, it calls TxManager to create a transaction group object and obtains the GroupId that represents this transaction. Simply put: for this order, create an object in the transaction manager and get an ID for it.
  2. Join the transaction group: after executing its business method, each participant reports its module's transaction information to TxManager. In other words, after each data source (each service) completes its operation, it registers itself with the transaction manager.
  3. Notify the transaction group: after the initiator finishes the business code, it reports its execution result to TxManager. Based on the final transaction state and the transaction group information, TxManager tells the participating modules to commit or roll back, and returns the result to the initiator. In the order example, the customer-facing order-placing service learns whether destocking and order creation succeeded, reports that to the transaction manager, and the transaction manager tells the two participating services to commit or roll back accordingly.
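
The three steps above map onto a very small bookkeeping model. The class and method names below are illustrative only, not the real TxManager API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Simplified model of the create / join / notify lifecycle of a transaction group.
class TxManagerModel {
    private final Map<String, List<String>> groups = new HashMap<>();

    // Step 1: the initiator creates a group and receives its GroupId.
    String createGroup() {
        String groupId = UUID.randomUUID().toString();
        groups.put(groupId, new ArrayList<>());
        return groupId;
    }

    // Step 2: each participant registers itself under the group.
    void joinGroup(String groupId, String unitId) {
        groups.get(groupId).add(unitId);
    }

    // Step 3: the initiator reports its outcome; every registered unit
    // receives the same commit-or-rollback decision.
    Map<String, String> notifyGroup(String groupId, boolean initiatorOk) {
        String decision = initiatorOk ? "commit" : "rollback";
        Map<String, String> result = new HashMap<>();
        for (String unit : groups.remove(groupId)) {
            result.put(unit, decision);
        }
        return result;
    }
}
```
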
There is currently a good article analyzing the TX-LCN source code.

If you follow the source code along with that article, you will find it matches the flow chart above. Some parts of the implementation are quite elegant, for example:

public Object runTransaction(DTXInfo dtxInfo, BusinessCallback business) throws Throwable {
        if (Objects.isNull(DTXLocalContext.cur())) {
            DTXLocalContext.getOrNew();
        } else {
            return business.call();
        }
        log.debug("<---- TxLcn start ---->");
        DTXLocalContext dtxLocalContext = DTXLocalContext.getOrNew();
        TxContext txContext;
        // ---------- make sure every module under a DTX has only one TxContext ----------
        if (globalContext.hasTxContext()) {
            // A transaction context already exists: use the parent context
            txContext = globalContext.txContext();
            dtxLocalContext.setInGroup(true);
            log.debug("Unit[{}] used parent's TxContext[{}].", dtxInfo.getUnitId(), txContext.getGroupId());
        } else {
            txContext = globalContext.startTx();
        }
        // ...
}
This code guarantees that there is only one TxContext per module: if a business logic does not touch different data sources but performs several operations against the same data source, the module for that data source still has only one TxContext under the DTX.

Transaction coordination mechanism of LCN

The slogan of LCN is: LCN does not produce transactions; LCN is only the coordinator of local transactions. Each service module acts as a TxClient, and all TxClients are coordinated by the TxManager.

Because each module is a TxClient, there is a Connection pool under each TxClient, which is the Connection pool defined by the framework, and the Connection is wrapped in the way of static proxy.

public class LcnConnectionProxy implements Connection {
    private Connection connection;
    public LcnConnectionProxy(Connection connection) {
        this.connection = connection;
    }
    /**
     * notify connection
     *
     * @param state transactionState
     * @return RpcResponseState RpcResponseState
     */
    public RpcResponseState notify(int state) {
        try {
            if (state == 1) {
                log.debug("commit transaction type[lcn] proxy connection:{}.", this);
                connection.commit();
            } else {
                log.debug("rollback transaction type[lcn] proxy connection:{}.", this);
                connection.rollback();
            }
            connection.close();
            log.debug("transaction type[lcn] proxy connection:{} closed.", this);
            return RpcResponseState.success;
        } catch (Exception e) {
            log.error(e.getLocalizedMessage(), e);
            return RpcResponseState.fail;
        }
    }
    @Override
    public void setAutoCommit(boolean autoCommit) throws SQLException {
        connection.setAutoCommit(false);
    }
    // ...
}
The connection pool holds the connection resources of the distributed transaction until the transaction notification arrives. When the TxClient receives the notification from TxManager, the commit or rollback is performed. The transaction coordination mechanism of LCN therefore amounts to intercepting the connection pool and controlling when each connection's transaction is committed.
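
The interception idea can be sketched as a pool that parks connections by group id instead of closing them. `TxConnection` below is a minimal stand-in for a proxied `java.sql.Connection` so the sketch stays self-contained; the names are illustrative, not TX-LCN's actual classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A pool that holds each group's connections until the TxManager decision arrives.
class HeldConnectionPool {
    interface TxConnection {
        void commit();
        void rollback();
        void close();
    }

    private final Map<String, List<TxConnection>> held = new HashMap<>();

    // Called where close() would normally happen: park the connection instead.
    void hold(String groupId, TxConnection conn) {
        held.computeIfAbsent(groupId, k -> new ArrayList<>()).add(conn);
    }

    // Called when the TxManager notification for the group arrives.
    void onNotify(String groupId, boolean commit) {
        for (TxConnection conn : held.remove(groupId)) {
            if (commit) conn.commit(); else conn.rollback();
            conn.close();   // only now is the connection actually released
        }
    }
}
```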

Transaction compensation mechanism for LCN

We cannot guarantee that the transaction executes properly every time. If a business method that should have completed successfully ends with an uncommitted transaction because a server crashed or the network jittered, compensation is needed to complete the transaction.

In this case, TxManager records a flag and returns it to the initiator to indicate that the transaction notification did not go through; the TxClient then requests the transaction outcome again and applies it.
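
A minimal sketch of that flag-and-retry idea (illustrative names, not the real API): the TxManager remembers the decision it failed to deliver, and the TxClient later asks for it again and applies it.

```java
import java.util.HashMap;
import java.util.Map;

// Compensation bookkeeping: undelivered decisions are stored for later retry.
class CompensationLog {
    private final Map<String, String> pendingDecisions = new HashMap<>();

    // TxManager side: the notification failed, so flag the group's decision.
    void markUnnotified(String groupId, String decision) {
        pendingDecisions.put(groupId, decision);
    }

    // TxClient side: ask again what should have happened, then apply it.
    String compensate(String groupId) {
        String decision = pendingDecisions.remove(groupId);
        return decision == null ? "nothing-pending" : "re-applied:" + decision;
    }
}
```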

Finally

Welcome to follow my WeChat official account [Programmer Chasing Wind]; articles will be updated there, and the materials I collect will be posted there as well.