
Local transactions are what we use most often at work, but once a single project is split into SOA services or microservices, distributed transactions come into play.

This article is organized around distributed transactions and explains the 2PC and 3PC algorithms in detail. A final demo helps you understand distributed transactions more deeply. The article is structured as follows:

  • What is a transaction
  • What are distributed transactions
  • DTP model and XA specification
    • What is the DTP model
    • What is the XA specification
  • 2PC consistency algorithm
    • 2PC- Preparation stage
    • 2PC- Commit phase
    • Advantages and disadvantages of the 2PC algorithm
  • 3PC consistency algorithm
  • JDBC operates MySQL XA transactions
  • Closing words

What is a transaction

A transaction is the smallest unit of work in database operations: an indivisible series of operations performed as a single logical unit of work. These operations are submitted to the system as a whole, and either all of them are executed or none are.

Transactions have four properties: Atomicity, Consistency, Isolation, and Durability. Together these are known as the ACID properties.

How can ACID properties be guaranteed for transactions?

  • Atomicity: the SQL statements in a transaction either all succeed or all fail. This is implemented with the undo log

  • Consistency: the system moves from one correct state to another. It is guaranteed jointly by the application and the other three properties (A, I, and D), so consistency can be regarded as the core property of transactions

  • Isolation: controls the visibility of data when transactions run concurrently. This is implemented with locking and multi-version concurrency control (MVCC)

  • Durability: once a transaction commits successfully, its changes are saved permanently and are not lost. This is implemented with the redo log
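The following minimal in-memory Java sketch makes the undo-log idea behind atomicity concrete. The class UndoLogDemo and its methods are hypothetical and far simpler than InnoDB's undo log. Each write first records the key's previous value, and rollback replays those records newest-first, so a half-finished transfer disappears completely.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified illustration of undo logging (not InnoDB's implementation)
public class UndoLogDemo {
    // The "table": key -> current value
    final Map<String, Integer> data = new HashMap<>();
    // Undo records of the current transaction: key plus its previous value (null = key did not exist)
    private final Deque<Map.Entry<String, Integer>> undoLog = new ArrayDeque<>();

    void write(String key, int value) {
        // Log the old value as an undo record before overwriting it
        undoLog.push(new AbstractMap.SimpleEntry<>(key, data.get(key)));
        data.put(key, value);
    }

    void commit() {
        undoLog.clear(); // the changes stay; the undo records are no longer needed
    }

    void rollback() {
        // Apply undo records newest-first to restore the state from before the transaction
        while (!undoLog.isEmpty()) {
            Map.Entry<String, Integer> record = undoLog.pop();
            if (record.getValue() == null) {
                data.remove(record.getKey());
            } else {
                data.put(record.getKey(), record.getValue());
            }
        }
    }

    public static void main(String[] args) {
        UndoLogDemo db = new UndoLogDemo();
        db.write("alice", 100);
        db.commit();
        db.write("alice", 50); // transfer: debit alice...
        db.write("bob", 50);   // ...and credit bob
        db.rollback();         // a failure occurred before commit
        System.out.println(db.data); // prints {alice=100}
    }
}
```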

This article focuses on the distributed transaction algorithms 2PC and 3PC. The redo log, undo log, MVCC, and locking will be covered in later articles.

In the early days, our application was a single project operating on a single database. Transactions of this kind are called local transactions. The ACID properties of local transactions are generally provided by the database itself, for example the MySQL database we use at work.

When you use the MySQL client, MySQL runs in autocommit mode and implicitly commits each statement, so daily work rarely involves writing explicit begin, commit, and rollback statements. To experiment with locking, MVCC, and so on, open several sessions and use BEGIN, COMMIT, ROLLBACK, etc. Then observe how the data looks from different transactions and check whether the results match your expectations.

We normally develop project code with transactions managed by Spring, so we rarely call commit, rollback, and other transaction methods by hand (except in a few cases). The following native JDBC sample shows how a transaction guarantees the four ACID properties.

```java
Connection conn = ...; // Obtain a database connection
conn.setAutoCommit(false); // Start the transaction by turning off auto-commit
try {
    // ... execute INSERT, DELETE, UPDATE, and SELECT statements
    conn.commit(); // Commit the transaction
} catch (Exception e) {
    conn.rollback(); // Roll back the transaction
} finally {
    conn.close(); // Close the connection
}
```

Writing the same begin, commit, and rollback code for every database operation is painful, so how does Spring manage transactions for us automatically? In Spring projects we generally use one of two approaches to transaction management: programmatic transactions and declarative transactions.

With declarative transactions, you either annotate a method with @Transactional or configure transaction aspects through AOP. The two are essentially the same. @Transactional offers finer granularity and is itself implemented on top of AOP. For example:

```java
@Service
public class TransactionalService {
    @Transactional
    public void save() {
        // Business operation
    }
}
```

Spring does not register TransactionalService itself in the container. Instead, it creates a proxy object for it, which behaves roughly like the following class:

```java
public class TransactionalServiceProxy {
    private final TransactionalService transactionalService;

    public TransactionalServiceProxy(TransactionalService transactionalService) {
        this.transactionalService = transactionalService;
    }

    public void save() {
        // Begin the transaction
        try {
            transactionalService.save();
            // Commit the transaction if the business method returns normally
        } catch (RuntimeException e) {
            // Roll back the transaction if an exception occurs, then rethrow it
            throw e;
        }
    }
}
```
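Spring builds such proxies through AOP, using either JDK dynamic proxies or CGLIB. The sketch below hand-rolls the same idea with java.lang.reflect.Proxy, so you can run it without Spring. OrderService, OrderServiceImpl, and the txLog list (a stand-in for a real transaction manager) are hypothetical. This illustrates the proxy pattern only; it is not Spring's generated code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class TxProxyDemo {
    // Hypothetical business interface; JDK dynamic proxies can only proxy interfaces
    interface OrderService {
        void save(boolean fail);
    }

    static class OrderServiceImpl implements OrderService {
        @Override
        public void save(boolean fail) {
            if (fail) {
                throw new IllegalStateException("business error");
            }
        }
    }

    // Stand-in for a real transaction manager: records begin/commit/rollback in order
    static final List<String> txLog = new ArrayList<>();

    // Wraps target so that every interface method runs between begin and commit/rollback
    static <T> T transactional(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            txLog.add("begin");
            try {
                Object result = method.invoke(target, args);
                txLog.add("commit");
                return result;
            } catch (InvocationTargetException e) {
                txLog.add("rollback");
                throw e.getCause(); // rethrow the original business exception
            }
        };
        Object instance = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[]{iface}, handler);
        return iface.cast(instance);
    }

    public static void main(String[] args) {
        OrderService service = transactional(new OrderServiceImpl(), OrderService.class);
        service.save(false);
        try {
            service.save(true);
        } catch (IllegalStateException expected) {
            // the caller still sees the business exception
        }
        System.out.println(txLog); // prints [begin, commit, begin, rollback]
    }
}
```

This sketch rolls back on any exception thrown by the business method. By default, Spring's @Transactional rolls back only on unchecked exceptions (RuntimeException and Error).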

The sample code looks tidy, but the code Spring actually generates is much more complex. For transaction management, Spring provides the PlatformTransactionManager interface, which has two important implementation classes:

  • DataSourceTransactionManager: supports local transactions. Internally it opens, commits, and rolls back transactions through java.sql.Connection

  • JtaTransactionManager: supports distributed transactions. It builds on the JTA specification and uses the XA protocol for two-phase commit

These two implementation classes show that the programmatic and declarative transactions we use day to day rely on the local transaction manager. Spring also supports distributed transactions. There is plenty of material about its JTA support online, so it is not covered here.

What are distributed transactions

Local transactions appear constantly in everyday business code and are not hard to understand. With the popularity of SOA and microservices, however, a single business system has been split into multiple systems. To keep pace with these changes, the database has been split along with the business.

Take a school management system as an example. It may be split into a student service, a course service, a teacher service, and so on, with the database likewise split into multiple databases. Once these services are deployed to servers, we may face a call pattern like the following.

ServiceA must operate on its own database in a local transaction and, at the same time, call ServiceB and ServiceC, each of which runs its own transaction. How can we ensure that the transactions of all three services either all succeed or all fail? This is a classic distributed transaction scenario. The local transactions of the three individual services cannot guarantee the transactional integrity of the request as a whole.
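A small simulation reproduces the scenario above. In this hypothetical sketch, each service owns its own in-memory "database" and can only run local transactions. ServiceC fails after ServiceA and ServiceB have already committed, so the request ends up half-applied.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialFailureDemo {
    // Hypothetical service that owns its own database and can only run local transactions
    static class LocalService {
        final String name;
        final boolean failOnWrite; // simulate a failure inside this service
        final List<String> committedRows = new ArrayList<>();

        LocalService(String name, boolean failOnWrite) {
            this.name = name;
            this.failOnWrite = failOnWrite;
        }

        // One local transaction: either the row is committed or nothing changes
        void insertInLocalTx(String row) {
            if (failOnWrite) {
                throw new IllegalStateException(name + " failed, local transaction rolled back");
            }
            committedRows.add(row); // local commit
        }
    }

    // ServiceA's request handler: its own local transaction, then calls to ServiceB and ServiceC
    static void handleRequest(LocalService a, LocalService b, LocalService c) {
        try {
            a.insertInLocalTx("order");
            b.insertInLocalTx("stock");
            c.insertInLocalTx("points");
        } catch (IllegalStateException e) {
            // A and B have already committed their local transactions; nothing here can undo them
            System.out.println(e.getMessage());
        }
    }

    public static void main(String[] args) {
        LocalService a = new LocalService("ServiceA", false);
        LocalService b = new LocalService("ServiceB", false);
        LocalService c = new LocalService("ServiceC", true);
        handleRequest(a, b, c);
        // prints [order] [stock] []: the request as a whole is only partly applied
        System.out.println(a.committedRows + " " + b.committedRows + " " + c.committedRows);
    }
}
```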

There are many solutions for distributed transaction scenarios. Broadly, they fall into strong-consistency and eventual-consistency solutions. More specific approaches include 2PC, 3PC, TCC, reliable messaging…

Solutions such as RocketMQ transactional messages (from Alibaba), the Seata XA mode, and the reliable messaging model are widely used in the industry. However, distributed transactions always operate, directly or indirectly, on multiple databases, and they bring a new challenge: performance. Strong consistency, or eventual consistency through compensation, can degrade performance too much. For ordinary business, that trade-off is not worth it.

DTP model and XA specification

The X/Open organization defined the Distributed Transaction Processing (DTP) model and the XA specification, the interface used for distributed transactions. The DTP model consists of the following elements:

  • AP (Application Program): defines transaction boundaries (that is, where a transaction starts and ends) and operates on resources within those boundaries
  • TM (Transaction Manager): assigns unique transaction identifiers, monitors the progress of transactions, and commits and rolls them back
  • RM (Resource Manager): provides access to resources, such as databases and file systems
  • CRM (Communication Resource Manager): controls communication between distributed applications within a TM domain or across TM domains
  • CP (Communication Protocol): provides the low-level communication services between distributed application nodes that CRM relies on

In the DTP model, the essential components are AP, TM, and RM (CRM and CP are optional), as shown in the figure below.

XA specification

The most important role of the XA specification is to define the interface between the RM (resource manager) and the TM (transaction manager). Beyond defining the interfaces used during 2PC, the XA specification also adds optimizations to 2PC.
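In Java, the XA specification's TM-to-RM interface is exposed as javax.transaction.xa.XAResource, and transaction branches are identified by javax.transaction.xa.Xid. The sketch below implements a toy in-memory RM against that interface to show the calls a TM makes on each branch. SimpleXid, InMemoryRm, and runBranch are hypothetical names. A real RM such as MySQL also has to make prepared branches durable and support recovery.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InMemoryXaDemo {
    // Hypothetical Xid: global transaction ID (gtrid) plus branch qualifier (bqual)
    static final class SimpleXid implements Xid {
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(String gtrid, String bqual) {
            this.gtrid = gtrid.getBytes();
            this.bqual = bqual.getBytes();
        }

        @Override public int getFormatId() { return 1; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }
    }

    // Hypothetical RM: buffers each branch's writes and applies them only on commit
    static final class InMemoryRm implements XAResource {
        final List<String> committed = new ArrayList<>();
        private final Map<Xid, List<String>> branches = new HashMap<>();
        private final boolean voteNo; // simulate an RM that cannot prepare
        private Xid current;

        InMemoryRm(boolean voteNo) { this.voteNo = voteNo; }

        // Stand-in for executing SQL inside the active branch
        void write(String row) { branches.get(current).add(row); }

        @Override public void start(Xid xid, int flags) { current = xid; branches.put(xid, new ArrayList<>()); }
        @Override public void end(Xid xid, int flags) { current = null; }

        @Override public int prepare(Xid xid) throws XAException {
            if (voteNo) {
                throw new XAException(XAException.XA_RBROLLBACK); // vote "No"
            }
            return XA_OK; // vote "Yes": the branch is ready to commit
        }

        @Override public void commit(Xid xid, boolean onePhase) { committed.addAll(branches.remove(xid)); }
        @Override public void rollback(Xid xid) { branches.remove(xid); }
        @Override public void forget(Xid xid) { branches.remove(xid); }
        @Override public Xid[] recover(int flag) { return branches.keySet().toArray(new Xid[0]); }
        @Override public boolean isSameRM(XAResource other) { return other == this; }
        @Override public int getTransactionTimeout() { return 0; }
        @Override public boolean setTransactionTimeout(int seconds) { return false; }
    }

    // The call sequence a TM drives on one RM: start, work, end, prepare, then commit or rollback
    static List<String> runBranch(InMemoryRm rm, Xid xid, String row) {
        rm.start(xid, XAResource.TMNOFLAGS);
        rm.write(row);
        rm.end(xid, XAResource.TMSUCCESS);
        try {
            rm.prepare(xid);
            rm.commit(xid, false); // false = second phase of 2PC, not a one-phase commit
        } catch (XAException e) {
            rm.rollback(xid);
        }
        return rm.committed;
    }

    public static void main(String[] args) {
        System.out.println(runBranch(new InMemoryRm(false), new SimpleXid("gtx-1", "branch-1"), "jack")); // [jack]
        System.out.println(runBranch(new InMemoryRm(true), new SimpleXid("gtx-2", "branch-1"), "rose"));  // []
    }
}
```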

Sorting out the relationship among DTP, XA, and 2PC

DTP defines the roles in a distributed transaction and specifies that control of global transactions uses the 2PC protocol to ensure data consistency.

2PC, short for Two-Phase Commit, is an algorithm designed to keep all nodes of a distributed system atomic and consistent during transaction processing, especially in the database field. 2PC is also regarded as a consistency protocol that guarantees data consistency in distributed systems.

The XA specification is a distributed transaction processing specification proposed by the X/Open organization. It defines the interfaces needed by 2PC (the two-phase commit protocol), that is, the interaction between RM and TM in the figure above. In the DTP model, the TM-to-RM communication interface is called XA. Relational databases such as MySQL implement this X/Open specification, whose core relies on the 2PC algorithm, and these implementations are called the XA scheme.

2PC consistency algorithm

When an application (AP) starts a transaction spanning multiple distributed nodes, each node (RM) knows the outcome of its own transaction but not the outcomes of the other nodes. To preserve the ACID properties, a component called the coordinator (TM) is introduced to schedule the distributed execution centrally.

The coordinator directs the distributed nodes participating in the global transaction and ultimately decides whether they commit or roll back. Two distributed consistency protocols derive from this idea: two-phase commit and three-phase commit. In 2PC, the two phases are the prepare phase and the commit phase. Let's first look at what happens in the prepare phase.

2PC- Preparation stage

The first phase of two-phase commit is also called the voting phase. In it, each participant votes on whether the transaction can proceed to commit.

  • Transaction query: the coordinator sends the transaction content to all participants, asks whether they can commit, and waits for each participant's response

  • Execute transaction: each participant receives the request, executes the transaction operations (without committing), and writes undo and redo log records

  • Return response: if a participant executed the transaction successfully, it replies Yes to the coordinator; if it failed, it replies No

If all participants reply Yes in the first phase, the transaction moves on to the commit step. Otherwise, the distributed transaction fails. Take MySQL as an example. In the first phase, the transaction manager (TM) sends a prepare request to every participating database (RM). On receiving it, each database executes the modifications and writes its logs, then marks the transaction as prepared (ready to commit). Finally, it returns the result to the transaction manager.
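The coordinator's side of the voting phase can be sketched in Java as follows. The Participant interface and the collectVotes method are hypothetical. The sketch assumes the coordinator sends prepare requests in parallel and aborts on any of three outcomes: a No vote, an exception, or no answer within the timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PreparePhaseDemo {
    // Hypothetical participant: prepare() executes the branch without committing and returns its vote
    interface Participant {
        boolean prepare(String txId) throws Exception; // true = Yes, false = No
    }

    // 2PC phase 1: send prepare to all participants in parallel and wait at most timeoutMs in total.
    // A No vote, an exception, or a missing answer when time runs out all mean "abort".
    static boolean collectVotes(List<Participant> participants, String txId, long timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            List<Callable<Boolean>> requests = new ArrayList<>();
            for (Participant p : participants) {
                requests.add(() -> p.prepare(txId));
            }
            // invokeAll cancels every task that has not finished when the timeout expires
            List<Future<Boolean>> votes = pool.invokeAll(requests, timeoutMs, TimeUnit.MILLISECONDS);
            for (Future<Boolean> vote : votes) {
                if (vote.isCancelled()) {
                    return false; // timed out: treated like a No vote
                }
                try {
                    if (!vote.get()) {
                        return false; // explicit No vote
                    }
                } catch (ExecutionException e) {
                    return false; // the participant failed while executing its branch
                }
            }
            return true; // every participant voted Yes: the coordinator may enter the commit phase
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Participant yes = txId -> true;
        Participant no = txId -> false;
        Participant slow = txId -> {
            Thread.sleep(2_000); // never answers within the timeout
            return true;
        };
        System.out.println(collectVotes(List.of(yes, yes), "tx-1", 500));
        System.out.println(collectVotes(List.of(yes, no), "tx-2", 500));
        System.out.println(collectVotes(List.of(yes, slow), "tx-3", 500));
    }
}
```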

2PC- Commit phase

The commit phase takes one of two paths. If every participant voted Yes, all participants commit the transaction normally and return an Ack. If any participant returned No or a timeout occurred, a global rollback is triggered and the distributed transaction fails.

  • Perform transaction commit

  • Interrupt the transaction

Perform transaction commit

Assuming that the coordinator receives Yes responses from all participants, the transaction commit is performed

  • Transaction commit: the coordinator sends a Commit request to all participants. On receiving it, each participant commits its local transaction and releases the resources held during the transaction

  • Complete the transaction: after committing, each participant sends an Ack to the coordinator, which completes the distributed transaction once it has received all Acks

Interrupt the transaction

The transaction is interrupted in two cases. Either a participant returns No to the coordinator (this refers to the first-phase vote), or the wait times out before the coordinator has received responses from all participants.

  • Transaction rollback: the coordinator sends a Rollback request to all participants. On receiving it, each participant rolls back the transaction using the undo log written in the first phase, then releases the resources it holds

  • Interrupt transaction: after the rollback, each participant sends an Ack to the coordinator, which completes the interruption once it has received Acks from all participants
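Both second-phase paths can be sketched with a toy participant that applies its change in phase 1 and keeps the old value as an undo entry. The Participant class and the phaseTwo method are hypothetical. The sketch assumes every Commit or Rollback message is delivered; a real coordinator must keep retrying until all Acks arrive.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitPhaseDemo {
    // Hypothetical participant that has already voted Yes in phase 1
    static class Participant {
        final String name;
        int balance;        // current value of the row
        Integer undoValue;  // undo log entry written in phase 1 (the old value)
        boolean locked;     // resources held from phase 1 until phase 2 finishes

        Participant(String name, int balance) {
            this.name = name;
            this.balance = balance;
        }

        // Phase 1: apply the change without committing, record the old value, keep the lock
        void prepare(int newBalance) {
            undoValue = balance;
            balance = newBalance;
            locked = true;
        }

        // Phase 2, commit path: make the change final, discard the undo entry, release resources
        String commit() {
            undoValue = null;
            locked = false;
            return "ack:" + name;
        }

        // Phase 2, rollback path: restore the old value from the undo log, then release resources
        String rollback() {
            balance = undoValue;
            undoValue = null;
            locked = false;
            return "ack:" + name;
        }
    }

    // Coordinator side of phase 2: send the same decision to every participant and collect the Acks
    static List<String> phaseTwo(List<Participant> participants, boolean allVotedYes) {
        List<String> acks = new ArrayList<>();
        for (Participant p : participants) {
            acks.add(allVotedYes ? p.commit() : p.rollback());
        }
        return acks;
    }

    public static void main(String[] args) {
        Participant a = new Participant("A", 100);
        Participant b = new Participant("B", 0);
        a.prepare(50); // transfer 50 from A...
        b.prepare(50); // ...to B
        // Suppose another participant voted No, so the decision is rollback
        System.out.println(phaseTwo(List.of(a, b), false)); // [ack:A, ack:B]
        System.out.println(a.balance + " " + b.balance);    // 100 0
    }
}
```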

Advantages and disadvantages of the 2PC algorithm

2PC divides transaction processing into two phases, voting and execution. The core idea is to try each transaction first and then commit it. Its advantages are obvious: the principle is simple and it is easy to implement. But simplicity also leaves gaps. Here are its three core defects:

  1. Synchronous blocking: throughout both phases, participant and coordinator resources stay locked. Only after all nodes are ready does the coordinator issue the global commit, and each participant releases its resources only after committing locally. This takes a long time and significantly hurts performance

  2. Single point of failure: if the coordinator fails, the whole two-phase commit stops working. In particular, if the coordinator fails during the second phase, the other participants remain stuck holding locked transaction resources

  3. Data inconsistency: in the second phase, a local network failure or a coordinator crash may occur partway through sending the Commit requests. Then only some participants receive Commit and commit their branches, while the rest do not, leaving the data inconsistent
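A tiny simulation shows defects 1 and 3 together. In this hypothetical sketch, the coordinator has decided to commit but crashes after delivering Commit to only one participant. That participant commits, while the other stays prepared with its locks held.

```java
import java.util.List;

public class CoordinatorCrashDemo {
    enum State { PREPARED, COMMITTED }

    // Hypothetical participant that voted Yes and now waits, holding its locks, for the decision
    static class Participant {
        final String name;
        State state = State.PREPARED;

        Participant(String name) {
            this.name = name;
        }
    }

    // The coordinator decided to commit but crashes after delivering Commit to only `delivered` participants
    static void commitThenCrash(List<Participant> participants, int delivered) {
        for (int i = 0; i < delivered && i < participants.size(); i++) {
            participants.get(i).state = State.COMMITTED;
        }
        // Crash here. The remaining participants voted Yes, so they cannot safely abort on their own
        // (others may already have committed), and they cannot commit either (they don't know whether
        // everyone voted Yes). They stay PREPARED, holding locks, until the coordinator recovers.
    }

    public static void main(String[] args) {
        List<Participant> participants = List.of(new Participant("A"), new Participant("B"));
        commitThenCrash(participants, 1);
        for (Participant p : participants) {
            System.out.println(p.name + " = " + p.state); // A = COMMITTED, then B = PREPARED
        }
    }
}
```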

Because 2PC is so simple, it suffers from the synchronous blocking, single point of failure, and data inconsistency problems described above. Three-phase commit (3PC) was therefore designed as an improvement on 2PC.

2PC has other practical limitations. The databases must support the XA specification, and both performance and data consistency are unsatisfactory. That is why Seata supports the XA mode but mainly promotes its AT mode.

3PC consistency algorithm

Three-phase commit (3PC) is an improved version of two-phase commit (2PC) that introduces two changes:

  1. Both the coordinator and the participants use timeouts. This mitigates the synchronous blocking of 2PC and prevents transaction resources from being locked forever

  2. The two phases become three. The prepare phase of 2PC is split in two, giving the CanCommit, PreCommit, and DoCommit phases that make up the protocol

The detailed 3PC commit flow is not repeated here. Compared with 2PC, the biggest advantage of 3PC is that it narrows the window in which participants block. It can also still reach a decision after the coordinator, a single point of failure, goes down.

Although timeouts solve the problem of resources being blocked forever, 3PC can still leave data inconsistent. Suppose a participant receives PreCommit and then a network partition cuts it off from the coordinator. When its timeout expires, the participant still commits the transaction, even if the coordinator has meanwhile decided to abort.
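A small participant state machine captures both the timeout rules that make 3PC non-blocking and the partition case that breaks it. The states, messages, and class names below are hypothetical and simplified (the participant always votes Yes on CanCommit). The key rule: a participant that times out before PreCommit aborts, while one that times out after PreCommit commits.

```java
public class ThreePcDemo {
    enum State { INIT, READY, PRE_COMMITTED, COMMITTED, ABORTED }

    enum Message { CAN_COMMIT, PRE_COMMIT, DO_COMMIT, ABORT }

    // Hypothetical, simplified 3PC participant
    static class Participant {
        State state = State.INIT;

        void on(Message message) {
            switch (message) {
                case CAN_COMMIT: state = State.READY; break;         // answered Yes, nothing executed yet
                case PRE_COMMIT: state = State.PRE_COMMITTED; break; // branch executed, undo/redo logged
                case DO_COMMIT:  state = State.COMMITTED; break;
                case ABORT:      state = State.ABORTED; break;
            }
        }

        // Invoked when the participant's timer fires before the next coordinator message arrives
        void onTimeout() {
            switch (state) {
                case INIT:
                case READY:         state = State.ABORTED; break;   // no PreCommit seen: abort
                case PRE_COMMITTED: state = State.COMMITTED; break; // PreCommit seen: assume commit
                default: break;                                      // already decided
            }
        }
    }

    public static void main(String[] args) {
        Participant a = new Participant();
        Participant b = new Participant();
        for (Participant p : new Participant[]{a, b}) {
            p.on(Message.CAN_COMMIT);
            p.on(Message.PRE_COMMIT);
        }
        // The coordinator misses a PreCommit Ack (e.g. from another participant) and decides to abort,
        // but a network partition keeps the Abort message from reaching A.
        b.on(Message.ABORT);
        a.onTimeout();
        System.out.println(a.state + " " + b.state); // COMMITTED ABORTED
    }
}
```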

Comparing 2PC and 3PC shows that neither fully solves data consistency in a distributed setting.

JDBC operates MySQL XA transactions

MySQL has supported XA distributed transactions since version 5.0.3, and only with the InnoDB storage engine. MySQL Connector/J has supported XA directly since version 5.0.0.

In the DTP model, MySQL plays the role of an RM (resource manager). We won't demonstrate MySQL's XA statements, because each one only drives its own single transaction branch. Instead, we use JDBC to show how a TM controls multiple RMs to complete a 2PC distributed transaction.

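First add the Maven dependency (GAV). The demo pins a 5.1.x version because it uses the XA classes in the com.mysql.jdbc.jdbc2.optional package. Connector/J 8.x still supports XA, but these classes moved to the com.mysql.cj.jdbc package.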

```xml
<dependencies>
    <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.38</version>
    </dependency>
</dependencies>
```


Because the XA protocol is based on the 2PC consistency algorithm, refer to the DTP model and 2PC described above when reading the code. You can also simulate errors to see how the execution results change.

```java
import com.mysql.jdbc.jdbc2.optional.MysqlXAConnection;
import com.mysql.jdbc.jdbc2.optional.MysqlXid;
import javax.sql.XAConnection;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;
import java.sql.*;

public class MysqlXAConnectionTest {
    public static void main(String[] args) throws SQLException {
        boolean logXaCommands = true; // true prints the XA statements sent to MySQL, for debugging
        // Obtain the resource manager interface instance RM1
        Connection conn1 = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "root");
        XAConnection xaConn1 = new MysqlXAConnection((com.mysql.jdbc.Connection) conn1, logXaCommands);
        XAResource rm1 = xaConn1.getXAResource();
        // Obtain the resource manager interface instance RM2
        Connection conn2 = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "root");
        XAConnection xaConn2 = new MysqlXAConnection((com.mysql.jdbc.Connection) conn2, logXaCommands);
        XAResource rm2 = xaConn2.getXAResource();
        // The AP (application) asks the TM (transaction manager) to run a distributed transaction; the TM generates the global transaction ID
        byte[] gtrid = "distributed_transaction_id_1".getBytes();
        int formatId = 1;
        try {
            // ============== Execute the transaction branches on RM1 and RM2 ==============
            // TM generates the transaction branch ID on RM1, then the branch runs on RM1
            Xid xid1 = new MysqlXid(gtrid, "transaction_001".getBytes(), formatId);
            rm1.start(xid1, XAResource.TMNOFLAGS);
            PreparedStatement ps1 = conn1.prepareStatement("INSERT INTO user(name) VALUES ('jack')");
            ps1.execute();
            rm1.end(xid1, XAResource.TMSUCCESS);
            // TM generates the transaction branch ID on RM2, then the branch runs on RM2
            Xid xid2 = new MysqlXid(gtrid, "transaction_002".getBytes(), formatId);
            rm2.start(xid2, XAResource.TMNOFLAGS);
            PreparedStatement ps2 = conn2.prepareStatement("INSERT INTO user(name) VALUES ('rose')");
            ps2.execute();
            rm2.end(xid2, XAResource.TMSUCCESS);
            // ============== Two-phase commit, phase 1: ask all RMs to prepare their branches ==============
            int rm1Prepare = rm1.prepare(xid1);
            int rm2Prepare = rm2.prepare(xid2);
            // Phase 2: commit all branches only if every branch prepared successfully
            if (rm1Prepare == XAResource.XA_OK && rm2Prepare == XAResource.XA_OK) {
                rm1.commit(xid1, false); // false = two-phase commit, not one-phase
                rm2.commit(xid2, false);
            } else {
                rm1.rollback(xid1); // a branch failed to prepare: roll back every branch
                rm2.rollback(xid2);
            }
        } catch (XAException e) {
            // A failed prepare usually surfaces as an XAException; a production TM would roll back the other branches here
            e.printStackTrace();
        } finally {
            conn1.close();
            conn2.close();
        }
    }
}
```

Closing words

This article explained how local transactions guarantee the four ACID properties, why distributed transactions arose, and why neither 2PC nor 3PC can fully solve data consistency in a distributed setting. It then demonstrated the 2PC execution flow with JDBC. Hopefully you now have a solid picture of distributed transactions and a clear understanding of DTP, XA, and 2PC, concepts that are easy to confuse.

This is the first article in the “Distributed Transactions” series. Later articles will cover distributed transactions built on message-oriented middleware, the reliable message model, and the Seata XA mode. They will also summarize the pros and cons of each approach and discuss how to choose the right solution for each scenario.

I believe the best way to learn is hands-on practice. If you have not worked with distributed transactions yet, try simulating a distributed transaction scenario in a project you are working on. This deepens your understanding of the design ideas behind distributed transaction solutions.

Writing takes effort. If this article helped you, consider following to show your support. See you next time.

Standing on the shoulders of giants

  • Principles and Practices of Distributed Consistency from Paxos to Zookeeper

  • Tamorizhi Java Technology Blog

