Cabbage Java self study room covers core knowledge

Transaction processing is involved in almost every information system. Its purpose is to ensure that all data in the system match expectations and that correlated data never contradict each other, that is, to guarantee consistency of the data state.

According to classical database theory, achieving this goal requires three properties working together:

  • Atomicity: within the same business process, a transaction guarantees that changes to multiple pieces of data either all succeed or are all undone together.
  • Isolation: across different business processes, a transaction ensures that the data each service reads and writes is independent of the others and that they do not interfere with each other.
  • Durability: a transaction ensures that all successfully committed data modifications are correctly persisted and never lost.

Although the concept of transactions originated in database systems, today it has extended far beyond the database itself and applies to any application that requires data consistency, including but not limited to databases, transactional memory, caches, message queues, and distributed storage. I will use the term “data source” to refer generally to the logical devices that provide and store data in all of these scenarios, though the meaning of transactions and consistency in these scenarios is not necessarily identical, as described below.

Together these are the “ACID” properties of a transaction, but I do not entirely agree with the term, because the four properties are not orthogonal: A, I, and D are the means and C is the end; the former are the cause, the latter the effect.

  • When a service uses only one data source, achieving consistency through A, I, and D is the classic case and relatively easy. The data source itself can detect whether concurrent transactions conflict when reading and writing, and it alone determines the final order of those reads and writes on the timeline. Consistency between transactions of this kind is called “internal consistency”.
  • When a service uses multiple different data sources, or when multiple services involve multiple data sources at the same time, the problem becomes much harder. The order of concurrent, or even successively executed, transactions on the timeline is no longer determined by any single data source. Consistency between transactions that span multiple data sources is called “external consistency”.

External consistency is often difficult to achieve with A, I, and D alone, because the cost is high or simply unrealistic; yet external consistency is a problem that distributed systems inevitably encounter and must solve. We therefore have to change our mindset and treat consistency not as a binary “yes or no” property but as a spectrum of strengths that can be discussed separately, aiming for the strongest consistency guarantee we can afford at an acceptable price. For this reason, transaction processing has risen from a “programming problem” of concrete operations to an “architectural problem” requiring global trade-offs.

In the process of exploring these solutions, many new ideas and concepts have emerged, some of which are not very intuitive. We will explain how the different transaction schemes handle these concepts using the same scenario and examples throughout.

Example scenario

An online bookstore. Every time a book is successfully sold, there are three things that need to be done correctly:

  • The corresponding payment is deducted from the user’s account.
  • The item’s inventory is deducted from the warehouse and the item is marked as awaiting shipment.
  • The corresponding payment is added to the merchant’s account.

1. Local transactions (single service using single data source)

The term “local transaction” is used in contrast with the later term “global transaction”; since “local transaction” has become the mainstream name, there is no need to quibble over naming here. A local transaction is a transaction that operates on a single transactional resource and does not require coordination by a global transaction manager.

Local transactions are the most basic transaction solution and are suitable only for scenarios where a single service uses a single data source. From the application’s point of view, they rely directly on the transaction capability provided by the data source itself. At the code level, the most an application can do is wrap the transaction interface in a standardized layer (such as the JDBC interface); it is not deeply involved in the transaction process itself. Opening, terminating, committing, rolling back and nesting transactions, setting the isolation level, and even transaction propagation modes close to the application code all depend on the underlying data source for support. This is quite different from XA, TCC, SAGA and the other transaction schemes introduced later, which are implemented mainly in application code.

For example, if your code calls the Connection::rollback() method in JDBC, successful execution of that method does not necessarily mean the transaction has actually been rolled back: if the table’s storage engine is MyISAM, rollback() is a meaningless no-op. Therefore, to discuss local transactions in any depth, we have to go beyond the application code level, understand some of the transaction implementation principles of the database itself, and see how a traditional database management system implements ACID.
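
As a reminder of what the application-level view looks like, here is a minimal JDBC sketch of the bookstore purchase; the connection URL, table and column names are made up for illustration, and whether commit() and rollback() behave transactionally depends entirely on the underlying engine:

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class LocalTransactionDemo {
    public static void main(String[] args) throws SQLException {
        // Connection URL and schema are illustrative only.
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/bookstore", "user", "pass")) {
            conn.setAutoCommit(false); // start a local transaction on this single data source
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE user_id = ?");
                 PreparedStatement stock = conn.prepareStatement(
                         "UPDATE warehouse SET amount = amount - 1, status = 'DELIVERING' WHERE book_id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE user_id = ?")) {
                debit.setBigDecimal(1, new BigDecimal("100"));
                debit.setLong(2, 1L);
                debit.executeUpdate();
                stock.setLong(1, 42L);
                stock.executeUpdate();
                credit.setBigDecimal(1, new BigDecimal("100"));
                credit.setLong(2, 2L);
                credit.executeUpdate();
                conn.commit();   // the data source itself decides what commit really means
            } catch (SQLException e) {
                conn.rollback(); // a no-op on engines such as MyISAM that do not support transactions
                throw e;
            }
        }
    }
}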

1.1. Implementing atomicity and durability

Atomicity and durability are closely related properties of a transaction. Atomicity guarantees that the multiple operations of a transaction either all take effect or none do, with no intermediate state; durability guarantees that once a transaction takes effect, its modifications cannot be undone or lost for any reason.

As we all know, data only becomes durable once it has been successfully written to persistent storage such as disk or tape; data held only in memory is lost the moment the application crashes, the database or operating system crashes, or the machine suddenly loses power. All of these accidents are collectively called “crashes”. The biggest difficulty in achieving atomicity and durability is that “writing to disk” is not an atomic operation: besides the states “written” and “not written”, there objectively exists an intermediate state of “being written”. Because intermediate write states and crashes cannot be eliminated, writing data from memory to disk cannot guarantee atomicity and durability without additional safeguards. Concrete examples follow.

In our example scenario, buying a book from the bookstore requires modifying three pieces of data: subtracting the payment from the user’s account, adding the payment to the merchant’s account, and marking a book in the warehouse as being shipped. Because of the intermediate write state, the following situations can occur.

  • Uncommitted transaction, crash after some writes: the application has not finished modifying all three pieces of data, but the database has already written one or two of the changes to disk. If a crash happens at this point, then after restart the database must have a way to know that an incomplete shopping operation was in progress before the crash, and must restore the already-modified data on disk to its unmodified state, in order to guarantee atomicity.
  • Committed transaction, crash before all writes: the application has finished modifying all three pieces of data, but the database has not yet written all three changes to disk. If a crash happens at this point, then after restart the database must have a way to know that a complete shopping operation happened before the crash, and must write out the part of the data that never reached the disk, in order to guarantee durability.

Since intermediate write states and crashes are unavoidable, atomicity and durability can only be guaranteed by taking recovery measures after the crash. Such data recovery is called “Crash Recovery”, and is also referred to as Failure Recovery or Transaction Recovery.

Commit Logging

To make crash recovery possible, data on disk cannot be modified the way a program modifies a variable in memory, by directly changing the value of some row and column of a table. Instead, all the information needed for the modification, such as which data is changed, on which memory page and disk block the data physically resides, and what the value is changed from and to, must first be recorded to disk in the form of a log, that is, as sequential append-only file writes (the most efficient kind of write). Only after all log records have safely reached the disk and the database sees the “Commit Record” in the log that marks the transaction as successful does it modify the real data according to the information in the log. After the modification is complete, an “End Record” is appended to the log to indicate that the transaction has been persisted. This approach is called “Commit Logging”.

It is not hard to see how Commit Logging guarantees durability and atomicity. First, once the Commit Record has been successfully written to the log, the entire transaction is successful: even if a crash occurs while the real data is being modified, the database can, after restart, continue modifying the data according to the log information already on disk, which guarantees durability. Second, if the log crashes before a Commit Record is successfully written, the entire transaction fails: after the system restarts, it will see log entries without a Commit Record, mark them for rollback, and the whole transaction will look as if it never happened, which guarantees atomicity.

The principle of Commit Logging is clear, and some databases do implement transactions directly with Commit Logging, for example Alibaba’s OceanBase. However, Commit Logging has a huge inherent defect: all real changes to the data must happen after the transaction commits, that is, after the Commit Record has been written to the log. Before that point, even if the disk I/O has plenty of idle capacity, and even if a transaction modifies so much data that it occupies a large amount of memory buffer, the data on disk must not be modified before the transaction commits, for whatever reason. This is a precondition for Commit Logging to work, and it is very unfavorable for improving database performance.
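
A minimal sketch of the recovery decision under Commit Logging; the log record model here is invented purely for illustration, and any real on-disk log format is far richer:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CommitLoggingRecovery {

    enum Type { CHANGE, COMMIT_RECORD, END_RECORD }

    static class LogRecord {
        final long txId; final Type type; final String payload;
        LogRecord(long txId, Type type, String payload) {
            this.txId = txId; this.type = type; this.payload = payload;
        }
    }

    /** Replay the log after a crash: only transactions with a Commit Record are redone. */
    static void recover(List<LogRecord> log) {
        Map<Long, List<LogRecord>> byTx = new HashMap<>();
        for (LogRecord r : log) {
            byTx.computeIfAbsent(r.txId, k -> new ArrayList<>()).add(r);
        }
        for (List<LogRecord> records : byTx.values()) {
            boolean committed = records.stream().anyMatch(r -> r.type == Type.COMMIT_RECORD);
            boolean ended = records.stream().anyMatch(r -> r.type == Type.END_RECORD);
            if (committed && !ended) {
                // Durability: redo the changes described in the log; an End Record would then be appended.
                records.stream().filter(r -> r.type == Type.CHANGE)
                       .forEach(r -> System.out.println("redo: " + r.payload));
            }
            // Atomicity: transactions without a Commit Record are simply ignored; since no real data
            // was touched before commit under Commit Logging, nothing needs to be undone.
        }
    }
}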

ARIES theory

To solve this problem, the ARIES theory comes into play. ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) proposed an improved logging scheme called “Write-Ahead Logging”, where “write-ahead” means that changes are allowed to be written to disk before the transaction commits.

ARIES is a foundational theory of modern databases. Even if not every database implements ARIES exactly, the mainstream modern relational databases (Oracle, MS SQL Server, MySQL/InnoDB, IBM DB2, PostgreSQL, and so on) are all deeply influenced by it in their transaction implementations.

Write-Ahead Logging classifies when changed data may be written to disk into FORCE and STEAL policies, with the transaction commit point as the boundary.

  • FORCE: if, when a transaction commits, the changed data must also be written to disk at the same time, the policy is called FORCE; if not, it is called NO-FORCE. In practice most databases use NO-FORCE: as long as the log exists, the changed data can be persisted at any time, so there is no need to force it to disk immediately, which helps optimize disk I/O performance.
  • STEAL: if changed data is allowed to be written to disk before the transaction commits, the policy is called STEAL; if not, it is called NO-STEAL. Allowing data to be written early helps make use of idle I/O and saves space in the database’s in-memory cache, which also helps optimize disk I/O performance.

Undo Log and Redo Log

Commit Logging allows NO-FORCE but not STEAL: if some changed data were written to disk before the transaction committed, that data would become wrong the moment the transaction is rolled back or a crash occurs.

Write-Ahead Logging allows both NO-FORCE and STEAL. Its solution is to add another type of log called the Undo Log. Before any changed data is written to disk, an Undo Log entry must be recorded, noting which data was modified and from what value to what value, so that the pre-written changes can be erased according to the Undo Log during transaction rollback or crash recovery.

Undo Log is now commonly translated as “rollback log”, and accordingly the log used earlier to replay data changes during crash recovery was named the Redo Log, generally translated as “redo log”.

With the addition of the Undo Log, Write-Ahead Logging performs the following three phases during crash recovery.

  • Analysis phase: this phase scans the log forward from the last Checkpoint, finds all transactions that have no End Record, and forms the set of transactions to be recovered. This set typically consists of at least two parts: the Transaction Table and the Dirty Page Table.
  • Redo phase: this phase repeats history based on the set of transactions produced by the analysis phase: it finds all transactions that contain a Commit Record, writes their modified data to disk, appends an End Record to the log, and removes them from the set of transactions to be recovered.
  • Rollback phase (Undo): this phase handles the transactions that remain after the analysis and redo phases; these remaining transactions, called losers, need to be rolled back. Based on the information in the Undo Log, the data they already wrote to disk ahead of time is rewritten back, thus rolling back the loser transactions.

Operations in both the redo and rollback phases must be designed to be idempotent. To achieve high I/O performance, these three phases inevitably involve very intricate concepts and details (the data structures of the Redo Log and Undo Log, and so on) that I will not cover here. Write-Ahead Logging is only part of the ARIES theory; ARIES as a whole has many virtues, such as rigor and high performance, but these come at the cost of high complexity.
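
A highly simplified sketch of the three recovery phases, using an invented log-record model similar to the Commit Logging sketch above; a real ARIES implementation tracks LSNs, the Transaction Table and the Dirty Page Table, all of which are omitted here:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WalRecovery {

    enum Type { CHANGE, COMMIT_RECORD, END_RECORD }

    static class LogRecord {
        final long txId; final Type type; final String redoInfo; final String undoInfo;
        LogRecord(long txId, Type type, String redoInfo, String undoInfo) {
            this.txId = txId; this.type = type; this.redoInfo = redoInfo; this.undoInfo = undoInfo;
        }
    }

    static void recover(List<LogRecord> logSinceCheckpoint) {
        // 1. Analysis: collect every transaction that has no End Record.
        Map<Long, List<LogRecord>> candidates = new LinkedHashMap<>();
        for (LogRecord r : logSinceCheckpoint) {
            candidates.computeIfAbsent(r.txId, k -> new ArrayList<>()).add(r);
        }
        candidates.values().removeIf(records ->
                records.stream().anyMatch(r -> r.type == Type.END_RECORD));

        // 2. Redo: transactions with a Commit Record are replayed and removed from the set.
        Iterator<Map.Entry<Long, List<LogRecord>>> it = candidates.entrySet().iterator();
        while (it.hasNext()) {
            List<LogRecord> records = it.next().getValue();
            if (records.stream().anyMatch(r -> r.type == Type.COMMIT_RECORD)) {
                records.stream().filter(r -> r.type == Type.CHANGE)
                       .forEach(r -> System.out.println("redo: " + r.redoInfo)); // must be idempotent
                it.remove(); // an End Record would be appended to the log here
            }
        }

        // 3. Undo: the remaining "loser" transactions are rolled back using their undo information.
        for (List<LogRecord> records : candidates.values()) {
            for (int i = records.size() - 1; i >= 0; i--) {    // undo in reverse order
                LogRecord r = records.get(i);
                if (r.type == Type.CHANGE) {
                    System.out.println("undo: " + r.undoInfo);  // must be idempotent
                }
            }
        }
    }
}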

1.2. Achieving isolation

Isolation ensures that each transaction reads and writes data independently of the others. From the definition it is clear that isolation is tightly bound to concurrency: if there were no concurrency and all transactions ran serially, no isolation would be needed, or rather such access would be inherently isolated.

But in reality concurrency is unavoidable, so how do we achieve serial data access under concurrency? Almost every programmer will answer: locking! Indeed, modern databases provide the following three types of locks.

  • Write Lock (also called eXclusive Lock, X-Lock): if data holds a write lock, only the transaction holding that lock can write the data; while the write lock is held, other transactions can neither write the data nor place a read lock on it.
  • Read Lock (also called Shared Lock, S-Lock): multiple transactions can place read locks on the same data at the same time. Once data holds a read lock, it cannot be given a write lock, so other transactions cannot write it, but they can still read it. If only the current transaction holds a read lock on the data, that lock may be upgraded directly to a write lock so the data can then be written.
  • Range Lock: an exclusive lock applied to a range of data, within which data cannot be written.

The following statement is a typical example of range locking:

SELECT * FROM books WHERE price < 100 FOR UPDATE;

Note the difference between “a range cannot be written to” and “a batch of data cannot be written to”: do not think of a range lock as merely a set of exclusive locks. With a range lock, not only can existing data in that range not be modified, but no data can be added to or deleted from the range either, which a set of exclusive locks cannot achieve.

Serializable

Serializable access provides the highest level of isolation; the strongest isolation level defined in ANSI/ISO SQL-92 is Serializable, and serializability matches the ordinary programmer’s intuition about locking contended data. If performance optimization is not a concern, serializability can be achieved by placing read locks, write locks and range locks on all the data a transaction reads and writes (strictly, the locks must be acquired and released in the expanding and shrinking phases of Two-Phase Locking, 2PL, which governs the relationship between read locks, write locks and the data). However, concurrency control theory tells us that isolation and concurrency are at odds: the higher the isolation, the lower the throughput of concurrent accesses. Modern databases therefore always offer isolation levels other than serializable, so that users can adjust the database’s locking behavior and strike their own balance between isolation and throughput.

Repeatable Read

The isolation level below serializable is Repeatable Read: read and write locks are placed on the data involved in the transaction and held until the transaction ends, but range locks are no longer used. Repeatable Read is weaker than serializable in that Phantom Reads can occur: during a transaction, two identical range queries may return different result sets.

SELECT count(1) FROM books WHERE price < 100;	/* Time order: 1, transaction: T1 */
INSERT INTO books(name, price) VALUES ('Understanding the Java Virtual Machine in Depth', 90);	/* Time order: 2, transaction: T2 */
SELECT count(1) FROM books WHERE price < 100;	/* Time order: 3, transaction: T1 */

According to the definitions of range, read and write locks given earlier, if this query is executed twice within the same transaction, and between the two executions another transaction happens to insert a book priced under 100 yuan, that insert will be allowed, and the two identical queries will return different results. The reason is that Repeatable Read has no range lock to prevent new data from being inserted into the queried range; this is a sign that one transaction has been affected by another and that isolation has been broken.

Read Committed

The isolation level below Repeatable Read is Read Committed: write locks on the data involved in a transaction are held until the transaction ends, but read locks are released as soon as the query completes. Read Committed is weaker than Repeatable Read in that Non-Repeatable Reads can occur: during a transaction, two queries of the same row may return different results.

SELECT * FROM books WHERE id = 1;  /* Time order: 1, transaction: T1 */
UPDATE books SET price = 110 WHERE id = 1; COMMIT; /* Time order: 2, transaction: T2 */
SELECT * FROM books WHERE id = 1; COMMIT;  /* Time order: 3, transaction: T1 */

At the Read Committed isolation level, the two identical queries return different results. Because Read Committed does not hold read locks for the whole transaction, it cannot prevent the data it has read from being changed; transaction T2’s UPDATE statement can commit successfully right away, so the transaction is affected by another transaction and isolation is broken. If the isolation level were Repeatable Read, transaction T2 could not acquire the write lock, because the data would already hold a read lock taken by transaction T1 that is not released immediately after reading, and the UPDATE would block until transaction T1 committed or rolled back.

Read Uncommitted

The level below Read Committed is Read Uncommitted: write locks on the data involved in a transaction are held until the transaction ends, but no read locks are taken at all. Read Uncommitted is weaker than Read Committed in that Dirty Reads can occur: during a transaction, one transaction may read data that another transaction has modified but not yet committed.

SELECT * FROM books WHERE id = 1; /* Time order: 1, transaction: T1 */
/* Note that there is no COMMIT */
UPDATE books SET price = 90 WHERE id = 1; /* Time order: 2, transaction: T2 */
/* This SELECT simulates the purchase operation */
SELECT * FROM books WHERE id = 1; /* Time order: 3, transaction: T1 */
ROLLBACK; /* Time order: 4, transaction: T2 */

At the Read Uncommitted level, transaction T1’s second query reads T2’s uncommitted price change, so books may be sold at 90 yuan even though T2 later rolls the change back. The reason is that Read Uncommitted places no read locks at all, which in turn allows it to read data on which another transaction holds a write lock; that is why the two queries in transaction T1 return different results. If the “in turn” in that sentence is unclear, re-read the definition of a write lock: a write lock forbids other transactions from placing read locks, but it does not forbid reading without a lock, so if transaction T1’s reads take no read locks, transaction T2’s uncommitted data can be read by T1 immediately. This is again a sign that one transaction is affected by another and isolation is broken. If the isolation level were Read Committed, T1’s second query could not obtain a read lock, because T2 holds the write lock on the data and Read Committed requires a read lock before reading, so T1’s query would block until T2 committed or rolled back.

Theoretically there is an even lower level of isolation, “no isolation at all”, in which neither read nor write locks are taken. Read Uncommitted fails to prevent Dirty Reads, but it still prevents Dirty Writes, that is, a modification made by one transaction being overwritten by another transaction before the first commits. Dirty writes are not merely an isolation problem; they make transaction atomicity impossible, so “no isolation at all” is generally not counted among the isolation levels, and Read Uncommitted is considered the lowest.

In fact, the different isolation levels, and problems such as phantom reads, non-repeatable reads and dirty reads, are only surface phenomena; they are the result of combining different kinds of locks held for different lengths of time. The fundamental reason databases offer different isolation levels is that they implement isolation by means of locks.

Isolation level     Write lock                    Read lock                              Range lock
Serializable        Held until transaction ends   Held until transaction ends            Held until transaction ends
Repeatable Read     Held until transaction ends   Held until transaction ends            None
Read Committed      Held until transaction ends   Released immediately after the query   None
Read Uncommitted    Held until transaction ends   None                                   None

Lock behavior in summary: reads share with reads, reads and writes are mutually exclusive, writes are mutually exclusive with each other, and a range lock exclusively covers an entire range.
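
At the application level, the isolation level is usually just a setting handed to the data source; for example, with plain JDBC (connection details are illustrative only), how the level is actually enforced, with locks or with MVCC, is entirely up to the database:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class IsolationLevelDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/bookstore", "user", "pass")) {
            // Ask the data source for a particular isolation level.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT count(1) FROM books WHERE price < 100")) {
                while (rs.next()) {
                    System.out.println("books under 100: " + rs.getInt(1));
                }
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}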

MVCC (Multi-version Concurrency Control)

Besides all being implemented with locks, the four isolation levels above share another characteristic: phantom reads, non-repeatable reads and dirty reads all arise because, while one transaction is reading data, another transaction writing that data breaks the isolation. For this kind of “one transaction reads + another transaction writes” isolation problem, a lock-free optimization called Multi-Version Concurrency Control (MVCC) has in recent years been widely adopted by mainstream commercial databases.

MVCC is a read optimization strategy; its “lock-free” means that reads do not need to take locks. The basic idea of MVCC is that no change to the database directly overwrites the previous data; instead it produces a new copy that coexists with the old version, so that reads need no locks at all. “Version” is the key word here: you can think of each row in the database as having two invisible fields, CREATE_VERSION and DELETE_VERSION, both of which record transaction IDs, where the transaction ID is a globally, strictly increasing value. Data is then written according to the following rules.

  • When data is inserted: CREATE_VERSION records the transaction ID of the inserting transaction; DELETE_VERSION is empty.
  • When data is deleted: DELETE_VERSION records the transaction ID of the deleting transaction; CREATE_VERSION is empty.
  • When data is modified: the modification is treated as “delete the old data, insert the new data”. The original row is copied first; on the original row, DELETE_VERSION records the transaction ID of the modifying transaction and CREATE_VERSION is empty, while on the new copy CREATE_VERSION records the transaction ID of the modifying transaction and DELETE_VERSION is empty.

At this point, if another transaction reads the changed data, the isolation level determines which version of the data should be read.

  • If the isolation level is Repeatable Read (a toy sketch of these rules follows this list):
  1. Always read records whose CREATE_VERSION is less than or equal to the current transaction ID. This ensures that the transaction only sees rows that existed before it started or that were inserted or modified by the transaction itself.
  2. Always read records whose DELETE_VERSION is empty or greater than the current transaction ID. This ensures that the rows the transaction reads were not already deleted before it began.
  • If the isolation level is Read Committed: always take the latest version, that is, the most recently committed version of the row.
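
Here is a toy sketch of the Repeatable Read visibility rule above; the version fields and classes are invented for illustration, and real engines such as InnoDB use a considerably more elaborate scheme built on undo chains and read views:

import java.util.List;
import java.util.Optional;

class MvccSketch {

    // The invisible pair of version fields per row version, as described above.
    static class RowVersion {
        final Long createVersion;  // transaction ID that created this version, or null if empty
        final Long deleteVersion;  // transaction ID that deleted this version, or null if empty
        final String data;
        RowVersion(Long createVersion, Long deleteVersion, String data) {
            this.createVersion = createVersion; this.deleteVersion = deleteVersion; this.data = data;
        }
    }

    /** Repeatable Read: pick the newest version visible to the given transaction ID. */
    static Optional<RowVersion> readRepeatable(List<RowVersion> versions, long currentTxId) {
        return versions.stream()
                .filter(v -> v.createVersion != null && v.createVersion <= currentTxId)
                .filter(v -> v.deleteVersion == null || v.deleteVersion > currentTxId)
                .reduce((first, second) -> second); // assume versions are ordered oldest to newest
    }
}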

MVCC is unnecessary for the other two isolation levels. Read Uncommitted directly modifies the raw data, so other transactions see the change immediately and no version fields are needed at all; Serializable, by its very semantics, is meant to block reads from other transactions, while MVCC is a read-lock-free optimization, so the two are not meant to be used together.

MVCC only optimizes “read + write” scenarios. If two transactions modify the same data at the same time, that is, “write + write”, there is little room for optimization and locking is almost the only viable solution. The only point of debate is whether the locking strategy should be “Optimistic Locking” or “Pessimistic Locking”. The locking strategies introduced above are all pessimistic: they assume that accessing data without locking first will cause problems. Optimistic locking strategies assume instead that contention between transactions is the exception and its absence the norm, so locking should not be done up front; problems are remedied only when contention actually occurs. This idea is called “Optimistic Concurrency Control” (OCC). For reasons of length and topic I will not discuss it further, but one caution: there is no reason to believe optimistic locking is inherently faster than pessimistic locking; it depends entirely on how intense the contention is. If contention is fierce, optimistic locking is slower.
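
As an aside, a common application-level flavor of optimistic concurrency control is a version column checked at update time; a minimal JDBC sketch, with table and column names made up for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OptimisticUpdate {
    /**
     * Tries to update a book's price without holding locks while the user thinks.
     * Returns false if another transaction changed the row in the meantime,
     * in which case the caller re-reads and retries (the "remedy" step of OCC).
     */
    static boolean updatePrice(Connection conn, long bookId, int expectedVersion, int newPrice) throws SQLException {
        String sql = "UPDATE books SET price = ?, version = version + 1 WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, newPrice);
            ps.setLong(2, bookId);
            ps.setInt(3, expectedVersion);
            return ps.executeUpdate() == 1;  // 0 rows updated means we lost the race
        }
    }
}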

2. Global transactions (single service using multiple data sources)

The opposite of a local transaction is a Global Transaction, also referred to in some sources as an External Transaction. In this section, a global transaction is defined as a transactional solution for scenarios where a single service uses multiple data sources.

2.1. XA transaction Architecture

In 1991, the X/Open organization (later merged into The Open Group) proposed a transaction architecture called X/Open XA (XA stands for eXtended Architecture) to solve the consistency problem of distributed transactions. At its core, XA defines the communication interface between a global Transaction Manager, which coordinates global transactions, and local Resource Managers, which drive the local transactions. The XA interface is bidirectional: it forms a communication bridge between one transaction manager and multiple resource managers, and by coordinating the consistent actions of multiple data sources it realizes the unified commit or rollback of a global transaction. The names XADataSource and XAResource that we still occasionally see in Java code come from this.

However, XA was not a Java technical specification (Java did not exist when XA was proposed) but a language-neutral, general-purpose specification. Java therefore defined its own standard for global transaction processing, JSR 907, the Java Transaction API, now known as JTA, which is implemented in the Java language on top of the XA pattern. The two main kinds of JTA interfaces are:

  • The transaction manager interfaces: javax.transaction.TransactionManager, used by Java EE servers to provide container-managed transactions that handle transaction management automatically, and javax.transaction.UserTransaction, used to begin, commit and roll back transactions manually from program code.
  • The resource definition interface that meets the XA specification: javax.transaction.xa.XAResource. Any resource (JDBC, JMS, and so on) that wants to support JTA only needs to implement the methods of the XAResource interface.

JTA was originally a Java EE technology, typically supported by Java EE containers such as JBoss, WebSphere, and WebLogic. Nowadays, however, implementations such as JOTM (Java Open Transaction Manager), Bitronix, Atomikos, and JBossTM (formerly Arjuna) provide the JTA interfaces in the form of plain JAR packages, which allows us to use JTA in Java SE environments such as Tomcat and Jetty.
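
A hedged sketch of what driving the bookstore payment through JTA’s UserTransaction across several XA-capable data sources might look like; the domain types, DAO interfaces and JNDI name are hypothetical, and a real setup would obtain XADataSource instances from a JTA provider such as Atomikos or Narayana:

import java.math.BigDecimal;
import java.util.List;
import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

// Hypothetical domain types and DAOs; each DAO is assumed to be backed by a different
// XA-aware DataSource enlisted with the same JTA transaction manager.
interface AccountDao   { void pay(long userId, BigDecimal money); }
interface WarehouseDao { void deliver(List<Long> itemIds); }
interface MerchantDao  { void receipt(long merchantId, BigDecimal money); }
record PaymentBill(long userId, long merchantId, BigDecimal money, List<Long> itemIds) {}

public class JtaPaymentService {
    private final AccountDao accountDao;
    private final WarehouseDao warehouseDao;
    private final MerchantDao merchantDao;

    public JtaPaymentService(AccountDao accountDao, WarehouseDao warehouseDao, MerchantDao merchantDao) {
        this.accountDao = accountDao;
        this.warehouseDao = warehouseDao;
        this.merchantDao = merchantDao;
    }

    public void doPayment(PaymentBill bill) throws Exception {
        // In a Java EE container the UserTransaction is normally looked up from JNDI;
        // standalone JTA providers expose an equivalent object programmatically.
        UserTransaction tx = (UserTransaction) new InitialContext().lookup("java:comp/UserTransaction");
        tx.begin();
        try {
            accountDao.pay(bill.userId(), bill.money());
            warehouseDao.deliver(bill.itemIds());
            merchantDao.receipt(bill.merchantId(), bill.money());
            tx.commit();   // the transaction manager drives XA commit across all enlisted resources
        } catch (Exception e) {
            tx.rollback(); // all enlisted resources roll back together
            throw e;
        }
    }
}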

Two-phase Commit (2 Phase Commit, 2PC) protocol

What happens if the bookstore’s user, merchant, and warehouse data live in three separate databases, and everything else stays the same? If you write it as a declarative transaction with the @Transactional annotation, the code may look just like a local transaction; but if you write it as a programmatic transaction, you can see the difference. The transactional code would look like this:

public void doPayment(PaymentBill bill) {
    accountTransaction.begin();
    stockTransaction.begin();
    businessTransaction.begin();
    try {
        accountTransaction.pay(bill.getMoney());
        stockTransaction.deliver(bill.getItems());
        businessAccountService.receipt(bill.getMoney());
        accountTransaction.commit();
        stockTransaction.commit();
        businessTransaction.commit();
    } catch (Exception e) {
        accountTransaction.rollback();
        stockTransaction.rollback();
        businessTransaction.rollback();
    }
}

As the code shows, the program intends to commit three transactions, but the code simply cannot be written correctly this way. Imagine that an error occurs inside businessTransaction.commit(): execution jumps to the catch block, but accountTransaction and stockTransaction have already been committed, and calling their rollback() methods achieves nothing. Some data ends up committed and some rolled back, and overall transaction consistency cannot be guaranteed. To solve this problem, XA splits the transaction commit into a two-phase process:

  • Prepare phase: also called the voting phase, in which the coordinator asks all participants of the transaction whether they are ready to commit. A participant that is ready replies Prepared; one that is not replies Non-Prepared. “Prepared” here does not mean what it does in everyday language: for a database, preparing means writing to the redo log everything needed to commit the transaction; the only difference from a real commit is that the final Commit Record is not yet written. This also means that after the data is persisted the isolation is not released, that is, the locks are still held, keeping the data isolated from other non-participating observers.

  • Commit phase: also called the execution phase. If the coordinator received a Prepared reply from every participant in the previous phase, it first persists the transaction state as Commit locally and then sends the Commit instruction to all participants, who commit immediately. Otherwise, if any participant replied Non-Prepared, or any participant failed to reply before the timeout, the coordinator persists its own transaction state as Abort and sends the Abort instruction to all participants, who immediately roll back. The Commit operation in this phase should be lightweight for the database: writing just a Commit Record is usually fast. Only an Abort is relatively heavy, since the data already written ahead of time must be cleaned up according to the rollback log.

These two phases together are known as the “Two-Phase Commit” (2PC) protocol, and for it to successfully guarantee consistency, several preconditions must also hold (a minimal coordinator sketch follows this list):

  • The network must be assumed reliable during the short commit window, that is, no messages are lost during commit. It is also assumed that network communication is error-free throughout: messages may be lost, but wrong messages are never delivered. XA’s design goal does not include solving problems such as the Byzantine Generals Problem. In two-phase commit, a failure in the voting phase can be remedied (by rolling back), whereas a failure in the commit phase cannot (the outcome of commit or rollback cannot be changed; a crashed node can only be recovered), so this phase should be kept as short as possible, precisely to keep the window of network risk as small as possible.
  • It must be assumed that nodes lost to network partition, machine crash or other causes will eventually recover and will not stay lost forever. Since the complete redo log was written in the prepare phase, once the lost machine recovers it can find the prepared but uncommitted transaction data in the log, query the coordinator for the transaction’s status, and then decide whether to commit or roll back.
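
To make the protocol concrete, here is a minimal coordinator sketch; the Participant interface and the in-memory “persistence” are invented for illustration, and a real coordinator would write its decision to durable storage and retry delivery to lost participants:

import java.util.List;

// Invented participant abstraction: each one fronts a resource manager (a database, a queue, ...).
interface Participant {
    boolean prepare(long txId);  // write the redo log, keep locks, vote Prepared (true) or Non-Prepared (false)
    void commit(long txId);      // write the Commit Record and release locks
    void abort(long txId);       // erase pre-written changes via the rollback log and release locks
}

class TwoPhaseCommitCoordinator {

    /** Returns true if the global transaction committed, false if it was aborted. */
    boolean execute(long txId, List<Participant> participants) {
        // Phase 1: voting. Any negative vote (or, in reality, a timeout) aborts the transaction.
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                allPrepared = false;
                break;
            }
        }

        // The coordinator must persist its decision before telling anyone (durable storage omitted here).
        boolean decisionIsCommit = allPrepared;

        // Phase 2: execution. If the coordinator crashes here, participants are stuck
        // holding locks until it recovers -- the single point problem described below.
        for (Participant p : participants) {
            if (decisionIsCommit) {
                p.commit(txId);
            } else {
                p.abort(txId);
            }
        }
        return decisionIsCommit;
    }
}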

Two-phase commit is simple and not difficult to implement, but it has several very significant disadvantages:

  • Single point problem: the coordinator plays a pivotal role in two-phase commit. While the coordinator waits for a participant’s reply it can use a timeout, so a participant is allowed to crash; but while a participant waits for the coordinator’s instruction it cannot time out. If it is the coordinator, not one of the participants, that goes down, all participants are affected: if the coordinator never recovers and never sends Commit or Rollback, every participant must keep waiting.
  • Performance issues: during two-phase commit, all participants are effectively bound into a single scheduled whole. The process requires two rounds of remote service calls and three rounds of data persistence (the redo log written in the prepare phase, the coordinator persisting its own state, and the Commit Record written to the log in the commit phase), and it lasts until the slowest participant in the cluster finishes its operation. This gives two-phase commit generally poor performance.
  • Consistency risk: as mentioned above, two-phase commit only works if its preconditions hold; when the assumptions of network stability and eventual crash recovery break down, consistency problems can occur. On crash recovery: in 1985 Fischer, Lynch and Paterson proposed the “FLP impossibility” result, proving that no distributed protocol can correctly reach a consistent outcome if crashed nodes may never recover; this result is as well known in distributed theory as the “CAP cannot-have-it-all” theorem. The consistency risk from network instability is this: although the commit phase is very short, it is still a clear window of risk. If the coordinator has sent the Prepare instructions, judged from each participant’s reply that the transaction can be committed, persisted its transaction state and committed its own transaction, but the network then suddenly fails and it can no longer deliver the Commit instructions to all participants, then some data (the coordinator’s) is committed while some data (the participants’) is neither committed nor able to be rolled back, and the data becomes inconsistent.

The 3 Phase Commit (3PC) protocol

To alleviate some of the shortcomings of two-phase commit, specifically the coordinator’s single point problem and the performance problem in the prepare phase, the “Three-Phase Commit” (3PC) protocol was later developed. It subdivides the prepare phase of two-phase commit into two phases, called CanCommit and PreCommit, and renames the commit phase the DoCommit phase. The new CanCommit phase is an inquiry phase in which the coordinator asks each participating database to evaluate, based on its own state, whether the transaction is likely to complete successfully.

The prepare phase is split in two because it is a heavyweight operation: once the coordinator sends the Prepare message, each participant immediately starts writing the redo log, and the data resources involved are locked. If at that point one participant announces it cannot complete the commit, everyone has done a round of work for nothing. By adding a round of inquiry, if all replies are positive, the transaction is more likely to commit successfully, which also means the risk of everyone rolling back because one participant crashes during commit is relatively smaller. Therefore, in scenarios where the transaction would need to be rolled back, three-phase commit usually performs much better than two-phase; but in scenarios where the transaction commits normally, both still perform poorly, and three-phase is even slightly worse because of the extra round of inquiry.

Also because the probability of a failure-induced rollback is smaller, in three-phase commit, if the coordinator goes down after the PreCommit phase, that is, a participant cannot wait any longer for the DoCommit message, the default strategy is to commit the transaction rather than roll back or keep waiting. This effectively mitigates the coordinator’s single point problem.

As the process above shows, three-phase commit improves on the single point problem and on rollback performance, but it does not improve the consistency risk; in that respect it is even slightly worse. For example, if, after entering the PreCommit phase, the instruction the coordinator issues is not an Ack but an Abort, and some participants fail to receive the Abort before their timeout because of network problems, those participants will mistakenly commit the transaction, producing inconsistent data between participants.