The introduction

General database textbooks will introduce that transactions have four characteristics of ACID, namely Atomicity, consistency, Isolation and duriablity. This article focuses on the isolation nature, or isolation, of transactions.

When it comes to transaction isolation, it generally refers to the four isolation levels under the ANSI SQL standard: Read Uncommitted, Read Committed, Repeatable Read and Serialibility. However, the ANSI SQL transaction isolation level standard has the following limitations:

  • The isolation of write operations is not mentioned
  • The ANSI SQL standard is older and does not adequately describe the level of isolation achieved with multiple versions of concurrency control

This article focuses on the transaction isolation level under the ANSI SQL standard, and discusses its limitations, as well as its extensions. This article is organized as follows:

  • Transaction isolation level under THE ANSI SQL standard
  • ANSI SQL standard limitations and extensions

This paper is presented in papers, a study and summary of papers on distributed systems. This article is also published on my blog (oserror.com).

Transaction isolation level under THE ANSI SQL standard

In a database, multiple transactions tend to be executed concurrently, and there can be interference between transactions, resulting in incorrect data. To ensure that transactions do not interfere with each other, the simplest solution is to execute transactions sequentially, but this reduces throughput. To this end, the ANSI SQL standard introduced transaction isolation levels, which describe the various interference levels of concurrent transactions, so that applications can make decisions about throughput and correctness. Different transaction isolation levels guarantee different degrees of correctness. In general, lower transaction isolation levels lead to higher throughput and lower correctness.

The isolation levels for ANSI SQL are designed to address the various interference phenomena that occur in applications, and the isolation levels are designed to address the following three phenomena:

  1. Dirty read (P1)
  2. Non-repeatable read (P2)
  3. Phantom read (P3)

First, look at the time sequence of the occurrence of dirty reading P1:

Transaction T1 writes data X, transaction T2 reads data X, and transaction T1 rolls back. At this point, transaction T2 reads dirty data

Second, look at the sequence in which non-repeatable read P2 occurs:

Transaction T1 reads data X, transaction T2 writes data X, transaction T2 commits, transaction T1 reads data X again, the two read data X is different

Finally, look at the time sequence of phantom P3

Transaction T1 reads the data satisfying X>=m and X<=n, and transaction T2 inserts a data satisfying X>=m and X<=n. Transaction T2 commits, and transaction T1 reads the data satisfying X>=m and X<=n again, and the two read data are different

A slightly formal language is used to describe the timing of the occurrence of the above phenomenon. Assume that W1(X) indicates that transaction T1 modifies the data item X, while R2(X) indicates that transaction T2 reads the data item X. W1(P) indicates that transaction T1 has modified the data item satisfying predicate condition P, while R2(P) indicates that transaction T2 has read the data item satisfying predicate condition P. C1 represents transaction T1 commit and A1 represents transaction T1 rollback.

Dirty read P1

W1(X)… R2(X)… A1… R2(X)

P2 cannot be read repeatedly

R1(X)… W2(X)… C2… R1(X)

Phantom read P3

R1(P)… W2(P)… C2… R1(P)

The ANSI SQL standard sets four transaction isolation levels for three phenomena, as follows:

  1. Read Uncommitted: P1, P2, and P3 may occur
  2. Read Committed: P1 cannot occur, P2 and P3 can occur
  3. Repeatable Read: unlikely to occur P1, P2, P3 is possible
  4. Serializable: P1, P2 and P3 cannot occur

The relationship between the isolation levels of the entire transaction and the phenomena to be eliminated is shown below:

ANSI SQL Isolation Levels

It is worth noting that P1, P2, and P3 cannot occur under Serializable, but it does not follow that P2 and P3 must be in Serializable without P1. True Serializable means that the result of a transaction executed concurrently is the same as a result of each transaction executed sequentially.

ANSI SQL standard limitations and extensions

As mentioned above, ANSI SQL has the following limitations:

  • The isolation of write operations is not mentioned
  • The ANSI SQL standard is older and does not adequately describe the level of isolation achieved with multiple versions of concurrency control

For the first point, there is no mention of isolation of write operations, as follows

Dirty write P0

Transaction T1 changes X, transaction T2 changes X, and transaction T1 rolls back to what value

For example, if X=100 before transaction T1 changes X, transaction T1 needs to change X to 10 and transaction T2 needs to change X to 20, as follows:

X=100
W1X(10)
W2X(20)
A1Copy the code

If transaction T1 rolls back to X=20, then transaction T2 cannot roll back to 100. If you roll back to X=100, then transaction T2 commits without knowing that X=20 is being written, in which case there is no solution.

Therefore, write must be isolated, otherwise, transaction correctness cannot be guaranteed. Therefore, all isolation levels under the ANSI SQL standard need to be enhanced to a level where P0 does not occur.

For write operations, write loss occurs in the following scenarios

Write a lost P4

Transaction T1 reads data X, transaction T2 reads data X, transaction T2 modifies data X, transaction T1 modifies data X, and ultimately writes dirty data

For example, if X=100 originally, transaction T1 adds 5 to X and transaction T2 subtracted 1 from X, the expected result would be 100+5-1=104, whereas in the case of write loss, it would be as follows

T1         T2
R(X=100)
         R(X=100)
         W(X=99)
W(X=104)Copy the code

In the above scenario, when the data is finally written to the database, X=104, which belongs to data inconsistency and should be eliminated.

Write loss can occur at the Read Committed level of ANSI SQL, but not at the Repeatable Read level. To do this, a Cursor Stability isolation level is introduced to ensure that P4 does not occur. However, P4 does not occur at Repeatable level because P2 prohibits the scenario where transaction T2 writes X after transaction T1 reads X. Therefore, the isolation level of Cursor Stability is between Read Committed and Repeatable Read.

Existing industry products, such as MySQL, use MVCC and other multi-version concurrency control to achieve some kind of isolation, while the ANSI SQL standard does not consider multi-version concurrency control, so it is worth discussing the difference between the isolation level under multi-version concurrency control and the isolation level under existing ANSI SQL standards.

The following is a simple way of thinking about multi-version concurrency control: Basic Time Ordering

  1. When transaction T1 starts, apply for a Timestamp, denoted as start-timestamp
  2. The read of transaction T1 will not block because it will read its version before start-timestamp, and other changes made after start-timestamp are not visible to the transaction
  3. Changes to transaction T1 itself, including updates, inserts, and deletions, are stored in the context of the transaction, making it easier for the transaction itself to read the modified data repeatedly
  4. When transaction T1 commits, it gets a commit-timestamp that is guaranteed to be larger than the start-timestMap and commit-timestMap of all other existing transactions. If there is another transaction commit-timestamp within [start-TIMESTAMP, commit-timestamp] of transaction T1, and the transaction modifies the data changed by transaction T1, then transaction T1 will be terminated; otherwise, transaction T1 will Commit

Step 4 ensures that P4 write loss does not occur in the Basic Time Ordering system. The isolation level reached by the Basic Time Ordering system is called Snapshot level. Snapshot must be Read Committed because records Committed before start-TIMESTAMP are Read.

Next, the relationship between Snapshot and Repeatable Read is discussed:

The isolation level of Snapshot is higher than Repeatable Read, but consider the following scenarios:

R1(X)… R2(Y)… W1(Y)… W2(X)… C1… C2

This scenario is forbidden at Repeatable Read level because T2 reads data Y and T1 changes it and commits it, resulting in unrepeatable reads. For Snapshot, if X+Y is greater than 0, this constraint may be violated

X= before two transactions start1,Y=2, the timestamp is10000Transaction T1 starts, application time stamp is10001And R1 = (X)1Transaction T2 begins with an application timestamp of10003R2 = (Y)2Transaction T1, W1(Y)=0Transaction T2, W2(X)=0Transaction T1 commits and finds X+Y=1+0, satisfy the constraint transaction T2 commit, and find X+Y=0+2, satisfy the constraintCopy the code

Although both transaction T1 and transaction T2 are considered to satisfy the constraint, the constraint is not satisfied after both transactions are completed.

Therefore, the isolation level of Snapshot is less than Repeatable Read in this case. Therefore, the isolation level of Snapshot is neither greater than nor less than Repeatable Read.

conclusion

Transaction isolation levels are an interesting topic, and there are currently several techniques for implementing various isolation levels. The following articles will discuss techniques for implementing transaction isolation levels, as well as techniques used in industry products, as follows:

  • Two Phase Locking
  • Basic Time Ordering
  • Multi-version Concurrentcy Control
  • Optimistic Concurrency Control
  • What technologies are used by MySQL, Oracle, Spanner, etc to do transaction isolation

PS: The update of this blog will be pushed to the wechat public account as soon as possible, welcome everyone to follow.

qocde_wechat

reference

  • A Critique of ANSI SQL Isolation Levels