Dirty read, magic read, unrepeatable read, current read, snapshot read, these terms often make people dizzy. Because the main line of ordinary people’s brain is single thread, and can not deal with multiple transactions at one time.

To be memorable, we need to rely on several examples. After reading this article, you will be suddenly enlightened and can’t help but walk three times in a row.

But before we do that, we need to take a look at what the current database isolation level is. For example, MySQL.

 select @@tx_isolation;
Copy the code

MySQL has four levels of isolation, of course, for data. To change the isolation level, use the following SQL statement.

set session transaction isolation level read uncommitted;
set session transaction isolation level read committed;
set session transaction isolation level repeatable read;
set session transaction isolation level serializable;
Copy the code

Ok, let’s create a little test table to see how magic works in a concurrent environment.

CREATE TABLE `xjjdog_tx` (
	`id` INT(11) NOT NULL,
	`name` VARCHAR(50) NOT NULL COLLATE 'utf8_general_ci',
	`money` BIGINT(20) NOT NULL DEFAULT '0'.PRIMARY KEY (`id`) USING BTREE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
INSERT INTO `xjjdog_tx` (`id`, `name`, `money`) VALUES (2.'xjjdog1'.100);
INSERT INTO `xjjdog_tx` (`id`, `name`, `money`) VALUES (1.'xjjdog0'.100);
Copy the code

1. Dirty reads

Dirty read: dirty data is read. What is dirty data? Data that has not yet been committed by another transaction. At the Read Uncommitted isolation level, dirty reads occur. Take the sequence below

A transaction is A:setsession transaction isolation level read uncommitted; Transaction B:setsession transaction isolation level read uncommitted; A transaction is A:STARTTRANSACTION ; Transaction B:STARTTRANSACTION ; Transaction A: UPDATE xjjDOG_txSET money=money+100 WHERE NAME='xjjdog0'; Transaction B: UPDATE xjjDOG_txSET money=money+100 WHERE NAME='xjjdog0'; A transaction is A:ROLLBACK; Transaction B:COMMIT; Transaction B:SELECT * FROM xjjdog_tx ;
Copy the code

In this scenario, the original value of money is 100, 100 is incremented in both sessions, and then one of the session transactions is rolled back. As a result, the value of money remains the same as 100. So one of the operations that added 100 was overwritten.

So there are several conditions for dirty reads to occur.

  1. In A high concurrency scenario, another transaction is involved in reading and writing the rows involved in transaction A before the transaction A is finished
  2. Read uncommitted at the lowest transaction isolation level
  3. After you use the data, transaction A rolls back, causing the data you received before to no longer exist

To resolve this problem, set the isolation level to higher than Read Uncommitted.

2. It cannot be read repeatedly

Setting the isolation level to Read COMMITTED to avoid dirty reads makes perfect sense. The root cause of dirty reads is that other operations are interrupted during the execution of A transaction. This isolation level requires that the modified value can be read by transaction B only after transaction A commits. Therefore, dirty reads cannot occur and are fundamentally eliminated.

However, read commited causes an unrepeatable read.

As the name implies, two results are produced for one value read in one transaction cycle.

Unrepeatable, proving that the world doesn’t always revolve around you. There are countless other transactions that are executing during your transaction, and if your transaction lasts longer than these, you may read two or more values.

Let me tell you a story.

Once upon a time, there was a peach tree with 12 peaches. There was a monkey, xjjDog, and he wanted to eat the peach on top, but the peach was not ripe.

The next day when he went to see, he found that one peach was missing, instead of 11. After careful inquiry, it turned out that monkey A had eaten one of them first.

The next day to see the time, peach and less one, into 10, the original is greedy monkey B eat a.

And so on, the peaches dwindled down to the last two, but they were still not ripe.

Xjjdog picked the last two peaches and was about to munch on them when a monkey X jumped out and said, “I’ve been staring at these peaches for a year…

In this story, the transaction duration of monkey A and MONKEY B is 1 day; Xjjdog’s transaction cycle lasts until the peach is ripe; Monkey X lasts longer, maybe a year. They don’t always see twelve peaches a day. Today’s peach may have been eaten by another monkey (business), resulting in a different observation result, which is the concept of unrepeatable reading.

Sometimes, even if the values read are the same, that doesn’t prove it’s ok. For example, a financial officer embezzled 200 million yuan to trade stocks, and then returned 200 million yuan at the end of the month. Although the final amount is the same, you can’t find the difference due to the long reconciliation period.

How do you solve non-repeatable reads? Want to see first cannot repeat read is not a problem.

Some systems require such logic that each time a different value is read in a transaction, it can be tolerated. But if you want to keep the number of peaches under your control until they’re ripe, unrepeatable reading is a problem.

A very good way to do this is to keep XjjDog under the peach tree. When another monkey tries to pick a peach, he sends it away. This approach works, but is very inefficient in the database, which is serializable.

MySQL has a default transaction isolation level called REPEATable Read using MVCC (InnoDB) which is more lightweight.

3. Repeat

That’s where MVCC (Multi-Version Concurrency Control) comes in. It has three characteristics.

  1. Each row of data has a version that is updated each time the data is updated
  2. When modifying, copy a copy. The current version can be modified at will without interference between transactions
  3. If the original record is successfully overwritten, rollback is performed if the original record fails

MVCC in InnoDB is mainly to improve the performance of database concurrency, with a better way to deal with read-write conflict, even if there is read and write conflict, can also do not lock, non-blocking concurrent read. There are also three key technologies for its implementation:

  1. Three implicit fields:DB_TRX_ID, recently changed its transaction ID;DB_ROLL_PTR, rollback pointer to previous version;DB_ROW_ID, hide the primary key
  2. Undo log: changes to the same record generate a linked list of version changes for that record
  3. Read View: A read view generated when a snapshot is being read. In addition to using the additional information above, it also maintains an active set of transaction ids

The key to everything is the word snapshot.

For example, if transaction A makes A snapshot Read to A record, A Read View is generated at the moment the snapshot is Read. At this point, transactions B and C are not committed, and transactions D and E are committed before the ReadView is created, so the ReadView cannot Read the changes made by transactions B and C.

Unfortunately, repeatable reads can only solve the problem of non-repeatable reads from snapshots. The timing of snapshot reads also affects the accuracy of reads. Look at the following two cases.

In this case, we read 500.

A transaction Transaction B
Open the transaction Open the transaction
Snapshot read (no impact) The query amount is 500 The query amount of snapshot read is 500
The update amount is 400
Commit the transaction
select Read the snapshotAmount to 500
select lock in share modeThe current readingAmount to 400

In this case, we read 400.

A transaction Transaction B
Open the transaction Open the transaction
Snapshot read (no impact) The query amount is 500
The update amount is 400
Commit the transaction
select Read the snapshotAmount to 400
select lock in share modeThe current readingAmount to 400

(Table from [SnailMann’s] blog).

4. The magic to read

Phantasmagoria, the word itself is very psychedelic. Hallucinations occur at RU, RC, RR levels.

Take the simplest example. When you select whether a record exists and then you try to insert it, if it doesn’t exist, and then you insert it, but when you actually insert it, you get an error, the record already exists, that’s a phantom read.

First, confirm the current repeatable read level. If not, modify it.

SELECT @@tx_isolation
# set session transaction isolation level repeatable read
Copy the code

Let’s take a look at this supernatural process.

There are five steps, and I’ve marked them all down for you. Here are some of them.

  1. Transaction A starts A transaction with begin, and then queries the record with ID 3, which does not exist. But because snapshot reads open a read view for records with ID 3, records with ID 3 cannot be read throughout this transaction. Good, that’s all we need for unrepeatable reading
  2. Next, transaction B inserts a record with ID 3 and commits successfully
  3. Transaction A also wants to insert the record at this point, so it performs the same insert operation and the database reports an error indicating that the record already exists
  4. Transaction A looks confused and tries to see what the record is, but when it executes the SELECT statement again, it can’t find the record
  5. In other transactions, however, this record is visible because it has been committed correctly

This is illusory reading.

5. How to solve phantom reading

Is there something wrong with magic reading? Most of the time it’s true, but the error is a little weird. To prevent phantoms, you need to enable high-intensity locking like FOR UPDATE, which is rarely used.

Insert error but select error This brings us to the two modes of database reading:

  1. Snapshot read: A normal select operation that reads data from a Read viewmayIt’s historical data
  2. Current read: insert, update, delete, select.. For update always reads the latest data

For current reads, the rows you read and the gaps between them are locked until the transaction commits. Other transactions cannot be modified, so there are no unrepeatable or phantom reads. So inserts can find conflicts that normal select can’t. To solve illusionary reading, you need to add X lock. In the above case, it can be executed in transaction A:

SELECT * FROM xjjdog_tx WHERE id=3 FOR UPDATE
Copy the code

When it does so, it creates a lock even if the record with ID 3 does not exist (a row X lock or a next-key lock gap X lock may be added behind it depending on whether the record exists or not).

6. Summary

Here’s a quick summary.

Dirty reads are when one transaction reads a record that another transaction has not committed yet. Problems arise when other transactions roll back.

Non-repeatable reads, which means that multiple reads of the same transaction may result in inconsistent results. This is because other transactions have modified these records during the execution of the transaction.

MySQL is repeatable by default, but hallucinations can occur. Phantom reads are caused by the difference between snapshot reads and current reads.

To solve the illusion, you need to add a lock (X lock, Gap lock, etc.), such as for update, all the current read until the end of the transaction, no problem.

The so-called highest level serializable, but all of the current read, in high concurrency environment efficiency, imaginable. So it’s almost useless.