We know that transactions are an important mechanism for ensuring atomicity, consistency, isolation, and durability (ACID) of data. But are transactions safe? Is it 100% correct to read and write data using transactions? If you blindly use transactions without understanding the isolation mechanism of the database, you will end up with problems such as not ensuring data consistency or not being able to handle data reads and writes in high-concurrency scenarios.

How do dirty reads, unrepeatable reads, and phantom reads occur

Let’s look at some examples of performing the following transactions with an isolation level of Read Uncommitted (we’ll come back to that later). Because the proper isolation method was not chosen, each of these operations ended up with a different result than expected.

Dirty read

Unrepeatable read

Phantom read

If the site is a personal site with a small number of users, and each user’s request is made at a different time, then this probably won’t happen. But if this is a high-concurrency scenario, and hot data is being read or written, the chances of misreading or miswriting data are much higher.

How do I prevent dirty, unrepeatable, and phantom reads

So, one might ask, how can these situations be avoided?

Major relational databases provide isolation levels. Take MySQL for example:

MySQL isolation level

MySQL provides four isolation levels, the higher the level, the more consistent the read and write. But consistency comes at a cost. Serializable, for example, requires that transactions be serialized, such that a transaction cannot be read while being written. In high concurrency scenarios, the load on the system is relatively high.

Isolation level: READ COMMITTED

Isolation level: REPEATABLE READ

Implementation of different isolation levels

The point is, with all these concepts and phenomena, if the interviewer asks why “READ COMMITTED” can avoid dirty reading. REPEATABLE READ can avoid REPEATABLE READ. Can you answer that?

Multiversion Concurrency Control

MySQL tables also have hidden columns in addition to user-defined columns.

The column name Length (bytes) role
DB_TRX_ID 6 The transaction identifier of the last transaction to insert or update a row. (Delete as update, mark as deleted)
DB_ROLL_PTR 7 Write undo logging to the rollback segment (if the row has been updated, the undo logging contains information needed to rebuild the contents of the row before updating it)
DB_ROW_ID 6 Row identifier (hiding monotonic increment ID)

Every time a transaction makes a change to the data, the old data is backed up to the Undo log, and the database is updated. The DB_ROLL_PTR of the new data points to the replicated data in the Undo log.

This is repeated every time the same data is modified by a different transaction, eventually creating a version chain that continuously points to the old data through DB_ROLL_PTR.

T1 can read Empty because it is not reading the latest contents of the database, but the latest state of the version chain and TRX_ID == 1.

The following article gives a detailed introduction to MVCC, which will not be repeated here. Recommended reading: MySQL transaction Isolation Level and MVCC

Memory method

Dirty read

Dirty is as opposed to clean. “Clean” data is the data that has been committed, and “dirty” data is the data that was committed before. Reading data before committing is called dirty reading.

Unrepeatable read

The repetition is that every time you read a piece of data you get the same result. Non-repeatable read means that the two reads in one transaction have different results, indicating that other transactions have modified the record between the two reads.

Phantom read

Illusion is an illusion, reading data that doesn’t exist. The reason why non-existent data is read is that during the transaction, another transaction inserts a record and thus reads non-existent data.

Refer to the article

MySQL transaction isolation level and MVCC InnoDB multiversion concurrency control (MVCC)