Column series: MySQL column series

MVCC

MVCC is introduced

The DELETE statement or UPDATE statement that updates the primary key does not immediately DELETE the corresponding record directly from the page. Instead, it sets the record header delete_mask to 1 and marks the deletion. This is primarily for MVCC.

Then on the basis of the transaction This article introduces the concurrent transactions there will be a dirty, dirty read, written not repeatable read, read the four questions, dirty can write by optimistic or pessimistic locks lock, dirty reads, non-repeatable reads, phantom read three questions need database provide a certain amount of transaction isolation mechanism to solve, namely transaction isolation.

There are four transaction isolation levels in the SQL standard, which address some of the problems associated with concurrent transactions:

Dirty read, unrepeatable read, unreal read are all concurrent read problems, the simplest way is to add a lock to the record, whether update, read records need to compete to the lock after the operation. However, the concurrency performance of this approach is predictably low.

To address the Concurrency problem, InnoDB designed an MVCC, or Multi-version Concurrency Control. When performing SELECT queries at RC and RR isolation levels, database performance is improved by accessing the version chain of records without locking, enabling read-write operations of different transactions to be executed concurrently.

Undo log Version chain

MVCC is implemented by reading a version chain of records, which is actually a version chain formed by undo log.

Taking the picture in the article UndoLog on transaction atomicity as an example, it can be seen intuitively that the undo log generated by adding, deleting and changing is connected into a one-way linked list through old roll_pointer, and the hidden column roll_pointer in the record points to the latest undo log. This is the header of the undo version chain.

The value in the record is always the most recently updated, and the transaction that might update this record has not yet committed:

  • For transactions using RU isolation level, since records modified by uncommitted transactions can be read, it is good to simply read the latest version of the record.

  • For transactions with SERIALIZABLE isolation level, InnoDB uses locking to access records. We’ll talk about that later.

  • For transactions using RC and RR isolation levels, you must ensure that the records modified by the committed transaction are read. If the records modified by another transaction have not been committed, the latest version of the record cannot be read directly. In this case, you can look up the version visible to the current transaction along the Undo version chain.

ReadView

In the previous section, we explained that in order to ensure that the records modified by committed transactions are read at the RC/RR isolation level, we need to find the version visible to the current transaction on the Undo version chain. So how do you determine which version on the version chain is visible to the current transaction?

InnoDB designs a ReadView, which is created when a transaction is executed.

ReadView has four key properties:

  • m_ids: A list of transaction ids that were active in the system when ReadView was generated.
  • min_trx_id: The smallest transaction ID of the current active transaction in the system when the ReadView was generated, i.em_idsThe minimum value of.
  • max_trx_id: Specifies the global transaction ID assigned to the next transaction when the ReadView is generated.Max Trx Id), notice nom_idsThe maximum value of.
  • creator_trx_id: Transaction ID of the transaction that generated the ReadView. A transaction ID is assigned only when an add, delete, or delete operation is performed. If it is a read-only transaction, creator_trx_id is assigned by default0.

MVCC principle

Undo version chain +ReadView mechanism

With ReadView, you can look up the version visible to the current transaction along the Undo version chain when querying in a transaction. This is where the hidden column trx_id in the Undo log comes in. It represents the transaction ID of the transaction at which the undo log was generated. The accessibility of this version is determined by comparing the value of the trx_id attribute in the undo log to the ReadView attributes.

To check whether the version is accessible, perform the following steps:

  • (1) If trx_id is creator_trx_id, the current transaction is accessing its own modified record, so this version can be accessed by the current transaction.

  • (2) If trx_id is less than min_trx_id, the transaction that generated this version was committed before the current transaction generated ReadView, so this version can be accessed by the current transaction.

  • ③ If trx_id is greater than or equal to max_trx_id, the transaction that generates this version is enabled only after the current transaction generates ReadView. Therefore, this version cannot be accessed by the current transaction.

  • If trx_id is between min_trx_id and max_trx_id, then check whether trx_id is in m_IDS list. If trx_id is between min_trx_id and max_trx_id, then check whether trx_id is in m_IDS list. If trx_id is between min_trx_id and max_trx_id, then check whether trx_id is in M_IDS list. If not, the transaction that generated the version of the ReadView when it was created has been committed and the version can be accessed.

The general flow chart is as follows:

RC and RR isolation levels

The difference between READ COMMITTED and REPEATABLE READ isolation levels is when they generate readViews.

  • READ COMMITTEDA separate ReadView is generated before each query.
  • REPEATABLE READOnly one ReadView is generated before the first query, and this ReadView is reused for subsequent queries.
  • READ UNCOMMITTEDYou don’t need to generate a ReadView and directly read the data of the row record.

Using the previous account table as an example, the operations are explained in chronological order.

1, the T1

Now the initial state of the account table is as follows, and the transaction ID of the last update is 100 and the value of card is AA.

The next transaction ID to be allocated is 150, and then the system has two transactions running with transaction IDS 130 and 135 respectively.

2, T2

Now open A new transaction A, query the data ID=1, will generate A ReadView as shown in the figure below:

Trx_id (100) is the same as creator_trx_id(0).

Trx_id (100) < min_trx_id(130) trx_id(100) < min_trx_id(130)

Transaction A (balance=0) returns only 1 data.

3, T3,

Then another transaction B (trx_id=150) updates this data, updating AA to BB, and the transaction has not yet committed.

Then the data with ID=1 is queried again in transaction A.

  • inRCAt the isolation level, a new ReadView is generated:

Trx_id (150) is between min_trx_id(130) and max_trx_id(160), and is also in m_IDS (130,135,150), so the data on the record row is not visible to this transaction. AA trx_id(100) < min_trx_id(130). So multiple reads at the RC isolation level can avoid dirty reads without seeing uncommitted updates from other transactions.

  • inRRAt isolation level, ReadView remains unchanged:

Trx_id (150) = max_trx_id(150), so the data on the record row is not visible to the transaction. AA trx_id(100) < min_trx_id(130). Therefore, after multiple reads at the RR isolation level, uncommitted updates of other transactions cannot be seen, preventing dirty reads.

4, T4

Transaction B then commits the update.

Then the data with ID=1 is queried again in transaction A.

  • inRCAt the isolation level, a new ReadView is generated:

Trx_id (150) is between min_trx_id(130) and max_trx_id(165), and not in m_IDS (130,135), so the data on the record row is visible to this transaction. Therefore, multiple reads at the RC isolation level can see updates committed by other transactions and cause unrepeatable reads.

  • inRRAt isolation level, ReadView remains unchanged:

The judgment in this step is the same as that in the previous step, and finally returns AA versions. Therefore, after multiple reads in RR isolation level, updates submitted by other transactions cannot be seen, avoiding the problem of unrepeatable reads.

5, T5

Then a new transaction (trx_id=175) updates the data ID=1, updates BB to CC, inserts the data ID=2, and commits.

At this point, the data for balance=0 is queried in transaction A.

  • inRCAt the isolation level, a new ReadView is generated:

Select min_trx_id(175) from min_trx_id(170) and max_trx_id(200) from m_IDS (170,180); select CC from min_trx_id(175) and max_trx_id(200). When the row ID=2 is queried, the same process will return the MM version. The final query returned two pieces of data, while the original query returned only one, so multiple reads at the RC isolation level can cause phantom reading problems.

  • inRRAt isolation level, ReadView remains unchanged:

When you query the row ID=1, you will eventually find the version AA along the version chain. If ID=2, trx_id(175) > max_trx_id(150), ID=2, trx_id(175) > max_trx_id(150) The final query returns only one piece of data. Therefore, multiple reads at the RR isolation level do not cause phantom read problems.

6, T6

Then a new transaction (trx_id=205) deletes the data with ID=1, but does not actually delete it, just marks delete_mask as 1.

Then the data with ID=1 is queried again in transaction A.

Since the delete_mask in the row is marked with 1, it cannot be queried, so it can only be queried along the previous version chain. The subsequent matching process is similar to the previous description and will not be repeated. At the RC isolation level, trx_id=175 is returned with the CC version. At the RR isolation level, trx_id=100 is returned, and the value is AA.

MVCC summary

As you can see from the above example, MVCC is a set of concurrent read mechanisms implemented through the undo log version chain + ReadView.

At the READ COMMITTD isolation level, each query generates a new ReadView and cannot READ changes that have not been committed by other transactions, thus solving the problem of dirty reads. However, if you can read the changes that have been committed by other transactions, there will be problems of unrepeatable read and phantom read.

At the REPEATABLE READ isolation level, a ReadView is generated only for the first query and is reused for all subsequent queries. Other uncommitted, committed, and newly inserted changes cannot be read, thus solving the problem of dirty, unrepeatable, and phantom reads.

As mentioned in the undo log article, executing a DELETE statement or an UPDATE statement that updates the primary key does not immediately DELETE the record completely from the page. Instead, it sets delete_mask to 1 and marks the record for deletion. It becomes clear why, this is primarily for MVCC, because there may be other concurrently running transactions that read the version visible to the current transaction through the version chain.

Snapshot read and current read

It is important to note that the queries in the previous examples are simple SELECT queries that read a snapshot version of the undo version chain, which can be called snapshot reads or consistent unlocked reads. Because it is a read snapshot, phantom reads can be avoided in RR isolation.

However, if it is an INSERT, DELETE, or UPDATE statement, such as the following SQL statement, the UPDATE statement will UPDATE the balance=0 record. This method is called current read and reads the latest data. If the current read can read changes committed by other transactions, phantom reading problems may occur.

UPDATE account SET balance=100 WHERE balance = 0;
Copy the code

For example, if the SQL is executed in the following sequence at the default RR isolation level, the two common query results of Session B are the same without phantom read problems. This is because Session B’s second query reads the snapshot version, that is, snapshot reads, and does not read the changes committed by other transactions.

Timeline Session A Session B Session C
t1 TUNCATE TABLE account;

INSERT INTO account(card) VALUES (‘AA’);
t2 BEGIN; BEGIN;
t3 SELECT * FROM account WHERE balance=0;

(Return the record AA)
t4 INSERT INTO account(card) VALUES (‘BB’);
t5 COMMIT;
t6 SELECT * FROM account WHERE balance=0;

(Return the record AA)
u7 COMMIT;

Session B updates the balance=0 data (AA) and balance=0 data (BB). This is where the problem of illusory reading arises. This is because the middle update is currently read, and the query at the time of the update can read updates committed by other transactions, and MVCC cannot solve this problem.

Timeline Session A Session B Session C
t1 TUNCATE TABLE account;

INSERT INTO account(card) VALUES (‘AA’);
t2 BEGIN; BEGIN;
t3 SELECT * FROM account WHERE balance=0;

(Return the record AA)
t4 INSERT INTO account(card) VALUES(‘BB’)
t5 COMMIT;
t6 UPDATE account SET balance=100 WHERE balance=0;

(You will see that two rows of data are updated)
t7 SELECT * FROM account WHERE balance=100;

(Return AA and BB records)

So how to solve the current reading problem? This is where the locks introduced in the next article come in.