MySQL Advanced series: Implementation of multi-version concurrency control MVCC

This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.

Mysql is the most widely used relational database on the Internet at present. InnoDB is the default storage engine of mysql and the most used storage engine, which can meet most of the business requirements. Among them, the advantage of high concurrency is realized through MVCC. This article introduces how MVCC supports concurrency.

Multi-version concurrency control

Concurrency Control (MVCC) is a multi-version Concurrency Control approach that allows for concurrent access to a database in a database management system and transaction memory in a programming language.

InnoDB in mysql implements MVCC mainly in order to improve the database concurrency performance, in the case of no lock can also handle read and write concurrency, greatly improve the database concurrency.

First we have a table with the following business fields

Mysql > select * from ajisun; mysql> select * from ajisun; + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | id name | city | + -- -- -- -- -- - + -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- + | 100 | ajisun | | Shanghai + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set (0.00 SEC)Copy the code

Let’s explain MVCC through the following key words

Current read, snapshot read, three hidden fields, undo log, Read View, version chain, RR and RC isolate basic differences

Current read, snapshot read

The current reading

The data obtained by the current read is the latest data and cannot be modified during the read, so the read records are controlled by locking. The following

select * from ajisun where id > 1 lock in share mode; Select * from ajisun where id >1 for update;Copy the code

Read the snapshot

A simple SELECT query is a snapshot read, which is unlocked and non-blocking, reducing the overhead of the database.

However, it does not make sense for snapshot reads to be serialized at the isolation level, because serialized SQL is queued, there is no concurrency, and therefore becomes current reads.

Snapshot data reading, as the name implies, reads a snapshot data. Therefore, the read data may be historical data rather than the latest data.

The implementation principle of MVCC mainly relies on three hidden fields, undo log and Read View

Two, hide the field

In addition to our normal business fields, InnoDB internally adds three hidden fields to the database table:

DB_TRX_ID: specifies the transaction ID of a 6-byte database. The transaction ID of the last transaction to insert or update a row
DB_ROLL_PTR: specifies the 7-byte rollback pointer. This is used in undo log to point to the previous version of the corresponding row.
DB_ROW_ID: specifies the hidden primary key of 6-byte. InnoDB automatically generates monotonically increasing hidden primary keys if there are no primary keys in the table (tables with primary keys or non-null UNIQUE keys do not contain DB_ROW_ID columns).

The above table does not design the primary key, where id/name/city is our business field, then add the hidden field should be as follows

Three, undo log and version chain

Undo log is a rollback log. Records generated during INSERT /update/ DELETE changes are easily rolled back.

When an INSERT operation is performed, the undo log generated is required only when the transaction is rolled back, otherwise it will be deleted after the transaction commits.

When update and DELETE are performed, the undo logs generated are not only required for transaction rollback, but also for snapshot reads, so they are not deleted immediately. Only when the log is no longer used will the mysql Purge thread purge the log (the delete operation is marked as a deletion, not a real deletion).

Versioning is when multiple transactions operate on the same record and generate an Undo log. These undo logs are connected by a rollback pointer.

For example, now a transaction ID10 inserts a record into the table as follows

Now there is a second transaction whose id=20 to change the name of this record to “Mr. Ji”. This transaction will lock this record in the database and copy this record to the undo log. After the copy is complete, change the name of this record to “Mr. Ji”. At the same time, change the transaction ID of the hidden field to the ID of the current transaction 20. The rollback pointer points to the position just copied in the undo log to commit the transaction and release the lock.

All copies of the transaction are placed at the head of the Undo log table, thus forming a version chain (the head node of the version chain is the latest value of the current record) as shown below

Four, ReadView

ReadView is a data ReadView generated when a transaction snapshot is read. At the moment when a transaction snapshot is read, a snapshot of the data system is generated and the id of the current active transaction is recorded and maintained. The value of the transaction id is increasing.

The most useful use of ReadView is to determine the visibility of data. When a transaction performs a snapshot read, a ReadView of the record is created to determine which historical data in the version chain that the transaction can see based on certain conditions throughout the transaction.

Visibility is determined by four global attributes, which are:

M_ids: represents the list of transaction ids active in the system at the time ReadView was generated.

M_low_limit_id: indicates the transaction ID that should be assigned to the next transaction when ReadView is generated.

M_up_limit_id: indicates the minimum transaction ID of the active read/write transaction in the system when the ReadView is generated, that is, the minimum value in m_IDS.

M_creator_trx_id: indicates the transaction ID of the transaction that generated this ReadView

These four fields are used to determine whether a version of a record is visible when accessing it:

If the version transaction ID of the accessed record is the same as m_CREator_trx_ID value in ReadView, then the current transaction is accessing the modified record and that version is visible to the current transaction.
If the transaction ID of the accessed version is less than the value m_up_limit_ID in ReadView, then the transaction that generated the version was committed before the current transaction generated the ReadView, so the version can be accessed by the current transaction.
If the transaction ID of the accessed version is greater than the m_LOW_limit_ID value in ReadView, then the transaction that generated this version was started after the current transaction generated the ReadView, so this version cannot be accessed by the current transaction.
If the transaction ID of the accessed version is between m_up_limit_ID and m_LOW_limit_id of the ReadView, it is necessary to check whether the transaction ID of the accessed version is in the m_IDS list. If yes, the transaction ID of this version was active when the ReadView was created. The version is not accessible; If not, the transaction that generated the version of the ReadView when it was created has been committed and the version can be accessed.

If a version is not visible to the current transaction, then follow the version chain to the next version record, and continue the comparison rules until the last version in the version chain is found. If the last version is not visible, then the record is not visible to the transaction at all, and the record is not found.

This is the comparison rule for MVCC visibility to versions

What is the difference between RC and RR isolation levels?

READ-COMMITTED(RC) and REPEATABLE-READ(RR) readViews are different because of the different generation timing

At the Read-uncommitted isolation level, it is possible to READ data that has not been committed by another transaction, just READ the latest one, there is no snapshot READ ReadView.

Under SERIALIZABLE isolation level, all SQL is serialized by locking, and the latest read is also available. No snapshot read ReadView exists.

The first snapshot read of a transaction in an RR creates a ReadView with four properties recorded in the ReadView, including the active transaction list. After that, the same ReadView is used for snapshot reads. Does not regenerate, so as long as the current transaction uses snapshot reads before other transactions commit updates, subsequent snapshot reads will use the same Read View, so subsequent changes will not be visible.

At the RC level, each snapshot read creates a new snapshot and ReadView, which is why we can see updates committed by other transactions in RC level transactions.

Six,

Interviewer: Do you know MVCC in mysql? Tell me how to implement it?

MVCC is multi-version concurrency control, through the generation of historical versions of records to solve the phantom read problem, and improve the performance of the database, no lock to achieve concurrent read and write operations.
MVCC is implemented mainly through three hidden fields, undo log and readView.
The three hidden fields are hidden primary key, transaction ID, and rollback pointer.
The undo log is a history record generated when each transaction modifies the same record, facilitating rollback and generating a version chain.
ReadView is a snapshot of the record generated when a transaction is performing a snapshot read to determine the visibility of the data.
ReadView Visibility judgment rules.

There’s nothing wrong with that description.

How mysql Works, High Performance mysql

Welcome to the comments section to communicate

MySQL advanced related more content, such as lock, MVCC, read and write separation, sub-library sub-table, etc., is still in the update, if you want to know the content can also give me a message, welcome to pay attention to prompt more.

I am Mr. Ji, with the output force input and continue to learn, continue to share the series of technical articles, as well as the whole network worth collecting good articles, welcome to pay attention to the public number, do a continuous growth of technical people.

Historical articles in the MySQL series

1. MySQL Advanced Series: Learn about the MySQL infrastructure

2. MySQL Advanced series: Learn about MySQL storage engine.

3. MySQL advanced series: What is the difference between MySQL MyISAM and InnoDB?

4. MySQL Advanced series: how to better select the data type in MySQL table design;

5. MySQL Advanced series: How to use paradigm in database design;

6. MySQL advanced series: explain the meaning of each field;

MySQL advanced series: why MySQL uses B+ as its index data structure;

MySQL advanced series: You need to know the basics of indexing;

MySQL > create index ();

MySQL advanced series: master/slave replication principle and configuration

11. MySQL Advanced series: join principle -3 algorithms;

MySQL advanced series: transaction and transaction isolation levels;

MySQL Advanced series: Implementation of multi-version concurrency control MVCC

Current read, snapshot read

The current reading

Read the snapshot

Two, hide the field

Three, undo log and version chain

Four, ReadView

What is the difference between RC and RR isolation levels?

Six,

Related Posts

RocketMQ series iii – Advanced features and source analysis

Enter factory must ask: you do not know Java “lock” matter

Redis String Underlying data structure