Daily sentence

There is only one heroism in the world, that is to love life after recognizing the nature of life. — Romain Rolland


If the profile

As we all know, the database itself operates the files of the underlying file system and the data structure of the memory. When there are multiple requests for concurrent modification or reading of relevant data information, there will be data consistency problems and data security problems. MySQL database engine mainly relies on MVCC to solve most data access consistency problems.


Introduction to MVCC concepts

MVCC is multi-version concurrency control: it is mainly to solve the consistency problem of concurrent transaction read and write in the database. It is a kind of implementation method of optimistic lock, which can achieve read without lock, read and write without conflict, and solve read-write conflict without lock concurrency control. So how is transaction isolation implemented when multiple transactions are executed concurrently?


How does MySQL’s default RR isolation level avoid unrepeatable reads? Let’s analyze it.

Three database concurrency scenarios

  • Read-read: There are no problems and no concurrency control is required
  • Read-write: Has thread safety issues, may cause transaction isolation issues, may encounter dirty reads, unreal reads, unrepeatable reads
  • Write – write: there may be a thread safety problem, update loss problem, dirty write operation processing.

MVCC only works at Read Commited and Repeatable Read isolation levels.Copy the code

Snapshot read and current read

The snapshot to read:

Simple select operation, no lock, based on MVCC and undo log to achieve, read the visible version of the record (possibly the historical version).

The current reading:

A special read operation, which requires a lock, is the implementation of pessimistic locking. It reads the latest version of the record, and all the records returned by the current read are locked to ensure that other transactions will not modify the record concurrently.


insert/update/delete
select.for update
select. lockin share mode

Copy the code

The underlying implementation of MVCC

Database implementation, is to solve Read and write conflict, its implementation principle is mainly dependent on the record of three implicit fields, undo log, delete_sit, Read View to achieve.

The InnoDB MVCC implementation is based on undo Log and builds the required version record by rolling back the pointer. Use ReadView to determine which versions of data are visible. Meanwhile, the Purge thread uses ReadView to clean up old version data. (Insert is not required because there is no previous version and visibility is only considered for this thread)

Three implicit fields

DB_TRX_ID

6byte, last modified (modified/inserted) transaction ID: Records the ID of the transaction in which the record was created or last modified

DB_ROLL_PTR

7byte: a rollback pointer used with the undo log to point to the previous version of the record

DB_ROW_ID

6byte, implied increment ID (hidden primary key). If the table does not have a primary key, InnoDB automatically generates a cluster index with DB_ROW_ID

Note:

In fact, each record header has a special bit (deleted Flag) indicating whether the current record has been deleted.

undo log

The basic concept

The Undo log records logical changes of data. To roll back previous operations when errors occur, you need to record all previous operations and roll back when errors occur.

role

For the rollback of transactions

  • Undo logs are used to store values before data modification. If the modification is abnormal, undo logs can be used to roll back data to ensure transaction consistency.
  • Undo logging, which only logically restores the database to its original state, actually does the opposite when rolled back

Used for MVCC

The undo log types are insert undo log and update undo log

Insert undolog into undolog

Insert undo log refers to the undo log generated during the INSERT operation, because the records of the INSERT operation are visible only to the transaction itself and not to other transactions. Therefore, the undo log can be deleted directly after the transaction is committed without the purge operation.

Update undolog file

Update undo log Records undo logs generated for delete and update operations. The undo log may need to provide MVCC mechanism, so it cannot be deleted when a transaction commits. Commit to the Undo log linked list and wait for the Purge thread to make the final deletion.

Specific working principle needs to be discussed in the following cases

Update the primary key

Neither a clustered index nor a secondary index can be in place updated, resulting in two versions

Update is performed in two steps, deleting the row and then inserting the target row

Update non-primary keys

The cluster index can be updated in place. The secondary index generates two versions. The cluster index records the undo log, while the secondary index does not record the undo log

Clearing the undo log

Deletes rows in the page that contain the Delete_Bit identifier. In InnoDB, a Delete operation in a transaction does not actually Delete a row, but rather is a Delete Mark operation that marks the deletion on the record. The actual deletion would need to be done by the purge thread in the background.

Read View

What is Read View

A Read View is a Read View produced when a transaction performs a snapshot Read. At the moment of snapshot Read, a snapshot of the current database system is generated to record and maintain the ID of the current active transaction in the system. (When each transaction is started, an ID is assigned, which is incrementing. So the latest transaction, the larger the ID value)

role

The main purpose of Read View is to make a visibility judgment, that is, when a transaction performs a snapshot Read, create a Read View of the record, which is used to determine which version of the data can be seen by the current transaction. It could also be a version of the undo log recorded in the row.

Core Algorithm (Visibility algorithm)

Three properties of the Read View

  • Trx_ids: A list of values used to maintain Read View generation of transaction ids that are active on the system at the time
  • Up_limit_id: records the smallest transaction ID in the TRx_IDS list
  • Low_limit_id: Indicates the next transaction ID that has not been assigned by the system at the time when ReadView is generated. That is, the maximum value of the existing transaction ID +1

The flow of visibility judgment

Walk through DB_TRX_ID and perform the following steps until you find the most recent data visible to the current transaction:Copy the code

If the current DB_TRX_ID record does not meet the visibility of the current transaction, you can use the DB_ROLL_PTR rollback pointer to retrieve the previous DB_TRX_ID from undo log

Step1: compare the value DB_TRX_ID to up_limit_id

If less than, the current transaction will see the record where the DB_TRX_ID resides. That is, the record is visible to the latest record if greater than or equal to go to step2.

Step2 check whether the value DB_TRX_ID is greater than or equal to low_limit_id

If the value is greater than or equal to the value of the DB_TRX_ID record, it will not be visible in the current transaction. If the value is greater than or equal to the value of the DB_TRX_ID record, go to step3

Step3: Check whether DB_TRX_ID is in active transaction

DB_TRX_ID specifies the time when the current transaction’s Read View is generated. The DB_TRX_ID transaction is active and has not yet been committed. The data modified by the DB_TRX_ID transaction is invisible to the current transaction.


If not, the DB_TRX_ID transaction was committed before the current transaction’s Read View was generated. The result of the DB_TRX_ID transaction modification is visible to the current transaction

How MVCC works

Workflow for MVCC queries

Query the primary key index

  1. Generate Read View Read View
  2. DB_TRX_ID and Read View are used to determine the visibility of the primary key
  3. Work with the DB_ROLL_PTR rollback pointer and undo log to find the data record visible to the current transaction

Querying secondary indexes

  1. Generate Read View Read View
  2. Compare the up_limit_id and MAX_TRX_ID of the read view
  3. If MAX_TRX_ID is less than the up_limit_id of this Read View, all valid records are visible and valid records are filtered. Otherwise, visibility cannot be determined by the secondary index. You need to iterate over each record to reverse check the cluster index record and determine visibility by the cluster index record

MVCC and isolation level

MVCC only works at Read Commited and Repeatable Read isolation levels.Copy the code
  • At the RC isolation level, each snapshot Read generates and gets the latest Read View, which is why we can see updates committed by other transactions in RC level transactions

  • In RR isolation, a Read View is created only for the first snapshot Read in the same transaction, and all subsequent snapshot reads obtain the same Read View.

At RR level, when a snapshot Read generates a Read View, the Read View takes snapshots of all other active transactions at that time. The changes made to these transactions are not visible to the current transaction. Changes made by transactions created before the Read View are visible.