Hello everyone, I am Xing Huanjie from JDL Jingdong Logistics Technology Development Department, today to share a jingdong interview real question, this is also a question I listened to my station next to gao T (high, is really high) interview candidates asked, he asked, can you talk about MySQL affairs? Does MVCC know anything about it? Without further ado, this article will take a deep look at MySQL transactions and the implementation principle of MVCC.

Transaction definition and four major features

  • What is a transaction?

  • Four characteristics of transactions (ACID) :

  • Atomicity: A transaction is an indivisible unit of work in which all or none of the operations involved are performed.

  • Consistency: The transaction must change the database from one consistent state to another. Consistency is closely related to atomicity.

  • Isolation: The execution of a transaction cannot be interrupted by other transactions. That is, the operations and data used within a transaction are isolated from other concurrent transactions, and the concurrent transactions cannot interfere with each other.

  • Durability: Refers to the fact that once a transaction is committed, its changes to data in the database should be permanent and should not be affected by subsequent operations or failures.

Common problems in transactions

  • Dirty reads: Is an A transaction without even submitting it to the data change can also be seen by other transactions B transaction, transaction B read A transaction has not been submitted data and the data may be wrong, could A don’t want to submit the data, it is just A transaction to modify data in the process of an intermediate data, but the transaction B read, this behavior is called dirty reads, This data is called dirty data
  • Non-repeatable read: in A transaction, the same data is read for many times, but in the process of reading, B transaction modifies the data, causing the data to change, so when A transaction reads the data again, the data is different from the first one, which is called non-repeatable read
  • Phantom Read: A transaction query the database for many times, the results showed that the query data A number of different, the interval of A transaction multiple queries, B transactions and write some multiple items conform to the query conditions (can write here is the update, insert, delete), A transaction to check again, just like the illusion, how suddenly changed so much, This phenomenon is called phantasm

Isolation level – The cause of the problem

Multiple transactions affect each other and are not well isolated, which is the problem of Isolation among the four features of transactions mentioned just now. The Isolation level of transactions is not set properly. Let’s look at the several Isolation levels of transactions

  • Isolation level
  • Read Uncommitted RU: When a transaction has not committed, its changes can be seen by other transactions
  • Read Committed RC: After a transaction commits, its changes are seen by other transactions.
  • Repeatable read RR: The data seen during the execution of a transaction is always the same as the data seen when the transaction is started. Of course, at the repeatable read isolation level, uncommitted changes are also invisible to other transactions.
  • Serializable: As the name implies, for the same row, “write” will add “write lock”, “read” will add “read lock”. When a read-write lock conflict occurs, the last accessed transaction must wait for the previous transaction to complete before execution can continue.

Let’s take a look at an example to see how these four isolation levels relate to the problem of dirty, unrepeatable, and phantom reads

Let’s discuss the different values of V1,V2, and V3 for transactions at different isolation levels

  • Read Uncommitted (RU) : TRANSACTION A can read the value changed by transaction B, even if transaction B is not committed. So V1 is going to be 200
  • V1 : 200
  • V2 : 200
  • V3 : 200
  • Read Commit (RC) : If transaction B is not committed, transaction A cannot see the value changed by transaction B, only after the commit
  • V1 : 100
  • V2 : 200
  • V3 : 200
  • Repeatable read (RR) : A transaction reads data multiple times, and the total data is the same as the first read.
  • V1 : 100
  • V2 : 100
  • V3 : 200
  • Serialization (S) : When transaction A is executed, transaction B will be locked, and transaction B can continue to execute after transaction A completes
  • V1 : 100
  • V2 : 100
  • V3 : 200

MVCC principle

Multi-version Concurrency Control (MVCC) is a way for a database to Control concurrent access.

  • Note in particular that MVCC is only effective at two transaction isolation levels: read Committed (RC) and repeatability (RR)
  • InnoDB is database engine (InnoDB) level implementation, used to deal with read and write conflict means (without locking), improve access performance

How is MVCC implemented? It relies on version chains and consistency views

1. The version of the chain

  • A version chain is a linked list of changes to each piece of data

So how is this version chain formed, and how is each piece of data linked together?

In fact, for tables in the InnoDB storage engine, its clustered index record contains two hidden fields

  • Trx_id: specifies the transaction ID that stores changes to this table. A unique transaction ID is assigned only when a change operation (UPDATE, DELETE, INSERT) occurs. The transaction ID is incremented
  • Roll_pointer: pointer to the last modified record
  • Row_id (optional): This field is not included when there is a primary key or a unique key that cannot be null

Suppose the current database has a piece of data like this, inserted by a transaction with transaction ID 100, then the structure of this piece of data is as follows

Later, transaction 200 and transaction 300, respectively, modify this data

So the version chain is as follows

Every time we change the data, an undo log is inserted and the roll_pointer for the record points to the previous record, as shown in the figure

  1. The first data is Xiaojie and the transaction ID is 100

  2. The transaction with transaction ID 200 changes its name from Jacky to A

  3. The transaction with transaction ID 200 changes its name from A to B

  4. The transaction with transaction ID 300 changes its name from B to C again

So the linked list is C -> B -> A -> Jay (from latest data to oldest data).

2. Consistency View (ReadView)

You need to determine which version in the version chain is visible to the current transaction, hence the concept of a consistent view. There are four attributes that are important

  • M_ids: A list of transaction ids of the currently active read/write transactions when the ReadView was generated
  • Min_trx_id: minimum value of m_IDS
  • Max_trx_id: the maximum value of m_IDS +1
  • Creator_trx_id: indicates the id of the transaction that generates the transaction. If a transaction is simply enabled, there is no transaction ID. The default value is 0.

Whether the current version in the version chain is visible to the current transaction depends on these four attributes in the following cases

  • If trx_id = creator_trx_id, the current transaction can see its modified data.
  • Trx_id < min_trx_id: The transaction that generated this data was committed before the readView was generated, visible
  • If trx_id >= max_trx_id: the transaction that generated this data was started after the ReadView was generated
  • When min_trx_id <= trx_id < max_trx_id
  • Trx_id in m_IDS list: active transaction not committed at time of ReadView generation, not visible
  • Trx_id is not in m_IDS list: the transaction was committed before the readView was generated, visible

If one version of data is not visible to the current transaction, move up the version chain to the next version, and so on.

Note: RR and RC generate consistent views at different times (this is the main difference between the two isolation level implementations)

  • Read Committed RC generates a ReadView on each select
  • Repeatable Read RR generates the ReadView on the first select

Let’s put it in practice.

RR with RC and MVCC examples of combat

Let’s use this example to analyze the consistency view and version chain (MVCC) of reading data when the database isolation level is RC and RR, assuming we have multiple transactions as follows.

  • Suppose there is an initial data entry in the database with the name Java to add, id 1 (id, name,trx_id,roll_point), and transaction ID 1 to insert this data
  • Creator_trx_id = 0; creator_trx_id = 0; creator_trx_id = 0; creator_trx_id = 0; creator_trx_id = 0;
  • In the following examples, A,B, and C mean to change the name to A,B, and C. Read also reads the name of the current moment. By default, all transactions are enabled, and the transaction has undergone some operation to produce the transaction ID

Read submitted (RC) with MVCC

  • After a transaction commits, its changes are seen by other transactions

The ReadView(consistency view) is regenerated each time it is read

  1. At time T1, transaction 100 changes its name to A

  2. At time T2, transaction 100 changes its name to B

  3. When time T3, transaction 200 changes its name to C

  4. At time T4, transaction 300 begins to read the name

  • The version chain of this data is as follows

The same color represents operations within the same transaction

  • Let’s calm down and have a good analysis at this time T4 time transaction 300 to read, what data will read?

The current most recent data is C, which was modified by transaction 200. Remember the properties of the consistency view we talked about earlier, and the rules that determine whether this data can be read by the current transaction. So let’s look at this example.

At this point (generate the consistency view ReadView)

  • M_ids is [100,200] : a list of transaction ids of currently active read/write transactions
  • Min_trx_id is the minimum value of 100: m_IDS
  • Max_trx_id is 201: the maximum value of m_IDS +1

Trx_id = min_trx_id<=trx_id

After analyzing the first reading, let’s move on

  1. At time T5, transaction 100 commits
  2. At time T6, transaction 300 changes its name to D
  3. At time T7, transaction 400 reads the current data
  • The version chain of this data is as follows

At this point (rebuild the consistency view ReadView)

  • M_ids is [200,300] : a list of transaction ids of currently active read/write transactions
  • Min_trx_id is the minimum value of 200: m_IDS
  • Max_trx_id is 301: the maximum value of m_IDS +1

Min_trx_id <=trx_id

After analyzing the second read, we move on

  1. At time T8, transaction 200 changes its name to E
  2. At time T9, transaction 200 commits
  3. At time T10, transaction 300 reads the current data
  • The version chain of this data is as follows

At this point (rebuild the consistency view ReadView)

  • M_ids is [300] : a list of transaction ids of currently active read/write transactions
  • Min_trx_id is the minimum value of 300: m_IDS
  • Max_trx_id is 301: the maximum value of m_IDS +1

The current transaction id is 200,200

When the isolation level is read committed RC, the consistency view is regenerated for each read.

  • The data read by transaction 300 at T4 time is Xiaojie
  • Transaction 400 at time T7 reads data B
  • The data read by transaction 300 at time T10 is E

Repeatable reading (RR) and MVCC

  • The data seen during the execution of a transaction is always the same as the data seen when the transaction is started

So for transaction 300, it reads data at T4 and T10, but its consistency view is always the same as the first read view, which is the consistency view generated at T3

The version chains of RR and RC are the same, but the consistency view used to determine whether the current data is visible or not is different

At this repeatable RR isolation level,

  1. The analysis and results of transaction 300 on the first read at T4 time are the same as those of RC, which can be seen in the above analysis and results
  2. The analysis and results of transaction 400 on the first read at time T7 are the same as those of RC, which can be seen in the above analysis and results
  3. At time T10, the consistency view of transaction 300 at the second read is the same as that at the first read, so what data is read at this time needs to be re-analyzed

At this point (using ReadView, the consistency view generated on the first read)

  • M_ids is [100,200] : a list of transaction ids of currently active read/write transactions
  • Min_trx_id is the minimum value of 100: m_IDS
  • Max_trx_id is 201: the maximum value of m_IDS +1

The version chain at this point is

The transaction ID of the current data is 200, and the data is E, which is in the active list of the current transaction, so the data is not visible. According to the rollback pointer, the last version is found, and the transaction ID is 300, and the current transaction is 300, so the data read is D

  • We can think for ourselves, if there is no record of the transaction 300 change, how can we continue to analyze down?

When the isolation level is repeatable read RR, each read will use the ReadView generated on the first read.

  • The data read by transaction 300 at T4 time is Xiaojie
  • Transaction 400 at time T7 reads data B
  • The data read by transaction 300 at time T10 is D

Welcome to [JINGdong Technology] to learn about the developer community

More wonderful technical practice and exclusive dry goods analysis

Welcome to “JINGdong Technology Developer” public account