Writing in the front

  1. What are the problems caused by the lack of isolation of transactions?
  2. What is a transaction isolation level and why is it needed?
  3. What is MVCC and what problems does it solve?
  4. How is MVCC implemented?

background

As a basic service, mysql usually has multiple application services interacting with it as clients at the same time. Each application service executing multiple statements in a transaction does not want to be affected by or affect transactions of other clients

The simplest approach is for all transactions to execute exclusively, so that when all statements of one transaction have been executed (commit or rollback), statements of other transactions can be executed. But the impact on performance is too great. In fact, such isolation is not required in most business scenarios, so some of it can be sacrificed in exchange for performance of database services

Problems caused by isolation

What problems might arise if there is a lack of isolation between different transactions?

Let’s first define what is no problem: is the result of executing a transaction the same as executing each transaction sequentially

Dirty write

Dirty writes, in which one transaction makes changes to uncommitted change records of another transaction

For example: two transactions A and B, when transaction B updates A record but has not committed it, transaction A updates the record and commits it. If transaction B rolls back later, the record will be in the same state as before transaction B’s update. Not only B’s update will be rolled back, but transaction A’s update will also be rolled back

What’s wrong with that?

  • This is coupled with fully guaranteed transaction isolation as a result of serial executionDoes not conform to, if the serial execution, no matter which A or B executes first, in the case of B rollback, the final result of the record must be the result after A modification, rather than the result before B update
  • Secondly, transaction A was obviously modified after the modification of transaction B, and finally returned to the previous modification of transaction B, as if the update had been lost, which did not meet most business expectations

Fortunately, Mysql Innodb avoids dirty writes at all transaction isolation levels

Dirty read

Dirty reads, when one transaction reads uncommitted changes from another transaction

For example: two transactions A and B, when transaction B modifies A piece of data before committing it, transaction A reads the record and performs some operations based on the record. If transaction B rolls back later and the modified record of TRANSACTION B is restored to its original state, the previous read of the record of transaction A is A dirty read

What’s wrong with that?

  • As with dirty writes, if transactions A and B are executed sequentially, there must be no dirty reads, and the result of this operation is not the same as that of A fully isolated operationDon't agree
  • If transaction B needs to be rolled back later, the modified record also needs to be rolled back. This modified value should never have existed in the first place, and it can have undesirable consequences if transaction A relies on data that should not exist for subsequent operations. That isWrong cause leads to wrong effect

Unrepeatable read

Non-repeatable reads, that is, a transaction read to a record, are not the same every time

For example, transaction A reads A record X at the beginning, and the result is M. At this time, another transaction B modifies record X, changing the value of record M to N and committing. Transaction A does not commit, then reads record X, and finds that the value of X changes to N

In this case, unrepeatable reads can only read the data that has been submitted, avoiding dirty reads

So what’s wrong with that?

  • It’s ok to say that there is no problem, after all, reading data isHave submittedThe,There is no problem with dirty reading, but the value of a record is queried multiple times within a transaction
  • If there is a problem, there is a problem. Again, this is the same as complete serial execution, which guarantees complete isolationNot the same. If it is fully serial, when one transaction starts any operation, no other transaction will intersperse it, and no other transaction will modify any records. Of course, the original transaction will read the same data every time. This is problematic if the user expects the same result to be read every time in a transaction

Phantom read

Phantom read, that is, one transaction queries some records according to a certain condition, and then another transaction inserts some records that serve this condition. When the original transaction queries again according to the original condition, the inserted records will also be read out. This phenomenon is called phantom read

Although repeatable for previously read records, there is no guarantee for newly inserted records

A small summary

From dirty writes to dirty reads to unrepeatable reads to phantom reads, the severity of each isolation problem decreases

In fact, every kind of isolation problem you can say is problematic, because if you compare it to complete isolation, none of these isolation problems more or less fit the definition of complete isolation

But there’s nothing wrong with unrepeatable reads, for example: if the user expects to read the latest committed data every time in a transaction, then there’s nothing wrong with unrepeatable reads

Therefore, databases typically support custom isolation levels that are tailored to your needs and strike a balance between performance and isolation

The isolation level of the transaction

Four isolation levels are defined in the SQL standard. Each isolation level and its possible isolation problems are as follows:

Isolation level Dirty read Unrepeatable read The phantom read
Uncommitted read May occur May occur May occur
Has been submitted to read May occur May occur
Repeatable read May occur
serializable

​​

Mysql can prevent phantom reads at repeatable read isolation level

MVCC

The uncommitted read isolation level causes all isolation issues except dirty reads and is simple to implement by reading the most recent value of a record every time, regardless of whether or not the transaction that produced the value is committed

Serializable, which can be implemented to execute each transaction serially, is also simpler

So how does committed read and repeatable read isolation levels prevent dirty reads and avoid dirty and repeatable read problems?

Mysql Innodb uses multi-Version Concurrency Control (MVCC) to implement the Concurrency Control. When reading a record that is being written by another transaction, the client should read the historical Version of the record to support concurrent read-write operations and improve system performance

What is the historical version?

Mysql Innodb has two hidden columns for each clustered index record:

  • Trx_id: Each time a transaction changes a cluster index record (insert, update, delete), the transaction ID of that transaction is assigned totrx_idHidden columns

Each transaction in Innodb is assigned a transaction id in ascending order. For example, there are currently two transaction ids 1,2. So the next time a new transaction starts its transaction ID will be 3

  • Roll _pointer: every time a change is made to a cluster index record, the old value of the record is not discarded. Instead, the old version of the record is written toThe undo logThe hidden column then acts as a pointer to find the recordInformation before modification

The record was modified in chronological order as follows:

  1. The transaction of 100Insert anameforjamesThe record of
  2. The transaction of 101Will record thenameThe column fromjamesUpdated tojerry
  3. The transaction of 102Will record thenameThe column fromjerryInstead oftom

In order to supportMVCCThese historical versions are retained until the system judgesNo longer neededThese historical versions so far

Avoid dirty read

The core of avoiding dirty reads is to ensure that each version of a record read belongs to a committed transaction

Innodb designs a structure called ReadView that contains four fields

  • m_ids: A list of transaction ids active in the system at the start of a read operation
  • min_trx_id:m_idsThe smallest value in
  • max_trx_id: The ID that the system should assign to the next transaction when a read operation begins
  • creator_trx_id: indicates the id of the transaction to which the read operation belongs

With ReadView, we can determine which version of a record should be read when reading it, that is, which version is visible to the current transaction:

  • If the trx_id of a version of a record is the same as the creator_trx_id of the current ReadView, that version was created by the current transaction

  • If the trx_id of a version of a record is less than the min_trx_id of the current ReadView:

    • First, the transaction ID to which the version belongs is absentm_ids, the transaction is inactive and either committed or has not yet started
    • whilem_idsIs greater than the transaction ID of this versiontrx_id, and the transaction ID is incrementally allocated, indicating that it cannot have been started, it must have been committed, and the committed version is readable
  • If the trx_id of a version of a record is greater than the max_trx_id of the current ReadView:

    • Note The version is inReadViewThis version cannot be accessed by the current transaction
  • Otherwise, only one case remains: if the trx_id of a certain version of a record is between [min_trx_id,max_trx_id-1], you need to check whether the trx_id is in m_IDS

    • If yes, it indicates that the version of the record is currently active, i.e., created by an uncommitted transaction. Since it is not committed, it is not visible
    • If not, it indicates that the version of the record was generated by a committed transaction and has already been committed

Note that transactions in the interval [m_ids maximum, max_trxid-1] may also be visible, such as max_trx_id=5, transaction 3, and transaction 4 committed when there are active transactions 1, 2, 3,4 in the system

M_ids =[1, 2], max_trxid-1 is [3,4], 3,4 was committed before the current ReadView was created

Each read starts from the latest version of the target record. If a version is not visible, search down the record version chain until it is foundSome visible versionSo far. If all versions of the record are not visible, the record is being created in the current transactionReadViewIf it does not already exist, return null

Before each read, create a ReadView according to the current active or committed status of all transactions in the system. According to the ReadView, determine which version of the target record is readable to avoid dirty reads, because the read version must have been committed, and all the submitted records in the system can be read at the moment. The committed read isolation level is implemented in this manner

Avoid unrepeatable reads

Let’s look at how to avoid being unrepeatable:

Unlike the dirty read avoidance operation, a ReadView is created on the first read in the transaction, and subsequent reads in the transaction use this ReadView to determine whether a version of the record to be read is visible

When a ReadView is generated for the first time, it is equivalent to taking a snapshot of the transaction execution status in the current system. The subsequent reads in the transaction are based on this snapshot to determine whether the version of a record is visible. Of course, each read result is the same

conclusion

  • The lack of isolation of transactions can causeDirty write, dirty read, unrepeatable read, phantom readThe isolation problem of
  • The database allows users to set transaction isolation levels, a direct tradeoff between isolation and performance
  • Mysql Innodb throughMVCCMechanism to implementHas been submitted to readandRepeatable readIsolation level