What is MVCC?

Database concurrency control — locking

Concurrency Control is a common concurrency control for database management systems.

We know lock, concurrency control commonly used is when a thread to manipulate a Shared resource, the lock is a very simple and crude method (at the beginning of the transaction to DQL read locks, give DML write lock), the lock is a pessimistic way of implementation, that is to say it can cause blockage to other transactions, which affects database performance.

Let me explain the concept of optimistic locks and pessimistic locks. I think they are mainly conceptual understanding.

  • Pessimistic lock: When a thread needs to perform operations on a shared resource, it first locks the shared resource. When this thread holds the lock on the resource, other threads will block the operation on the resource. Such as the Synchronized keyword in Java.
  • Optimistic locking: When a thread needs to operate on a shared resource, it does not lock it, but decides after the operation is complete. (such as optimistic lock will be controlled by a version number, if the operation is completed through the version number to judge whether there are other threads in this thread operation process has been operated on the Shared resources, if there is a notification operation fails, if there is no operation success), except, of course, the version number and CAS, if you don’t know can go to learn, I’m not going to do too much here.

Database concurrency control — MVCC

While many people think of MVCC as an optimistic implementation of locking, I think OF MVCC as an optimistic implementation of database concurrency control through a visibility algorithm.

Two read forms of MVCC

Before I talk about how MVCC works, I think it’s important to look at the two read forms of MVCC.

  • Snapshot read: Reads only the visible version of the current transaction without locking. Just remember that a simple select operation is a snapshot read (select * from table where id = XXX).

  • Current read: Reads the current version, such as special read operations, update/insert/delete operations

    Such as:

    Select * from table where XXX lock in share mode, select * from table where XXX for update, update table set.... insert into table (xxx,xxx) values (xxx,xxx) delete from table where id = xxxCopy the code

Realization principle of MVCC

MVCC uses “three hidden fields” to implement version concurrency control. I looked around and saw a lot of blogs about how to implement control through a create transaction ID field and a delete transaction ID field. As it turns out, this isn’t quite right, so let’s take a look at the three actual hidden columns innoDB created while MySQL was building the table.

RowID DB_TRX_ID DB_ROLL_PTR id name password
Id that is automatically created Transaction id Rollback Pointers id name password
  • RowID: Hidden increment ID. InnoDB uses this RowID to create a cluster index when no primary key is specified.
  • DB_TRX_ID: specifies the transaction ID of the record that was recently modified (updated/deleted/inserted).
  • DB_ROLL_PTR: Rollback pointer to the previous version of this record.

There is also a deleted Flag field to determine whether the record has been deleted.

While MVCC uses the transaction field, rollback pointer field, whether to delete the field. Let’s take a look at the current table (isDelete is taken by myself, officially it is in the content at the beginning of the line, it doesn’t really matter where it is, as long as you know there is one).

isDelete DB_TRX_ID DB_ROLL_PTR id name password
true/false Transaction id Rollback Pointers id name password

So how to implement the visibility algorithm of MVCC through these three fields?

Also almost something! UndoLog (rollback log) and read-view(read view).

  • UndoLog: Rollback logs of transactions, a very important part of visibility algorithms, fall into two categories.

    • Insert undo log: The undo log generated when a transaction inserts a new record, which can be discarded after the transaction is committed
    • Update undo log: The undo log generated when the transaction is updated or deleted is still required for snapshot reads. Therefore, the undo log cannot be deleted directly. The undo log can be deleted only when the system has no older read-view than this log. Ps: So long transactions will generate many old views, so undo log cannot delete a lot of storage space.
  • Read-view: a read-view is a necessary condition for creating a view at the second level in MySQL. For example, when a transaction performs a select operation (snapshot read), a read-view is created. This read-view is actually only three fields.

    • Alive_trx_list: read-view Generates the ids of active transactions in the system at the time.
    • Up_limit_id: records the minimum transaction ID in the alive_trx_list above.
    • Low_limit_id: Read-view This object indicates the maximum value of transaction ids + 1 at this time.

At this point, everything was ready except the east wind. Now LET me introduce you to the most important visibility algorithm.

How to compare the DB_TRX_ID to the three properties of the read-view when it is generated. I will talk about the three steps. If you don’t quite understand it, you can refer to my later practice to understand it.

  • Check whether the value DB_TRX_ID is less than up_limit_ID or equal to the current transaction ID. If yes, the current transaction can see the record. If it is greater than, it will enter the next round of judgment
  • Then check whether the value DB_TRX_ID is greater than or equal to low-limit-id. If it is greater than or equal to, the transaction cannot see the record, or it goes to the next round of judgment.
  • DB_TRX_ID specifies whether the record is in the active transaction array. If yes, the record is not committed and is not visible to the current transaction. If no, the record is committed and is visible.

If the record is not visible to the transaction and ROLL_PTR is not null then it points to the address of the rollback pointer and undolog is used to find the visible record version.

Below I draw a flow chart of the visibility algorithm

practice

To prepare data

First I created a very simple table, just a student table with ID and name.

id name
Student id The student’s name

At this point we will also identify the hidden column we need, and it will look like this

isDelete id name DB_TRX_ID DB_ROLL_PTR
Deleted or not Student id The student’s name Create delete the transaction ID that updates the record Rollback Pointers

Insert three rows to change the table to look like this.

isDelete id name DB_TRX_ID DB_ROLL_PTR
false 1 Xiao Ming 1 null
false 2 A small party 1 null
false 3 Xiao zhang, 1 null

The sample a

As anyone who has used MySQL knows, transaction B must be fetching data like this at this point because of isolation.

id name
1 Xiao Ming
2 A small party
3 Xiao zhang,

Why are uncommitted changes from transaction A invisible to transaction B, and how does MVCC do this? Let’s experiment with the visibility algorithm we just did.

First transaction A starts the transaction (of course this is not started, since the read-view is actually obtained in RR mode when the first snapshot read is made). We assume that transaction ID of transaction A is 2 and transaction ID of transaction B is 3.

Transaction A then performs an update operation, as shown in the figure. The update operation creates A new version and the rollback pointer of the new version points to the old version (note that undo log actually stores logical logs, so I’ll write physical logs for convenience).

Finally transaction B does a snapshot read, and note that this is the focus of our analysis.

First of all, When we create a read-view for snapshot reads, our read-view is up-limit-id = 2 alive-trx-list = [2,3] low-limit-id = 4 Then we get the two unmodified records (out of order, for the sake of explanation together) we get the two records (2, small square) and (3, small piece), DB_TRX_ID = 1 DB_TRX_ID = 1 DB_TRX_ID = 1 DB_TRX_ID = 1 DB_TRX_ID = 1 We then retrieve the changed rows of dataCopy the code

DB_TRX_ID = 2 DB_TRX_ID = = low_limit_ID DB_TRX_ID = = 2 >= 4 DB_TRX_ID is not visible because it is not committed yet. DB_TRX_ID < up-limit-id DB_TRX_ID < up-limit-id = 1 DB_TRX_ID < up-limit-id = 2Copy the code
id name
1 Xiao Ming
2 A small party
3 Xiao zhang,

To verify this, we commit transaction A, recreate transaction C and select.

This is what we expect

id name
1 Jack Bauer
2 A small party
3 Xiao zhang,

The flow chart of this operation is shown below

At this point, let’s examine the read-view generated by transaction C.

At this point transaction A has already committed, so transaction A is not in the active transaction array, and the three properties of the read-view should be

Up-limit-id = 3 alive-trx-list = [3,4] low-limit-id = 5Copy the code
  • DB_TRX_ID = 1; DB_TRX_ID = 1;
  • DB_TRX_ID = 2; DB_TRX_ID = 2; db_limit-id = 3;

So what I’m going to return is theta

id name
1 Jack Bauer
2 A small party
3 Xiao zhang,

Example 2

To deepen our understanding, let’s use a more complex example to verify the visibility algorithm.

First we delete A record in transaction A, at which point it looks like this.

Transaction B then inserts, resulting in the following.

And then transaction B does a select operation, and we can see that the whole table is actually going to look like this for the select operation to select.

In this case, read-view is

Up-limit-id = 2 alive-trx-list = [2,3,4] low-limit-id = 5Copy the code

DB_TX_ID = 1 < up-limit-id = 2 DB_TX_ID = 1 < up-limit-id = 2 DB_TX_ID = 2 < up-limit-id = 2 DB_TX_ID = 2 < up-limit-id = 2 DB_TX_ID = 2 < low-limit-id = 2 DB_TX_ID = 2; alive-trx-list = [2,3,4]; DB_TX_ID = 2; DB_TX_ID = 1 DB_TX_ID = 1

DB_TX_ID = current_TX_id (current transaction ID);

The return table looks something like this

id name
1 Xiao Ming
2 A small party
3 Xiao zhang,
4 Small bright

Then transaction A performs the select operation, and we can see that the read-view is now

Up-limit-id = 2 alive-trx-list = [2,3,4] low-limit-id = 5Copy the code

And then you see the same thing up here

DB_TX_ID = 1 < up-limit-id = 2 DB_TX_ID = 1 < up-limit-id = 2 DB_TX_ID = 2 = current_TX_id = 2 DB_TX_ID = 2 DB_TX_ID = 4 < up-limit-id = 2 DB_TX_ID = 4 >= low-limit-id = 5 Not valid the last step is not visible in the active transaction array and the rollback pointer of this record is null.

The returned list should look like this

id name
1 Xiao Ming
2 A small party

It’s a lot to analyze, but the more the better, the more you’re familiar with it, the better you’ll understand the algorithm.

After that, transaction C performs snapshot reads. First of all, the view is still going to look like this

Then for transaction C, the read-view is

Up-limit-id = 2 alive-trx-list = [2,3,4] low-limit-id = 5Copy the code

The two records of Xiao Ming and Xiao Fang are as visible as above and I won’t repeat the analysis here, Then this record for zhang DB_TX_ID = 2 < up – limit – id = 2 | | DB_TX_ID = = curent_tx_id = 4 was not so into the next round found DB_TX_ID > = low limit – id = 5 DB_TX_ID = 1; DB_TX_ID = 1; DB_TX_ID = 1; DB_TX_ID = 3 is not visible in the active transaction array.

So the result of transaction C select is

id name
1 Xiao Ming
2 A small party
3 Xiao zhang,

Later, transactions A and B commit, and one transaction D reads the snapshot, while the view remains the same

But now the read-view has changed

Up-limit-id = 4 alive-trx-list = [4,5] low-limit-id = 6Copy the code

DB_TX_ID = 2 < up-limit-id = 4 DB_TX_ID = 2 < up-limit-id = 4 DB_TX_ID = 2 < up-limit-id = 4 DB_TX_ID = 2 < up-limit-id = 4 DB_TX_ID = 2 < up-limit-id = 4 DB_TX_ID = 2 < up-limit-id = 4 DB_TX_ID = 3 < up-limit-id = 4 DB_TX_ID = 3 < up-limit-id = 4 So the select result should look like this

id name
1 Xiao Ming
2 A small party
4 Small bright

Finally (finally, not easy!) Transaction C once performed a SELECT operation. In RR mode, the read-view is determined at the time of the first snapshot read, so the read-view will not change, and the previous view will not change, so even if the previous transaction A transaction B has committed, it will have no effect on the select result of transaction C. So the result should be

id name
1 Xiao Ming
2 A small party
3 Xiao zhang,

conclusion

So let’s sum it up.

In fact, MVCC is version-concurrency control through “three” hidden fields (transaction ID, rollback pointer, delete flag) plus undo log and visibility algorithm.

And just to give you another insight into this algorithm, LET me put this up again