Preface

I am Phantom Read. I've heard that some people think MVCC has solved me, so to help everyone understand me more thoroughly, I have no choice but to explain myself in person.

Series

1. Uncover the mystery of MySQL index

2. MySQL query optimization prerequisites

3. Asked about MySQL transactions right off the bat, shivering...

4. MVCC: I heard people were curious about my low-level implementation

1. Who am I?

A quick introduction: I'm one of the three classic problems of concurrent transactions.

My two brothers, Dirty Read and Non-Repeatable Read, were ruthlessly killed off by MVCC in the previous installment; what happened there can be found in the series list above.

I show up when one transaction is working on a set of rows while plenty of other transactions are modifying that same set.

Take a simple example:

A transaction filters a set of rows by some condition and gets 100 rows. Meanwhile, someone else inserts a row that matches the condition. When the first transaction runs the same query again, it gets 101 rows: the second result set is inconsistent with the first.
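In SQL terms, the scene looks roughly like this (a minimal sketch, assuming a hypothetical user table with an age column):

-- Session A (the master)
BEGIN;
SELECT COUNT(*) FROM user WHERE age > 18;   -- 100 rows match
-- Session B (someone else), autocommit insert matching the condition
INSERT INTO user (name, age) VALUES ('kaka', 20);
-- Session A again
SELECT COUNT(*) FROM user WHERE age > 18;   -- 101 rows match: a phantom appeared
COMMIT;
-- (Under READ COMMITTED, or with a locking read, the second query really does return 101.)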

So I was born, and they gave me a very nice name: phantom read.

Why that name?! Because I give people an illusion: rows that were not there a moment ago suddenly appear.

2. Why would anyone think I was killed by MVCC?

For the demonstration, we will simply reuse the test table from the previous article.

As you can see, this table still holds some test data, so let's start from scratch and clear the table.

TRUNCATE TABLE table_name;

Executing this command clears the table and resets the auto-increment ID back to 1.

In terms of execution, TRUNCATE TABLE is similar to DROP TABLE followed by CREATE TABLE. The environment here is a test environment; do not run this in production, because it bypasses the DML path and cannot be rolled back.

After that little interlude, let's get to the point.

Based on the steps above, the expectation is that the first SELECT in the transaction on the left returns nothing,

and that the second SELECT returns one row: the row committed by the transaction on the right.

However, in the actual test, the first and second SELECT return exactly the same result.

From this case we can conclude that at the REPEATABLE READ isolation level, the phantom read problem is indeed solved, at least for snapshot reads.
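To make the timeline concrete, here is a minimal sketch of that test, assuming a hypothetical user table and InnoDB's default REPEATABLE READ isolation level:

-- Left transaction
BEGIN;
SELECT * FROM user;                              -- Empty set
-- Right transaction
BEGIN;
INSERT INTO user (id, name) VALUES (1, 'kaka');
COMMIT;
-- Left transaction again
SELECT * FROM user;                              -- still Empty set: the snapshot read
COMMIT;                                          -- cannot see the committed row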

Am I really solved by MVCC?

From the test case above, it seems that MVCC really does solve my problem in MySQL. Then why is there still a SERIALIZABLE isolation level? I'm so confused!

Let's continue experimenting with this question in mind. For convenience we won't reuse the table structure above; instead we'll build a simpler one.

One more interlude: do you know how to clear the screen in the MySQL terminal?

Run system clear; the MySQL client passes the command to the shell and the screen is cleared.

Then a new round of tests begins.

In the case above, transaction 1 queries the data several times and gets an empty result every time.

Meanwhile, transaction 2 has successfully inserted a row and committed.

However, when transaction 1 tries to insert that same row after its several empty queries, it is told the primary key is a duplicate.
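Sketched in SQL, assuming a bare table t (id INT PRIMARY KEY):

-- Transaction 1
BEGIN;
SELECT * FROM t WHERE id = 1;    -- Empty set
-- Transaction 2
BEGIN;
INSERT INTO t (id) VALUES (1);
COMMIT;
-- Transaction 1 again
SELECT * FROM t WHERE id = 1;    -- still Empty set (snapshot read)
INSERT INTO t (id) VALUES (1);   -- ERROR 1062: Duplicate entry '1' for key 'PRIMARY'
-- A row I cannot see still blocks my insert: that is me, the phantom.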

Let's do another example.

  • Step 1: Transaction 1 starts a transaction
  • Step 2: Transaction 2 starts a transaction
  • Step 3: Transaction 1 queries the data and sees only one row
  • Step 4: Transaction 2 inserts a row
  • Step 5: Transaction 1 queries again and still sees one row
  • Step 6: Transaction 2 commits its transaction
  • Step 7: Transaction 1 queries again and still sees one row
  • Step 8: Transaction 1 updates the name column for every row
  • Step 9: Guess what the data in the table looks like now

In this case, transaction 1 always reads a single row, yet its UPDATE affects 2 rows, and when it queries the data again, the row added by transaction 2 suddenly appears. This, too, counts as a phantom read.
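Sketched in SQL, again assuming a simple table t (id INT PRIMARY KEY, name VARCHAR(20)) holding one row:

-- Transaction 1
BEGIN;
SELECT * FROM t;                 -- 1 row
-- Transaction 2
BEGIN;
INSERT INTO t (id, name) VALUES (2, 'x');
COMMIT;
-- Transaction 1 again
SELECT * FROM t;                 -- still 1 row (snapshot read)
UPDATE t SET name = 'kaka';      -- Rows matched: 2 — the UPDATE is a current read
SELECT * FROM t;                 -- now 2 rows: transaction 2's row has surfaced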

Summary

From the two cases above, it is clear that the REPEATABLE READ isolation level in MySQL does not completely solve the phantom read problem; it solves it only for snapshot reads.

For current reads, the phantom problem still exists. In other words, MVCC's answer to phantom reads is incomplete.

4. A quick word on current reads and snapshot reads

Snapshot reads and current reads were covered in the previous installment; to prevent indigestion, here is a brief refresher.

Current read

Every current-read operation takes a lock, and all locks other than shared locks are mutually exclusive. Inserts, deletes, updates, and locking reads must wait for conflicting locks to be released, so the data read is always the latest record.

Simply put, a current read is a locking read; whether the lock taken is a shared lock or an exclusive lock, it is a current read.

In MySQL's InnoDB storage engine, insert, delete, and update operations take locks by default, so they are current reads by default.
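For reference, these are the usual current-read statement forms:

SELECT * FROM user WHERE id = 1 LOCK IN SHARE MODE;  -- shared (S) lock; FOR SHARE in MySQL 8.0
SELECT * FROM user WHERE id = 1 FOR UPDATE;          -- exclusive (X) lock
INSERT INTO user (id, name) VALUES (1, 'kaka');      -- takes exclusive locks
UPDATE user SET name = 'kk' WHERE id = 1;            -- takes exclusive locks
DELETE FROM user WHERE id = 1;                       -- takes exclusive locks

All of these read the latest committed version of the rows they touch.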

Snapshot read

Snapshot reads were introduced to improve transaction concurrency, and they are built on my arch-enemy, MVCC.

Simply put, a snapshot read is a non-locking, non-blocking read, i.e. a plain SELECT (select * from user).

When the InnoDB storage engine executes a plain SELECT, it records a snapshot of the data; subsequent SELECTs in the same transaction read from that first snapshot, so even rows committed by other transactions do not affect the results. This is what solves the non-repeatable read problem.

Snapshot-read data is consistent, but it may not be the latest data; it may be historical data.

5. Let me tell you! Under current reads, it's next-key locks that kill me

Section 2 showed that the trouble I cause under snapshot reads has indeed been eliminated by MVCC.

However, the tests in Section 3 found that under current reads I am fully revived.

If I could be killed that easily, how could I call myself the invincible cockroach? What a joke!

But if next-key locks are added on top of MVCC, then I really am dead; no more Grey Wolf shouting, "I'll be back."

At this point you need to figure out whether the InnoDB storage engine adds next-key locks to plain reads by default, or whether manual locking is required.

The official documentation explains next-key locks as follows:

To prevent phantoms, InnoDB uses an algorithm called next-key locking that combines index-row locking with gap locking. InnoDB performs row-level locking in such a way that when it searches or scans a table index, it sets shared or exclusive locks on the index records it encounters. Thus, the row-level locks are actually index-record locks. In addition, a next-key lock on an index record also affects the “gap” before that index record. That is, a next-key lock is an index-record lock plus a gap lock on the gap preceding the index record. If one session has a shared or exclusive lock on record R in an index, another session cannot insert a new index record in the gap immediately before R in the index order.


SELECT * FROM child WHERE id > 100 FOR UPDATE;

When InnoDB scans the index for this query, it locks the index entries with id greater than 100, along with the gaps between them, preventing any row with id greater than 100 from being inserted.

This is the answer to the question above: in InnoDB, eliminating the phantoms produced by current reads requires locking manually, for example with SELECT ... FOR UPDATE as shown.

Let's work through another example.

At this point the table does not yet contain a row with id 4.

The example below solves the phantom problem from the first case in Section 3.

  • Step 1: Transaction 1 starts a transaction
  • Step 2: Transaction 2 starts a transaction
  • Step 3: Transaction 1 queries for the row with id 4 with an exclusive lock
  • Step 4: Transaction 2 tries to insert a row with id 4 and has to wait for transaction 1 to release the lock
  • Step 5: Transaction 1 inserts the row with id 4 successfully
  • Step 6: Transaction 1 queries the current data
  • Step 7: Transaction 1 commits the transaction
  • Step 8: Transaction 2 errors out, reporting a duplicate primary key.

In this case the locked column is the primary key, which is unique. When an equality query on a unique index matches an existing record, InnoDB degrades the next-key lock to a record lock: only the index row of the query is locked, not a range. When the record does not exist, as with id 4 here, InnoDB locks the gap the value falls in, which is exactly what blocks transaction 2's insert.
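A sketch of the flow, assuming the table holds rows with ids 1, 3, and 5 and no row with id 4:

-- Transaction 1
BEGIN;
SELECT * FROM t WHERE id = 4 FOR UPDATE;   -- Empty set; locks the gap the value 4 falls in
-- Transaction 2
BEGIN;
INSERT INTO t (id) VALUES (4);             -- blocks, waiting for transaction 1's lock
-- Transaction 1
INSERT INTO t (id) VALUES (4);             -- succeeds
SELECT * FROM t;                           -- sees the new row
COMMIT;
-- Transaction 2 now resumes and fails:
-- ERROR 1062: Duplicate entry '4' for key 'PRIMARY'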

Case 2

Again using the data above, but this time with a range query.

At this point the table contains ids 1, 3, and 5, and the query is for ids greater than 3.

When transaction 2 inserts a row with id 2, the insert succeeds.

But when it inserts a row with id 6, it has to wait.

If transaction 1 never commits, transaction 2's insert of id 6 eventually fails with a lock wait timeout.

SELECT * FROM user WHERE id > 3 FOR UPDATE; returns only the single row with id 5.

The locked ranges are (3,5] and (5,+∞), so a row with id 2 can be inserted, while a row with id 4, or anything greater than 5, cannot.
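The same flow as a sketch:

-- Transaction 1
BEGIN;
SELECT * FROM user WHERE id > 3 FOR UPDATE;  -- returns the row with id 5;
                                             -- next-key locks cover (3,5] and (5,+∞)
-- Transaction 2
INSERT INTO user (id) VALUES (2);            -- succeeds: 2 lies outside the locked ranges
INSERT INTO user (id) VALUES (6);            -- blocks; if transaction 1 never commits, this
                                             -- fails with ERROR 1205: Lock wait timeout exceeded
-- Transaction 1
COMMIT;                                      -- once the locks are released, a retried insert of 6 succeeds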

This is InnoDB's final answer to the phantom read problem.

6. Solutions to phantom reads

To give an intuitive picture of the solutions to phantom reads, here is a brief summary.

MVCC solves the phantom problem for snapshot reads. Why does it work? Because the first plain SELECT generates a snapshot, and every later SELECT in the transaction reads from that first snapshot. That is also why data returned by a snapshot read may be historical.

A next-key lock is the combination of a record lock and a gap lock, and what it locks is a range. If the query hits an existing index record exactly, only that row is locked, i.e. the lock degrades to a record lock. If a range is queried, the range is locked: in the example above, with ids 1, 3, 5 and a query for ids greater than 3, the locked ranges are (3,5] and (5,+∞), and other transactions cannot insert into them until the lock is released.

To verify the uniqueness of a value, it is enough to add a shared lock to the query, i.e. append LOCK IN SHARE MODE to the SELECT. If the result is empty you can insert, and the inserted value is guaranteed to be unique, because the next-key lock taken also prevents anyone else from inserting the same value in the meantime. All of the cases in Section 5 use next-key locks, and as they show, a next-key lock can even lock index values that do not yet exist in the table.

One follow-up thought: if you use a shared lock to check uniqueness, what happens when multiple transactions take the shared lock at the same time and then all try to insert the same value? Only one of the inserts can succeed, and the other transactions will error out, which brings us to a new concept: the deadlock.
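The uniqueness-check pattern looks roughly like this (a sketch against the same hypothetical user table):

BEGIN;
SELECT * FROM user WHERE id = 7 LOCK IN SHARE MODE;  -- Empty set; shared next-key lock taken
INSERT INTO user (id) VALUES (7);                    -- safe: no one else can insert 7 first
COMMIT;
-- Caveat: if two transactions run this pattern concurrently, both acquire the
-- shared gap lock, each INSERT then waits on the other, and InnoDB resolves it
-- by killing one: ERROR 1213: Deadlock found when trying to get lock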

7. Extension

When is the transaction ID assigned?

One point made in this article, and elsewhere, is that executing a plain SELECT statement also generates a read view.

Although snapshot reads and read views are both tied to the start of a transaction, the read view is built from the IDs of the transactions that are active (uncommitted) at that moment.

So when exactly is the transaction ID assigned?

There are two ways to start a transaction: explicitly, or by setting autocommit = 0, after which the next statement implicitly opens one.

The easiest explicit way is the BEGIN statement; you can also use START TRANSACTION.

If you use START TRANSACTION, you can additionally choose to start a read-only or a read-write transaction.
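For reference, the statement variants:

BEGIN;                          -- explicit start
START TRANSACTION;              -- equivalent explicit start
START TRANSACTION READ ONLY;    -- explicitly read-only
START TRANSACTION READ WRITE;   -- explicitly read-write
SET autocommit = 0;             -- from now on, any statement implicitly opens a transaction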

So, is the transaction ID assigned the moment the transaction is started?

If you execute BEGIN and then query the transaction ID, it comes back empty; in other words, trx_id is not assigned when BEGIN executes.
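Reproduced as a sketch, watching from a second session so the observer query does not itself join the transaction:

-- Session A
BEGIN;
-- Session B
SELECT trx_id, trx_mysql_thread_id FROM information_schema.INNODB_TRX;
-- Empty set: BEGIN alone started nothing in the engine layer
-- Session A issues its first DML
UPDATE user SET name = 'kk' WHERE id = 1;
-- Session B
SELECT trx_id, trx_mysql_thread_id FROM information_schema.INNODB_TRX;
-- Session A's transaction now appears, with a real trx_id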

So what happens once DML statements follow the BEGIN?

According to the documentation:

Executing the BEGIN command does not actually start a transaction; it merely sets a flag on the current thread marking it as an explicitly opened transaction.

Therefore, the correct understanding is that a transaction truly starts only when the first statement that touches data (an insert, delete, update, or select) is executed; that is when the engine layer starts the transaction.

Why are two transaction IDs so wildly different?

Query the currently active transaction IDs and you may see two rows whose IDs differ enormously.

I believe many of you have run into this. Having a question is nothing to be afraid of; having no questions is.

In fact, of the two rows, only 20841 is a real transaction ID. So what is the ID in the second row?

To know what that number is, you have to know where it comes from.

Execute a plain SELECT statement inside a transaction and check: the SELECT produces a very large transaction ID.

Then execute an INSERT statement in the same transaction and check the transaction ID again.

Once the INSERT runs after the SELECT, the transaction ID changes.
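A sketch of that experiment:

-- Session A
BEGIN;
SELECT * FROM user;
-- Session B
SELECT trx_id FROM information_schema.INNODB_TRX;
-- shows a huge number: a display-only ID for the read-only transaction
-- Session A
INSERT INTO user (id) VALUES (9);
-- Session B
SELECT trx_id FROM information_schema.INNODB_TRX;
-- now shows a real, small trx_id (like the 20841 above)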

The query shows that a transaction that has only executed a plain SELECT is a read-only transaction. To avoid the unnecessary overhead of assigning trx_id to read-only transactions, InnoDB does not assign one; a read-only transaction's trx_id is internally 0. But when you query information_schema.INNODB_TRX or run SHOW ENGINE INNODB STATUS, a display value is produced by reinterpret_cast-ing the transaction's pointer to a 64-bit non-negative integer and bit-OR-ing it with (max_trx_id + 1), which is why the number is so large.

There is no need to dig into how that value is generated; the point is that read-only transactions are not assigned a transaction ID, and the value you see exists only for display purposes and has no practical meaning.

However, if you instead look for this transaction with SHOW ENGINE INNODB STATUS rather than information_schema.INNODB_TRX, you will not find the transaction ID: under InnoDB, a read-only transaction is not registered in the engine's transaction data structure, so it does not show up there.

Persisting in learning, writing, and sharing is the belief Kaka has held since the start of his career. May these articles on the Internet bring you a little help. I'm Kaka. See you next time.