I don’t know how InnoDB locks anymore!

A popular but wrong idea

It is not known when the following erroneous view became widely held:

When reading tables using InnoDB’s storage engine using lock reads, row locks are converted to table locks when indexes are not used during queries.

It is important to note that for any INSERT, DELETE, UPDATE, SELECT… LOCK IN SHARE MODE, SELECT… FOR UPDATE, InnoDB storage engine does not add table level S or X locks (we will not discuss table level intention locks here), only row level locks. So even for a full table scan lock read statement, only the records in the table will be locked, not directly a table lock.

If someone asks you what kind of lock will be added to a certain statement, you can say: “What kind of lock will be added to this statement?”

We will give you a detailed analysis of the factors affecting locking, as well as from the source point of view of InnoDB in the end how to lock, I hope friends after reading will exound: really TM simple!

But we need to state before the discussion, we just discuss InnoDB plus transaction locks, namely, to avoid dirty, dirty read, written not repeatable read, phantom consistency of the problems of these phenomena and to add the lock, is not to in a multithreaded access to a Shared memory region from time to time to add the lock (for example, two different transaction’s thread to read and write the same page, Lock protection is required) and does not include MDL locks added to the Server layer.

The source version referenced for this article is 5.7.22.

What exactly is a transaction lock

Lock is a memory structure defined by InnoDB’s lock_t structure:

Both row and table locks are represented by this structure. Let me draw you a picture:

Type_mode is used to determine whether a table lock is a row lock or a table lock, whether a table lock is an intentional lock, a table lock, or an auto-inc lock, and whether a row lock is a proper record lock, gap lock, or next-key lock.

Tip: In InnoDB’s implementation, InnoDB row locks are one-to-one with records. Even for gap locks, the implementation is to generate a lock structure for a record, and then the lock structure type is gap lock, not a lock structure for a certain interval. The function of the GAP lock is that whenever other transactions insert records, it will check whether the next record to insert records has a gap lock structure, if so, it will enter the blocking state.

In addition to the generated lock structure, there is also a locking method called implicit lock, which does not generate lock structure). Of course, if locking a record generates a lock structure, that would be a waste of time! InnoDB design uncle proposed an optimization scheme, that is, the same transaction, the same type of lock on the same page are placed in the same lock structure.

By various types of lock is if type_mode have all sorts of lock, what role, and how to reduce generation lock structure details we here is not opened, so it takes a long length, you can go to the MySQL is how to run: from insufficient understanding on MySQL “view in the books, we are here to see the specific lock details.

The preparatory work

For the story to run smoothly, let’s create a table hero:

CREATE TABLE hero (
    number INT,
    name VARCHAR(100),
    country varchar(100),
    PRIMARY KEY (number),
    KEY idx_name (name)
) Engine=InnoDB CHARSET=utf8;
Copy the code

Then insert a few entries into the table:

INSERT INTO hero VALUES (1, 'l, liu bei, "shu"), (3,' z zhuge liang, "shu"), (8, 'cao cao c', 'w'), (15, xun yu 'x', 'w') and (20 's sun quan', 'wu');Copy the code

Now the hero table has two indexes (a secondary index and a clustered index), as shown below:

Add lock affected by what factors

The lock that a statement places on the XXX record depends on a number of factors. If you are not sure about these factors, you should not be the first to say that the XXX statement places on the XXX record:

  • The isolation level of the transaction
  • Type of index used for statement execution (such as clustered index, unique secondary index, plain secondary index)
  • Is it an exact match
  • Is it a unique search
  • Specific types of statements executed (SELECT, INSERT, DELETE, UPDATE)
  • Whether to enable the innodb_locks_unsafe_for_binlog system variable
  • Whether the record is marked for deletion

There are a couple of concepts here that you may not understand very well, but let’s explain them.

Scanning interval

For example, the following query:

SELECT * FROM hero WHERE name <= 'l 'AND country = 'l ';Copy the code

MySQL can perform the above query in two ways:

  • Select idx_name, idx_name, idx_name, idx_name, idx_name, idx_name, idx_name, idx_name, idx_name, idx_name

  • Scan all cluster index records directly, that is, perform full table scan. At this point, it is equivalent to scanning all cluster index records in the interval of number value (-∞, +∞).

The optimizer calculates which of the two methods is cheaper and chooses the cheaper one to execute the query.

When the optimizer uses a secondary index to perform a query, we call (-∞, ‘l liu ‘] a scan interval, meaning that we need to scan all secondary index records in this interval with the value of name column. We can also call the condition name <= ‘L liu’ as the boundary condition of the scan interval. When the optimizer uses a full table scan to perform a query, we refer to (-∞, +∞) as the scan interval, meaning that we need to scan all clustered index records within this interval with the number value.

During the execution of a query, multiple scan intervals may be used, as shown below:

SELECT * FROM hero WHERE name < 'l 'OR name > 'x xun yu ';Copy the code

If the optimizer uses the secondary index IDx_name to perform the above query, then the corresponding scan interval is (-∞, L Liu Bei) and (‘ X Xun YU ‘, +∞), that is, the record of name value in the above two scan interval needs to be scanned.

Whenever InnoDB needs to scan a record in a scan interval, it takes two steps:

  • Through the B+ tree corresponding to the index, start from the root page and move all the way down until the first record in the scanning range is located in the leaf node.

  • After can don’t need to continue from the root node localization, but by recording next_record attributes directly find scanning interval with a record. (page connection through two-way linked list, after looking for the record in a page, you can follow the double linked list to the next page to find belong to the same scan interval record).

That is to say, when scanning records in a scan range, only locating the first record is a little more troublesome, and other records only need to be scanned along the linked list (records in a single page are connected into a one-way linked list, and there is a two-way linked list between different pages).

An exact match

For the boundary condition that forms the scan interval, if it is equivalent matching condition, we call the matching mode of this scan interval exact matching. For example:

SELECT * FROM hero WHERE name = 'c' AND country = 'c ';Copy the code

If idx_name is used to perform the above query, the scan interval is [‘l liu bei ‘, ‘L liu Bei ‘], and the boundary condition is name =’ L liu Bei ‘. We call the matching pattern when executing the above query using the secondary index IDx_name an exact match.

For this query down here

SELECT * FROM hero WHERE name <= 'l 'AND country = 'l ';Copy the code

Obviously it’s not an exact match.

Uniqueness search

If you can determine in advance that a scan interval contains at most one record before scanning it, this is called a unique search. Let’s look at the code that determines whether the records that scan a certain scan range are unique searches:

Among them:

  1. The matching mode is exact matching
  2. The index used is a clustered index or unique secondary index
  3. If the index contains more than one column, each column should be used when generating scan intervals
  4. If the index being used is a unique secondary index, records with an index listed as NULL cannot be searched during a search (because multiple records with a NULL value can be stored for a unique secondary index).

All of the above points are easier to understand. Let’s explain point 3 a little bit. For example, we set up a unique secondary index UK_a_b (a, b) for columns A and B of a table, then the scanning interval formed by the search condition A =1 cannot be guaranteed to contain only one record at most. Only the scan interval formed by search conditions A =1 AND b= 1 can ensure that the scan interval contains only one record (excluding records whose delete_flag=1).

row_search_mvcc

We know that MySQL is actually divided into two parts: server layer and storage engine layer. Whenever a query is executed, the Server layer is responsible for generating the execution plan, that is, selecting the index to be used and the corresponding scan interval. For each scan interval, InnoDB will:

  • The server layer asks InnoDB to scan the first record of the interval

  • InnoDB uses the B+ tree to locate the first record in the scan interval (if it is a secondary index record and there is a need for a table to retrieve the full cluster index record) and returns it to the Server layer

  • The Server layer checks whether the records meet the search criteria and sends the records to the client. If the records do not meet the search criteria, the server layer skips the records. Continue asking InnoDB for the next record.

Tip: The record sent to the client is actually sent to the local network buffer, which is controlled by net_buffer_length and is 16KB by default. Wait until the buffer is full to actually send the network packet to the client.

  • InnoDB finds the next record based on the unidirectional list of records and the bidirectional list between pages (if it locates a secondary index record and needs a back table, the back table gets the full clustered index record) and returns it to the Server layer.

  • The Server layer processes the record and asks InnoDB for the next record

  • . This process continues until InnoDB reads a record that does not meet the boundary conditions

In general, the server layer and the storage engine layer communicate on a record basis, and InnoDB’s most important function for reading a record is row_search_mvcc:

You can see that this function is frighteningly long, with over a thousand lines.

Tip: Do you have any colleagues who write more than 1,000 lines of business logic in one function? If so, would you like to hit them?

In ROW_search_MVCC, the process of determining the visibility of multiple versions of a record, deciding whether to lock the record, and if so, what to lock, and converting the record from InnoDB storage format to server layer storage format is very complicated.

In fact, UPDATE and DELETE statements need to locate the corresponding record in the B+ tree before executing them, so they also call ROW_search_Mvcc.

InnoDB locks records mainly in row_search_mvCC, such as SELECT… LOCK IN SHARE MODE, SELECT… Statements like FOR UPDATE, UPDATE, and DELETE all call Row_search_mvcc to lock. SELECT … LOCK IN SHARE MODE adds type S locks to records, SELECT… FOR UPDATE, UPDATE, and DELETE add an X-lock to the record.

InnoDB calls ROW_search_MvCC every time it reads a record, and after a long enough preparation, we can finally look at how a record is locked in the row_search_mvCC function.

How exactly do statements lock

Let’s start with a very important variable:

Set_also_gap_locks specifies whether to add a gap lock to a record (a next-key lock can be considered a combination of a proper record lock and a GAP lock). Its default value is TRUE, indicating that a gap lock is added to a record by default.

Set_also_gap_locks may change at this point:

That is, if the current execution is SELECT… LOCK IN SHARE MODE or SELECT… Set set_ALSO_Gap_LOCKS to FALSE when the isolation level is not greater than READ COMMITTED FOR a lock READ statement such as FOR UPDATE (non-DELETE or UPDATE).

Prebuilt ->select_lock_type specifies the lock type, LOCK_NONE specifies no lock, and LOCK_S specifies S lock. LOCK IN SHARE MODE), LOCK_X means to LOCK X (for example, SELECT… FOR UPDATE, DELETE, UPDATE)

Normal SELECT processing and the addition of intent locks

Look further back:

Among them:

  • The arrow marked 1 is a normal SELECT processing that requires a ReadView to be generated before the query is opened.

For Repeatable Read isolation level, the Readview is generated only when the first SELECT statement is executed. The Readview is reused for all subsequent SELECT statements. For the Read Committed isolation level, a ReadView is generated each time a SELECT statement is executed. This is not done in the code shown in the screenshot above.

  • The arrow labeled 2 IS the processing of a lock read statement. A table level intent lock (IS or IX) IS required before the record IS read for the first time (prebuilt-> SQL_stat_start indicates whether it IS read for the first time).

For the ORDER BY… The processing of DESC

Here we begin to locate the first record in a scan interval by B+ tree:

Where btr_pcur_open_WITH_NO_init is the function used to locate the first record in the scan interval.

At each level of the B+ tree node, the records are sorted in ascending order of key value. For a scan interval, InnoDB typically locates the record with the lowest key value in the scan interval and scans backwards from left to right.

But for the following query:

SELECT * FROM hero WHERE name <'s 'AND country ='s' ORDER BY name DESC FOR UPDATE;Copy the code

If the optimizer decides to use the secondary index IDx_name to perform the above query, the corresponding scan interval is (-∞,’s sun power ‘). InnoDB needs to locate the rightmost record in the scan interval since the records are returned to the user in the order of largest to smallest.

Obviously, the secondary index record whose name is’ L Liu Bei ‘is the rightmost record in the scan interval (-∞,’s Sun Quan ‘).

The following code handles the case of scanning records in the scan interval from right to left:

Sel_set_rec_lock is the function that locks a record.

If the isolation level is no less than REPEATABLE READ and innodb_lockS_unSAFE_for_binlog is not enabled, A lock of type LOCK_ORDINARY is added to the next record of the rightmost record in the scan interval. A lock of type LOCK_ORDINARY is actually a next-key lock.

In this case, the isolation level of the transaction is assumed to be REPATABLEREAD. The next key lock should be added to the next index whose name is’ sun quan ‘.

Tip: You can read the comments for the code above, but the main purpose of locking is to prevent phantom reads.

That’s when the real locking process begins — the processing of Infimum and Supremum records

As you can see from the above code, if the current record read is an Infimum record, do nothing and go straight to the next record.

If the current reading record is a Supremum record, a next-key lock of type LOCK_ORDINARY will be added to the record if the following conditions are true:

  • Set_also_gap_locks is TRUE (this variable is set only on the front, and is set to FALSE for SELECT statements that are not greater than READ COMMITTED, and TRUE otherwise).

  • Innodb_locks_unsafe_for_binlog system variable is disabled and transaction isolation level is not less than REPEATABLE READ.

  • This read is a lock read

  • It is not a spatial index that is used.

Since the Supremum record is itself a pseudo-record and is not updated or deleted by other transactions, adding a next-key lock to it has the same effect as adding a gap lock to it.

Tip: Infimum records and Supremum records are two virtual records that InnoDB automatically adds to every page in the B+ tree. They can also be called pseudo records. Infimum records and Supremum records take up 13 bytes of storage space each and are placed in fixed locations on the page. The Infimum record is regarded as the smallest record, the Supremum record is regarded as the largest record, the Infimum record belongs to the head node of the uni-linked list of records in the page, and the Supremum record belongs to the tail node of the uni-linked list of records in the page. For more information about the page structure, you can refer to the book “How MySQL works: Understanding MySQL from the root”

That’s when the real locking process begins — the special treatment of exact matching

Sorry, this is a prelude to the actual locking of row_search_MVCC.

Let’s first look at a special treatment for exact matching.

As you can see, for the scan interval where the matching pattern is an exact match, if the records obtained by executing row_search_MVCC are not in the scan interval (0! = cmp_dtuple_rec(search_tuple, rec, offsets)), then some special processing is required, that is:

For lock reads, if the transaction isolation level is no less than Repeatable Read and innodb_lockS_unSAFE_FOR_binlog system variable is not enabled, then a GAP lock is applied to the record. And return directly (in the code directly jump to normal_return), no subsequent locking operations.

For example, if the current transaction isolation level is Repeatable Read, execute the following statement:

SELECT * FROM hero WHERE name = 'FOR UPDATE';Copy the code

If the secondary index IDx_name is used to perform the above query, the corresponding scan interval is [‘s sun quan ‘,’s sun quan ‘]. This statement will first lock the record whose name value is’ sun quan ‘, but the record is in the scan interval. The above code does not handle this normal case, which will be discussed later.

After reading ‘sun quan ‘, InnoDB will find the next secondary index record according to the record’s next_record property, that is, the name of the second index record is ‘x Xun for ‘, this record is not in the scan interval [‘ sun quan ‘,’ sun quan ‘], so it matches 0! = cmp_dtuple_rec(search_tuple, rec, offsets) condition, then execute the above code locking process — add a GAP lock to the secondary index record whose name value is ‘x XunFOR ‘, and end the scan interval query.

The real locking process begins — this time

We draw two red boxes in the code, these two red boxes are for the record is not the record of the gap lock scene. So let’s see.

For red box # 1:

  • Set_also_gap_locks is FALSE (this variable is set only on the front, when the isolation level is not greater than READ COMMITTED SELECT statements lock reads are set to FALSE, otherwise TRUE).

  • Enable the innodb_locks_unsafe_for_binlog system variable

  • The isolation level of a transaction is not greater than READ COMMITTED

  • Search for uniqueness and the record’s delete_flag is not 1

  • The index is a spatial index

In other words, if any of the above conditions are true, the record should not be locked by gap, but should be locked by proper record. In other cases, a next-key lock (a combination of a gap lock and a proper record lock) should be added.

Then no. 2 red box describes a scene without gap lock:

for> = the primary keyFor this boundary condition, if the current record happens to be the start boundary, we only need to add a proper record lock to the record without adding a gap lock.

Red box number one is easier to understand, so let’s take an example and see what red box number two is saying. For example, the following query:

SELCT * FROM hero WHERE number >= 8 FOR UPDATE;
Copy the code

We assume that this statement is REPEATABLE READ at the isolation level.

Obviously, the optimizer scans the cluster index record of [8, +∞). First, the B+ tree is used to locate the first record of the scan interval [8, +∞), i.e. the cluster index record of number 8, which is the beginning boundary record of the scan interval [8, +∞). The next-key lock should be added in the REPEATABLE READ isolation level, but due to the code in the red box # 2, only the proper record lock is added to the clustered index record with number value 8.

Tip: The optimization of box 2 is based on the “primary key is unique” constraint. After one transaction executes the above query, no other transaction can insert a record with the value of number 8. This does not require a gap lock.

Add a next-key lock to all the records except for box 1 and box 2

The back table locks the record

If row_search_MvCC reads secondary index records, then it needs to return to the table and add a proper lock to the cluster index records:

Row_sel_get_clust_rec_for_mysql is the function used to return to the table, and the logic to lock the cluster index is implemented in this function, which we will not expand here.

It is important to note that even in the case of overwriting indexes, if we want to x-lock records (i.e., use SELECT… FOR UPDATE, DELETE, UPDATE), also need to perform table back operations on secondary index records and add proper record locks to the corresponding clustered index records.

There are also lock release scenarios

Suddenly found that had written a lot of a lot of, release the lock of the scene is not nagging.

To summarize

If you look back at the row_search_mvcc lock code, you can see that the process is quite simple:

  • Normal SELECT statements are unlocked

  • You need to add an intent lock to the table before you can lock the record

  • If the scan interval is scanned from right to left, a GAP lock needs to be added to the next record of the rightmost record in the scan interval (if isolation level is no less than REPEATABLE READ and innodb_LOCKS_unSAFE_FOR_binlog system variable is not enabled).

  • Infimum records are unlocked, Supremum records are next-key locked (if isolation level is no less than REPEATABLE READ and innodb_LOCKS_unSAFE_for_binlog system variable is not enabled).

  • For precisely matched scan interval, when all the records in the scan interval are read, Add a GAP lock to the first record after the scan interval to end the scan interval query (in case the isolation level is not less than REPEATABLE READ and innodb_LOCKS_unSAFE_FOR_binlog system variable is not enabled).

  • Innodb_locks_unsafe_for_binlog system variable, unique search, delete_flag is not 1, for >= primary key, the current record is the start of the record. Add a proper record lock to the record, otherwise add a next-key lock.

  • If the secondary index record is locked, the corresponding cluster index record also needs to be properly locked.

Make an advertisement for a children’s book

  • How MySQL works: Understanding MySQL from the root: juejin.cn/book/684473…

  • How to use MySQL: learning MySQL from scratch: juejin.cn/book/684473…

  • How Computers Work: Understanding Computers from the Root: juejin.cn/book/684473…

  • How MySQL Works: Finding Ways to Understand MySQL

Jingdong link: u.jd.com/1Mldwv9

Dangdang link: u.dangdang.com/qRPRW

Thank you very much big guy support, just don’t let the child starve to death street.

We are all little frogs, ribbit ribbit ribbit