takeaway

In “Why can MySQL support fast Queries on a scale of 10 million data?” InnoDB B+Tree (InnoDB B+Tree)

SELECT * FROM user WHERE age > = 15
Copy the code

Explains how the query searches for the secondary index index_age_birth, an index tree. When I was looking for a leaf node, I said that the first record satisfying the condition can be located by starting from the first record in the node and traversing all records in the node to the right. Then, all the records satisfying the condition can be obtained by traversing all the subsequent records of the record.

However, if there are too many records in the leaf node and the first record that satisfies the condition is not in the head, the performance of sequential traversal lookup is poor. How to solve this problem? I think, as you might have guessed, yes, since the leaves are in ascending order, we can use binary search to quickly locate the first record that satisfies the criteria.

So, what does InnoDB do?

slot

For example, index_age_birth is a user table index.

B+Tree = B+Tree = B+Tree = B+Tree Yes, I did it in a rough way, but now I’ve refined it to show you how a query locates a record in the leaf of a B+Tree.

Take a closer look at the above diagram, which has three more elements than the previous leaf node:

Infimum: a virtual record on a leaf node. The next record of this record is the smallest record on a leaf node. For example, tuple record <15, 2007-06-06, 6> in the figure is the smallest record in the leaf node on page 4.

Supremum: a virtual record in the leaf node whose previous record is the largest record in the leaf node. For example, tuple record <18, 2007-06-10, 12> in the figure is the largest record in leaf node on page 4.

Slots: InnoDB divides the records in the leaf node into several groups, each group is called a slot, which refers to the largest record in the group. This slot has the following features:

  • The slot stores an offset, which is calculated from the leaf node 0 bytes. Such as:

    • In the figure 99, the 99th byte from the leaf node 0 byte is slot 0.

    • 252 in the figure indicates that the position of the 252nd byte from leaf node 0 byte is slot 1.

    • 112 in the figure indicates that the position of the 112th byte from leaf node 0 byte is slot 2.

  • A slot points to the largest record in a group. For example, in the figure, the light yellow boxes represent groups, so slot 1 points to the maximum record for the second group <17, 2007-06-10, 11>.

  • The slots form an unordered array, but the subscripts of the slots are ordered. For example, in the figure, the offsets 99, 252, and 112 are unordered, but the slot subscripts 0, 1, and 2 are ordered.

At this point, you may wonder: why does the first group have only one record, the second group has four records, and the last group has two records?

The InnoDB engine specifies that the infimum group can only have one record, the Supremum group can only have one to eight records, and the remaining groups can only have four to eight records.

Next, I will explain in detail how InnoDB locates the record in the leaf node, combined with the slot structure definition.

Lookup process

InnoDB searches for leaf nodes by binary search for slots and sequential search within the group of records referred to by slots. As for the reason why binary search is not used in the slot, the total number of records in the group that the slot points to cannot exceed 8. Therefore, it is not necessary to use binary search to improve the search performance.

I use the SQL in the introduction of this chapter as an example to explain in detail how InnoDB can find the first record in the leaf node that meets the query condition.

SELECT * FROM user WHERE age > = 15
Copy the code

See the following 5 graphs, which show the index index_age_birth of the B+Tree leaf node:

  1. The slot for locating the smallest subscript islow, as shown in Figure (1)Slot 0, the slot for locating the largest subscript ishigh, as shown in Figure (1)Channel 2.
  2. (low + high) / 2Binary search, i.e0 plus 2 over 2 is 1, so, positioningSlot 1, see Figure (2), findSlot 1Pointed recordThe < 11 > 17, 2007-06-10,.
  3. The recordage = 17, the value of age is 15,17 > 15, indicating that the target record is on the left of the record. Therefore, see Figure (2).highPoint to theSlot 1Because oflowPoint to theSlot 0.high - low = 1, so the search record is between slot 0 and slot 1.
  4. See Figure 3low_recPoint to theSlot 0Point to theinfimumRecords,high_recPoint to theSlot 1Point to theThe < 11 > 17, 2007-06-10,Record, pointing from slot 0infimumStart traversing records to the right.Ps: Figure (3), (4), (5) In order to more clearly describe the slot pointing to the group of records inside the search process, the slot is removed.
  5. Iterate to the second record6 > < 15, 2007-06-06,, see Figure (4), the recordage = 15, the value of age is 15,15 = 15, therefore, this record is the first record that satisfies the query conditions.low_recPoints to the record.
  6. Iterate to the third record8 > < 16, 2007-06-08,, see Figure (5), the recordage = 16, the value of age is 15,16 > 15, so,high_recPoints to the record.
  7. high_recandlow_recIf the interval is 1, the search is complete. Finally, the location meets the search criteriaage>=15The first record of is6 > < 15, 2007-06-06,.

In the process of searching, you may wonder, why do I need to go to step 6 when I have already found the first record that satisfies the criteria in step 5?

Since this search process is a general algorithm, suppose we want to search for records with age = 16, then page 4 contains two records that meet the condition. Combined with the search algorithm above, is it necessary to execute step 6 and step 7 to find all the matching records? So InnoDB uses this generic search algorithm to cover age >= 15 and age = 16.

summary

Finally, let me summarize the content of this chapter:

  • Slots: Leaves are grouped, slots point to the largest record in each group, and the slots store the offset that points to the record in the leaves, which is calculated by 0 bytes relative to the leaves.

  • InnoDB search index tree leaf node process: the slot binary search, slot points to the record in the group order search.

To consider

Does InnoDB do binary search on leaves directly? Why to introduce the concept of slot search index tree leaf node, that is, to the slot binary search, slot points to the record in the group order search?

I am Tinker bell, and you grow together!