takeaway

As a dating platform, we still use its core function, that is, search users to open today’s sharing.

Let’s say we want to search for girls between the ages of 18 and 24, and ask them to rank by age. If there are tens of millions of registered users on the platform, we will generally page the search results to avoid slow loading of the results. Therefore, to achieve this function, based on the user table, we will write a SQL like this:

SELECT * FROM user WHERE age > = 18 AND age < = 24 AND sex = 0 ORDER BY age LIMIT 0.50
Copy the code

If I want to ensure the efficiency of this SQL query when the user scale reaches tens of millions, we will definitely add an overwrite index index_age_sex(age,sex) to ensure the performance of large-scale users.

So why is query performance good with overwrite indexes? So, today I’m going to explain what an overwrite index is and how MySQL uses an overwrite index to find records. Once we understand this process, we’ll see why it’s faster to find records using an overwrite index.

Cover index

Overwriting an index means adding a sort field to the index, ensuring that the nodes of the index tree contain the sort field.

You can combine this with “Why does MySQL support fast queries on a scale of 10 million data?” And “How many values are most appropriate to query IN field?” Index_age_sex (age,sex) (age,sex) (age,sex)

index_age_sexThe leaf node of the index tree contains the sort fieldageTo ensure theageIt’s sorted itself.

Therefore, MySQL only needs 2 steps to find an ordered result that satisfies the condition:

  1. traverseindex_age_sexLeaf node in the index tree, find the record primary key ID that meets the condition
  2. Use the primary key ID above to find the corresponding record in the leaf node of the cluster index

Sorting fields are ordered in the leaf nodes of the index tree, which reduces the unnecessary sorting process, so it greatly improves the efficiency of SQL query.

If at this time, THROUGH the above user search, I find a girl I like, and then follow her, we have a good chat through the platform chat function. But, after a period of time work favour, have no in time to communicate with each other more, after a busy, you want to get her to chat, because you is just a part of fuzzy remember her account, at the same time, remember her nickname first half letter is small, so you are trying to in their attention through the list of keyword search nickname to quickly find her, At this point, the query SQL becomes:

SELECT * FROM user WHERE user_name LIKE "%am%" AND age > = 18 AND age < = 24 AND sex = 0 ORDER BY age, user_name LIMIT 0.50
Copy the code

Add index index_un_age_sex(user_name,age,sex); add index index_un_age_sex(user_name,age,sex); add index index_un_age_sex(age,sex); However, according to the search process overwritten by the index above, we found that in the above SQL, because user_name in the query condition is a fuzzy match between the two ends of the string, so index_UN_age_sex cannot be used to find the user, that is, index_UN_age_sex cannot be matched. In the case of a large number of users, query performance will be affected. Now, what can we do to solve this problem?

Using index condition; Using index condition; Using where; Using filesort, as follows:

Using index condition; Using where; What does Using filesort mean? In fact, this keyword describes the process:

  1. MySQL uses indexes firstindex_age_sexSome user records in the preceding keyword are filtered outUsing index
  2. Conditions of useuser_nameField fuzzy match, filter out some user records from the results in Step 1, corresponding to the above keywordsUsing where
  3. Sort the user records obtained in the previous step, corresponding to the above keywordUsing filesort

Now you know what Filesort is.

Filesort

Filesort indicates that MySQL uses temporary files to sort the intermediate results of where conditional filtering. Let’s take a look at how SQL is sorted using Filesort:

  1. Select * from index where age >= 18 AND age <= 24 AND sex = 0; select * from index where age >= 18 AND age <= 24 AND sex = 0; Read “Why MySQL supports fast queries on a scale of 10 million data?” in order. And “Does InnoDB find b-Tree leaves sequentially?” Two articles.

  2. Select * from user_name where user_name LIKE “%am% “;

  3. MySQL allocates a block of memory to each thread for sorting. This area of memory is called sort_buffer. I’m not going to go into the details of quicksort, which many of you learned in college, but there are two cases:

    (1) When the value of the SELECT field + sort field is less than or equal to max_length_for_sort_data, the sort_buffer is filled with all the fields, and then the sort fields are sorted.

    Such as: In overwriting indexes, the SELECT clause is *. If the value of the SELECT field + sort field is less than or equal to max_LENGTH_for_sort_data, that is, the total size of all table fields is less than or equal to max_length_for_sort_data, MySQL writes all the fields in the user table that meet the query criteria into the sort_buffer, and then sorts the fields age and username in order to get the complete result after sorting.

    (2) When the value of the SELECT field + sort field is larger than max_length_for_sort_data, the sort field + primary key ID is written into the sort_buffer. Then, the sort field is sorted. Finally, the corresponding records are extracted from the cluster index according to the primary key ID.

    Such as: If the value of the SELECT field + sort field is greater than max_LENGTH_for_sort_data, that is, the total size of all table fields is greater than max_LENGTH_for_sort_data. MySQL writes the age, USERNAME, and ID that meet the query conditions in the user table into the sort_buffer. Then, the MySQL database sorts the age and USERNAME fields in sequence. After sorting, the MySQL database obtains the corresponding records from the cluster index according to the primary key ID.

By comparing the above two sorting processes, we find that the following sorting scheme will return to the table (cluster index search) one more time. If the cluster index is on disk, disk I/O will be generated and performance will be affected.

Therefore, we can avoid back table queries by using the following two methods:

  1. Try not to use the fields in the SELECT section of the SQL*Instead, specify the fields, ensuring that the value size of the field + sort field in SELECT is less than or equal to the parametermax_length_for_sort_dataSo MySQL will use (1) above to sort
  2. If you must use it*, make sure that the total length of the fields in the table does not exceedmax_length_for_sort_dataMySQL will also use the (1) sorting scheme above

If the SELECT field + sort field is less than or equal to the max_length_for_sort_data parameter, the sorting performance is best.

If username, the sort field in the Overwrite Index, is designed to be 32 bytes in length and 200 bytes in length, and the SELECT field + sort field value is less than or equal to max_length_for_sort_data, then How do the two field length designs affect sorting performance differently?

At this point, I have to mention the process of sorting comparison, through this process, I will uncover the answer to the above question!

Sorting is

MySQL to sort the field to use memcMP C function library, because memcmap makes full use of 64-bit CPU features, so, very high performance! Why is that?

Here I will take 64-bit CPU as an example, take a look at the memCMP comparison process, MEMCMP function into assembly instructions, mainly contains the following processes:

  1. Through the MOV instruction, the two input parameter addresses for comparison are read from memory and written to the two RAX registers respectively
  2. The address in the raX register above is memory-aligned by the NOP instruction
  3. By CMP instruction, the unaligned part is compared by single byte
  4. By CMP instruction, for the aligned part, the unit is 32 bytes, and by redundancy instruction, do 4 times of 8-byte (QWORD) comparison. Why do 4 times of comparison between the two input parameter addresses in 32 bytes? I’ll talk about that next.
  5. Through the CMP instruction, the 8-byte (qword) comparison is made for the part of less than 32 bytes in step 4 in the unit of 8-byte
  6. Through CMP instruction, single-byte comparison is made for the last remaining part of less than 8 bytes in Step 5

Let me use the MOV instruction from the first step above as an example to see how the instruction is processed in the CPU.

Below is the core architecture of Intel Nehalem processor. Let’s look at the processing process of MOV instruction from top to bottom:

  1. The instruction extraction unit (IFU) reads the position of the instruction MOV in the CPU L1 Cache from the ITLB

  2. The instruction Extraction Unit (IFU) reads if… Since the MOV instruction operation does not involve an if condition, the result of this read is null

  3. Instruction extraction unit (IFU) finds Instruction MOV in Instruction Cache in L1 Cache of CPU according to the position obtained in step 1, and extracts MOV

  4. Instruction extraction Unit (IFU) passes MOV instructions to decoder (ILD)

  5. ILD predecodes the MOV instruction, verifies the instruction length, etc. If the instruction length exceeds the specified, other processing will be done. Ps: THIS will be explained in detail in other articles in the future.

  6. The DECOder (ILD) will be MOV Instruction passed to the Instruction Queue (Instruction Queue) for buffer, PS: when the Instruction is very large, the Instruction Queue can collect a batch of instructions, in a clock cycle batch of this batch of instructions down, a batch of 4 instructions

  7. The Instruction queue passes the MOV Instruction to the Instruction Decoders for decoding. The Instruction Decoders contain 4 Decoders: One Complex Deocder is used to parse a Complex instruction into 1 ~ 4 microinstructions (excluding 1), and the other three Simple decoders are used to parse a Simple instruction into a microinstruction, which is generally abbreviated as UOP. The MOV instruction in the above memcmp function means to read the two input parameter addresses for comparison from memory and write the addresses to two RAX registers respectively, which is a complex instruction. Therefore, this MOV instruction is resolved into two microinstructions UOPS: read the input parameter address UOP1 from memory and write the address to THE RAX register UOP2

  8. The instruction parser passes the decomposed two UOPS to the instruction decoding queue (IDQ) for instruction deduplication

    (1) The instruction decoding queue (IDQ) passes the two UOPS to the loop detector (LSD) in turn. The loop detector checks whether there is a loop statement like while in UOP. If there is a loop statement, the repeated UOP in the loop is deduplicated. Since there is no loop in UOP1 and UOP2, the loop detector directly returns UOP1 and UOP2 to the IDQ.

    (2) The microinstruction sequence number generator generates two sequence numbers for UOP1 and UOP2, passes the sequence numbers to the instruction decoding queue, and assigns them to UOP1 and UOP2

  9. The instruction decoding queue passes UOP1 and UOP2 to MicroOps Fusion for microinstruction aggregation. For some instructions the same, here is aggregated into a microinstruction + instruction number. Uop1 and UOP2 are different. Therefore, no aggregation is performed.

  10. At this point, the CPU completes the pre-execution of the MOV instruction, which is commonly referred to as the Front End. Fusion then passes UOP1 and UOP2 to Allocator for microinstruction execution unit allocation

    (1) The Dispatcher respectively writes microinstructions UOP1 and UOP2 into the reorder buffer (ROB) and waits for the instruction execution results.

    (2) The Dispatcher simultaneously writes microinstructions to the repeater (RS). For UOP1 and UOP2, this write is different:

    A. Uop1 is a memory read operation, so uOP1 written to the reorder buffer is a complete microinstruction: UOP1 ADDR1, ADDR1 is the memory address read. Since UOP1 does not depend on the front data, the Dispatcher writes the entire UOP1 instruction to the repeater (RS) at the same time for execution.

    B. Uop2 is a write register operation. The Dispatcher will not write the entire uOP2 instruction to the repeater (RS).

  11. The repeater analyzes which microinstructions have dependencies and which do not, serial execution with dependencies, and parallel execution without dependencies. Because the current repeator only contains UOP1, only the execution unit is allocated to UOP1, that is, through the port2 port, the complete instruction of UOP1 is passed to the AGU Load execution unit, and uOP1 is executed. That is, the execution unit reads the address ADDR1 from the memory sort buffer (MOB). If it does not exist, The ADDR1 can only be read from memory through the address bus. If the ADDR1 is not read from the L2 Cache, the ADDR1 can only be read from memory through the address bus

  12. When the AGU Load execution unit successfully reads ADDR1, sends COMPLETE status and ADDR1 to the reorder buffer (ROB).

  13. Reorder the buffer to write uOP1 execution result (state) and ADDR1 to the fallback register file (RRF)

  14. Reorder buffer remove microinstruction UOP1

  15. At this point, the reorder buffer has ADDR1 and UOP2 written in Step 10. Then, the reorder buffer passes UOP2 and ADDR1 to the Dispatcher, who names a RAX register from the register name table and constructs the complete UOP2 instruction. Uop2 RAx, ADDR1, means to write ADDR1 to the RAX register, and then write the full instruction to the repeater (RS), ready to execute UOP2.

  16. The repeator assigns the AGU Store Address execution unit to execute UOP2 through port 3 and writes ADDR1 to the RAX register

  17. After the execution unit executes successfully, the resulting state, COMPLETE, is written to the reorder buffer

  18. Reorder the buffer to write the uOP2 execution result (UOP2, RAX) to the fallback register file (RRF), recording the value ADDR1 in the RAX register

  19. Reorder buffer remove microinstruction UOP2

After the steps above, you should know how an instruction is processed in the CPU. So, back to the question above: why does the memcmp function compare the two input addresses four times in 32 bytes?

  1. As can be seen from the figure above, the L2 Data Cache on the right is connected to the L1 Data Cache at the bottom for Data transmission. The transmission bandwidth is 256bit, that is, a maximum of 32 bytes of Data can be transmitted ata time. Therefore, the input parameter address is executed in 32 bytesuop1Microinstruction that makes full use of the bandwidth between L2 Data Cache and L1 Data Cache.
  2. Combined with the above step 11, since each clock cycle of the CPU can simultaneously select and execute four microinstructions that can be executed concurrently from the repeater (RS), the same comparison instruction CMP is split four times, and the four parts of the compared input parameter address are concurrently compared, and the comparison results are summarized at last. In this way, the comparison task that was executed four times in serial becomes a single execution in parallel, and performance is greatly improved.

If the length of the order field used for comparison exceeds 32 bytes, and the value of the field is not in the L1 Cache, the CPU has to write the value of the field to the L1 Cache multiple times, which affects the performance. It is recommended that the size of the sort field not exceed 32 bytes.

summary

Through the explanation of the contents of this chapter, we know some methods of sorting optimization:

  1. Add sort field to index to achieve overwrite index and avoid sort
  2. The value size of SELECT field + sort field in SQL is less than or equal to parametermax_length_for_sort_dataTo improve performance, avoid back table query
  3. The size of the sort field should not exceed 32 bytes to make full use of the 64-bit CPU to improve the sort performance

To consider

About the SQL in the Override index:

SELECT * FROM user WHERE user_name LIKE "%am%" AND age > = 18 AND age < = 24 AND sex = 0 ORDER BY age, user_name LIMIT 0.50
Copy the code

In this chapter, we use quicksort to sort the age and username. Is there a better way to improve the sorting performance?