In your daily development work, you will often encounter the need to sort by specified fields.

At this point, your SQL statement looks something like this.

select id,phone,code from evt_sms where phone like '13020%' order by id desc limit 10
Copy the code

The logic of this SQL is very clear, but its internal execution principle you know how much.

Next, this article will take you through the doors of Order BY.

All conclusions for this issue are based on MySQL8.0.26

The latest article

Did you know that strings can be indexed like this? MySQL series 7

MySQL > select * from ‘slow’ SQL

What? MySQL > delete data by delete

Count (*) = count(*) = count(*)

General Contents

First, several common Extra information

If you want to see the performance of an SQL query in MySQL, you should not only see whether the index is used, but also see the content in Extra. The following content is from the official document, which gives you the most accurate learning materials.

using index

Column information can be retrieved directly from the index tree without additional operations to read the actual rows.

Index columns are both query columns and condition columns.

using index condition

The following statement name is a normal index, age has no index.

select * from table where name = ? and age = ?

Index push-downs occur in MySQL5.6 and later versions.

The previous query process was to get the data in the storage engine based on name, and then filter it at the server layer based on age.

With index push-down, the query process is to obtain data from the storage engine based on name and age and return the corresponding data, without filtering the data from the server layer.

If a using index condition occurs when you analyze SQL statements using Explain, you are using index push-downs. Push-downs are most likely to occur when combining indexes.

using index for group_by

Only index columns are checked, and group by is used for index columns

explain select phone from evt_sms where phone = "13054125874" group by phone;
Copy the code

using where

Select * from Extra; select * from Extra; select * from Extra; Using index, which means that the qualified data cannot be queried through index lookup directly

The columns of the query are covered by the index, and the WHERE filter condition is a range of the leading columns of the index column, which also means that the data that matches the condition cannot be queried directly through the index lookup

zero limit

Select * from SQL where limit 0 is set

using temporary

Temporary tables are used, which are commonly encountered when using group by and Order BY.

This is also the topic of this article.

using filesort

This is commonly encountered when using group by and order BY, and the sorting process is done in memory

Backward index scan

A descending operation is used on index columns

Here are just a few of the most common information. There are about 37 parsing of Extra in MySQL official documents. If you are interested, you can check it out.

Second, file sorting

Because it is in some statistics and sorting business, it is common to see the information of using filesort in Extra.

Sorting a column without an index in MySQL8.0.26 shows using filesort in Extra. In the lower version you need to experiment with what will happen.

Using filesort, shown in Extra, represents sorting. MySQL allocates a block of memory to each thread for sorting, also known as sort_buffer. There will be a lot of nouns involved in this and the next article, so make sure you sort them out!

Now look at this statement

So what is the flow of this SQL execution?

1. Initialize sort_buffer and put fields phone and code into it

2. Find the primary key in the index tree of phone

3. Retrieve the corresponding field values of phone and code in the primary key index tree according to the primary key value, and then store them in SORt_buffer

4. Continue to fetch a primary key from phone

5. Repeat third and fourth until phone = condition is not satisfied

6. The data in sort_buffer is quicksorted according to the field phone

7. Take out the first 10 lines according to the results of fast sorting and return to the client

Question: Is all sorting done in memory?

Of course not, no memory is unlimited, and sorting in memory depends on the MySQL parameter sort_buffer_sort.

This value defaults to 256KB in MySQL8.0.26.

When the amount of data to be sorted exceeds the threshold of 256KB, temporary files are used for auxiliary sorting, which is commonly known as merge sort algorithm.

Sort_buffer_size is proportional to the number of temporary files required, and the smaller the sort_buffer_size is, the more temporary files are required.

How to check whether a sort uses temporary files is up to you to implement the answer, inconsistency can cause many different results.

Question: Do you know how merge sort is implemented?

Now that you know that temporary file sort is used if the sort is larger than sort_buffer_size, which uses the idea of merge sort, let’s see how the process works.

1. Divide the data to be sorted into pieces that can be stored in sort_buufer

2. Sort each piece of data in sort_buufer. After sorting, write to a temporary file

3. After all the data is written to the temporary file, it is in order for each temporary file, but is out of order for all temporary files, so it needs to merge the data

4, Suppose there are two temporary files tMP1 and tmp2, then read some data from TMP1 and Tmp2 respectively into memory

5. Suppose we read [0-5] data from TMP1 and TMP2 respectively, and then compare tmp1[0], tmp2[0], all the way to TMP1 [5], tMP2 [5], so that tMP1 and TMP2 can be combined into a single file. After a few rounds all the partitioned data is merged into one large, ordered file

Third, file sorting is very slow, is there any other way

In the above example, if the amount of data sorted is very large and exceeds the maximum sort_buffer_size, file sort is the only option, which involves multiple file merges and can be very performance consuming.

Did you notice a detail above? SQL only needs to sort code field, but added phone field to sort_buufer as well.

In this way, the size of a single row of data will increase virtually, so that the number of rows can be stored in memory will be reduced, need to be divided into multiple temporary files, sorting performance will be poor, so is there any other solution to solve this problem?

And the answer is yes, which is rowid sorting.

Let’s start with a parameter max_LENGTH_FOR_sort_data

The default max_LENGTH_FOR_sort_data size is 4096 bytes. Assuming that there is a lot of data to sort now, we can modify this parameter to use Rowid’s algorithm.

A parameter in MySQL that controls the length of a row sorted by the user. If the length of a row exceeds this value, MySQL automatically changes to roWID.

The idea of Rowid sorting is to remove unnecessary data from sort_buufer and let sort_buffer store only the fields that need sorting.

Question: What fields would you store if you were a designer

Suppose you now store the fields that only need to be sorted, and the sorting is done quickly. What do you do when you get the sorted data? You don’t know where to start.

Therefore, you can also store the primary key ID value in sort_buufer, and when the sorting is complete, you can retrieve the sorted data by returning the ID to the table.

Execute the process

If you think about it, this execution process is not very different from the file sorting process.

Only the fields stored in sort_buufer become the fields to be sorted plus the primary key fields.

It then sorts by sort field in sort_buufer

Finally, iterate through the sorting result, take the number of rows needed, and use id to return to the table once, you can find the column you need.

Pay attention to the point

This is not to say that using Rowid’s sorting algorithm will stop using temporary file sorting, it is not.

Using ROwid only stores more data in sort_buffer. If you need to sort a lot of data, you still need to use temporary files.

Optimize file sorting

If MySQL finds that sort_buufer memory is too small, the sorting efficiency will be affected, so roWId sorting algorithm will be used. The advantage of using ROWID algorithm is that sort_buffer can sort more rows at one time, but the disadvantage is that it needs to return to the table.

In MySQL, if there is enough memory, use more memory to minimize disk access. All of roWId’s algorithms will not be preferred, because the back table will cause past disk reads.

Not all order by statements require a sort operation. The two sorting algorithms analyzed above are derived from the fact that the original data is unordered.

Question: What is ordered?

After reading the index issue, you should know two things by now.

The index itself is sequential, so that when a range query is performed, the retrieved data is already sorted, avoiding the problems of the server reordering and creating temporary tables.

The underlying implementation of indexing is itself sequential, with disk prefetch allowing access to data on disk to be addressed roughly sequentially, that is, converting random I/ OS into sequential I/ OS.

Question: How do I prevent sorting

You should now know the answer, which is to create a federated index for the columns that need to be sorted.

Now create a joint index for phone and code

alter table evt_sms add index idx_phone_code (phone,code);
Copy the code

So the same statement is executed without the sort operation, so let’s look at the execution flow

Execute the process

1. Select the value of phone=’123456′ from the index (phone,code) and return the value of phone and code as part of the result set

3. Fetch the next record from the index (phone, code), also fetch the values of phone and code, and return them directly as part of the result set

4. Repeat Step 2 until 1000 rows of data are found or the query conditions are not met

Five, the summary

Using filesort appears in the execution plan when an index is not used for order BY

Using filesort Determines the use of temporary files based on the value of the sort_buffer_size parameter

The max_LENGTH_FOR_SORt_data parameter determines whether to use the ROWID algorithm, which is used if each row of data in the SORt_buffer is larger than the set value

By now you should know that roWId sort just puts the fields to be sorted and the primary key IDS into sort_buffer, whereas file sort puts all the fields of the query into sort_buffer.

Also, rowid creates one more callback operation, which you should also know.

The order by statement is optimized to create an overwrite index that returns the result directly without sorting.

This is not to advocate blindly building in a real production environment, but it is very fast to sort in memory if the data is very small, depending on the business situation. Overwriting an index takes up more storage and maintenance overhead.

Insist on learning, insist on writing, insist on sharing is the belief that Kaka has been upholding since he started his career. May the article in the big Internet can give you a little help, I am kaka, see you next time.