01 Preface

I just changed jobs. I spent two weeks preparing and received five offers in three days. In the end I chose an offer from an Internet unicorn in Guangzhou, and I started there yesterday. Over the past few days I have sorted out the interesting questions I was asked in the interviews, and I would like to take this opportunity to share them with you.

The interviews at this company were interesting. The first interviewer was a young man about my age, and we chatted for two hours (until my mouth was dry). The second was a video interview with an architect from Alibaba. After I introduced myself, he began by asking me:

Are you familiar with MySQL?

It took me a second to realize that this was a trap. He surely wanted to ask about the principles behind something, and it just so happened that I had studied indexes. I answered:

I am familiar with indexes.

He:

How does ORDER BY implement sorting?

Fortunately I had reviewed this as well, and my answers about the sort buffer, how to optimize, and so on were all on point. Today I will walk through ORDER BY with you, from its principles to the final optimization. I hope it will be helpful to you.

As usual, a mind map first. PS: There are some goodies at the end of the article.


1.1 Previous articles

How do MySQL query statements execute?

MySQL index

MySQL logs

MySQL transactions with MVCC

MySQL locking mechanism

How to create an index on a MySQL string column

Interviewer: What happens when the database auto-increment id runs out?

1.2 An example first

We now have an order table that looks like this:

CREATE TABLE `order` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `user_code` VARCHAR(16) NOT NULL,
  `goods_name` VARCHAR(64) NOT NULL,
  `order_date` TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
  `city` VARCHAR(16) DEFAULT NULL,
  `order_num` INT(10) NOT NULL COMMENT 'order quantity',
  PRIMARY KEY (`id`)
) ENGINE = InnoDB AUTO_INCREMENT = 100 DEFAULT CHARSET = utf8 COMMENT = 'order table';

Generate some data:

-- Step 1: create a stored procedure that inserts test rows
DELIMITER //
DROP PROCEDURE IF EXISTS proc_buildata //
CREATE PROCEDURE proc_buildata (IN loop_times INT)
BEGIN
  DECLARE var INT DEFAULT 0;
  WHILE var < loop_times DO
    SET var = var + 1;
    INSERT INTO `order` (`id`, `user_code`, `goods_name`, `order_date`, `city`, `order_num`)
    VALUES (var, var + 1, 'wired earphone', '2021-06-20 16:46:00', 'hangzhou', 1);
  END WHILE;
END //
DELIMITER ;

-- Step 2: call the procedure above to insert the data.
-- It is recommended that you create some random data, for example by changing the city and the order quantity.
CALL proc_buildata(4000);

The data I generated looks like this:

The requirement: find the order quantity and user code of customers in Guangzhou during the 618 shopping festival, sorted by order quantity in ascending order, and return only the first 1000 rows.

From this requirement we get the following SQL, which I believe you are all familiar with:

SELECT city, order_num, user_code FROM `order` WHERE city = 'Guangzhou' ORDER BY order_num LIMIT 1000;

So how does this statement work? Are there any parameters that can affect its behavior?

02 Full field sorting

My first reaction to this requirement is to index the city field and avoid a full table scan:

ALTER TABLE `order` ADD INDEX city_index ( `city` );

Use EXPLAIN to see how the statement is executed:

Notice that the last column, Extra, shows Using filesort, which means a sort is required. MySQL allocates each thread a block of memory for sorting, called the sort_buffer.

In order to have a more intuitive understanding of the sorting process, I have drawn a rough diagram of the city index:

You can see that the records from ID_3 to ID_X satisfy the SQL condition. The overall flow of the statement looks like this:

  • 1. Initialize the sort_buffer and decide to put three fields into it: city, order_num, user_code;
  • 2. From the city index, find the first primary key id that satisfies city='Guangzhou', i.e. ID_3 in the figure;
  • 3. Go to the primary key index, fetch the whole row by that id, take the values of city, order_num, user_code, and store them in the sort_buffer;
  • 4. From the city index, fetch the primary key id of the next record;
  • 5. Repeat steps 3 and 4 until the value of city no longer meets the query condition; the corresponding primary key id is ID_X in the figure;
  • 6. Sort the data in the sort_buffer by order_num;
  • 7. Take the first 1000 rows of the sorted result and return them to the client.
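To make the seven steps concrete, here is a minimal Python sketch. It is only a toy model, not how MySQL is implemented: the `TABLE` dictionary, the `CITY_INDEX` list, the sample rows and the function name `full_field_sort` are all assumptions made up for illustration.

```python
import bisect
from operator import itemgetter

# Toy stand-in for the `order` table, keyed by primary key id (made-up data).
TABLE = {
    1: {"city": "Hangzhou", "order_num": 5, "user_code": "u1"},
    2: {"city": "Guangzhou", "order_num": 9, "user_code": "u2"},
    3: {"city": "Guangzhou", "order_num": 2, "user_code": "u3"},
    4: {"city": "Shenzhen", "order_num": 7, "user_code": "u4"},
    5: {"city": "Guangzhou", "order_num": 4, "user_code": "u5"},
}

# Toy secondary index on city: (city, id) pairs kept in sorted order.
CITY_INDEX = sorted((row["city"], pk) for pk, row in TABLE.items())


def full_field_sort(city, limit):
    """Gather every selected field into the buffer, sort it, then cut to limit."""
    sort_buffer = []                                   # step 1
    pos = bisect.bisect_left(CITY_INDEX, (city,))      # step 2: first matching entry
    while pos < len(CITY_INDEX) and CITY_INDEX[pos][0] == city:
        row = TABLE[CITY_INDEX[pos][1]]                # step 3: go back to the table
        sort_buffer.append((row["city"], row["order_num"], row["user_code"]))
        pos += 1                                       # steps 4 and 5: next entry
    sort_buffer.sort(key=itemgetter(1))                # step 6: sort by order_num
    return sort_buffer[:limit]                         # step 7


result = full_field_sort("Guangzhou", 2)
print(result)
```

The point of the sketch is that every selected field travels through the sort buffer, so the final result can be returned without touching the table again.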

This process is called full-field sorting. Let me draw a picture that looks like this:

Here, the sort by order_num may be done in memory or may require an external sort, depending on the memory the sort needs and on the sort_buffer_size parameter.

sort_buffer_size is the size of the sort_buffer that MySQL allocates for sorting. If the amount of data to be sorted is smaller than sort_buffer_size, the sort is done in memory. If the amount of data is too large to fit in memory, MySQL has to use temporary files on disk to assist the sort.

In MySQL 5.7 and above, you can use the following method (it will be useful again later) to check whether a sort statement uses temporary files. PS: The statements can be copied directly into Navicat, but they must be executed together (paste them all in, then run them in one go).

/* Enable optimizer_trace */
SET optimizer_trace = 'enabled=on';
/* @a saves the initial value of Innodb_rows_read */
SELECT VARIABLE_VALUE INTO @a FROM performance_schema.session_status WHERE variable_name = 'Innodb_rows_read';
/* Execute the statement */
SELECT city, order_num, user_code FROM `order` WHERE city = 'Guangzhou' ORDER BY order_num LIMIT 1000;
/* View the OPTIMIZER_TRACE output */
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`;
/* @b saves the current value of Innodb_rows_read */
SELECT VARIABLE_VALUE INTO @b FROM performance_schema.session_status WHERE variable_name = 'Innodb_rows_read';
/* Calculate the Innodb_rows_read difference */
SELECT @b - @a;

After execution, you can get the following results from the TRACE field of the OPTIMIZER_TRACE table:

Here examined_rows is the number of rows that need to be sorted, 6883; sort_buffer_size is the size of the sort buffer, which on my MySQL instance is 256 KB.

In addition, the sort_mode value is packed_additional_fields, which indicates that the sort packs the data so that it takes up as little memory as possible. For example, a field defined with length 16 is not always counted as 16: if the value is only 2 long, memory is allocated for length 2.

number_of_tmp_files is the number of external files used to assist the sort. Here it is 2. When the data does not fit in memory, MySQL uses an external sort, which usually means a merge sort. Put simply, MySQL splits the data to be sorted into two parts, sorts each part separately and stores it in a temporary file, and then merges the two sorted files into one large sorted file.
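The external merge sort idea can be sketched in a few lines of Python. This is only an illustration of the general technique (sorted runs on disk, then a merge), not MySQL's filesort code; the function names, the `buffer_rows` parameter and the sample data are assumptions of mine.

```python
import heapq
import tempfile


def read_run(run_file):
    """Yield the integers stored one per line in a sorted run file."""
    for line in run_file:
        yield int(line)


def external_sort(values, buffer_rows):
    """Sort integers with sorted runs in temporary files plus a final merge."""
    run_files = []
    for start in range(0, len(values), buffer_rows):
        chunk = sorted(values[start:start + buffer_rows])  # this part fits in the "buffer"
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{value}\n" for value in chunk)
        run.seek(0)
        run_files.append(run)

    # heapq.merge lazily merges inputs that are already sorted.
    merged = list(heapq.merge(*(read_run(run) for run in run_files)))
    for run in run_files:
        run.close()
    return merged, len(run_files)


data = [42, 7, 19, 3, 88, 61, 5, 27]
sorted_rows, tmp_file_count = external_sort(data, buffer_rows=4)
print(sorted_rows, tmp_file_count)
```

With 8 values and room for 4 in the buffer, the sketch writes two sorted runs and merges them, which mirrors the "two files" case described above.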

The result of the last statement, select @b-@a, is 6884, even though only 6883 rows were scanned for the sort. Where does the extra row come from?

Querying the OPTIMIZER_TRACE table requires a temporary table. When that temporary table uses the InnoDB engine, reading the data back out of it increases Innodb_rows_read by 1.

So setting internal_tmp_disk_storage_engine to MyISAM avoids this problem (this variable exists in MySQL 5.7; it was removed in MySQL 8.0.16).

03 rowid sorting

Full-field sorting has a big problem, as you may have noticed: every field the query returns has to be placed in the sort_buffer. If the query returns more fields, the sort_buffer fills up easily.

At that point MySQL has to use many temporary files to assist the sort, and performance suffers.

So the idea is to reduce the length of each row being sorted. Is there a way to do that?

Whether MySQL uses full-field sorting is controlled by max_length_for_sort_data, whose default value is 1024 (in MySQL 5.7).

show variables like 'max_length_for_sort_data';

In this example, the total length of city, order_num and user_code is 16 + 4 + 16 = 36 < 1024, so full-field sorting is used. Let's make this parameter smaller:

SET max_length_for_sort_data = 16;

When the length of a single row exceeds this value, MySQL considers the row too large and switches to a different algorithm. The total length of city, user_code and order_num is 36, which is greater than 16, so the sort_buffer cannot hold all the query fields. It stores only order_num and id.
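The decision can be written down as a tiny Python sketch. It uses the same simplified length arithmetic as the text (16 + 4 + 16), which is an assumption; MySQL's real accounting of field lengths is more involved, and the names `FIELD_LENGTHS` and `choose_sort_mode` are made up for illustration.

```python
# Field lengths as counted in the text (a simplification, not MySQL's exact accounting).
FIELD_LENGTHS = {"city": 16, "order_num": 4, "user_code": 16}


def choose_sort_mode(selected_fields, max_length_for_sort_data):
    """Pick the sort mode using the simplified rule described above."""
    row_length = sum(FIELD_LENGTHS[name] for name in selected_fields)
    if row_length > max_length_for_sort_data:
        return "rowid"        # only the sort key and the row id go into the buffer
    return "full-field"       # every selected field goes into the buffer


fields = ["city", "order_num", "user_code"]
print(choose_sort_mode(fields, 1024))  # row length 36 does not exceed 1024
print(choose_sort_mode(fields, 16))    # row length 36 exceeds 16
```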

The process should look like this:

  • 1. Initialize the sort_buffer and decide to put two fields into it: order_num and id;
  • 2. From the city index, find the first primary key id that satisfies city='Guangzhou', i.e. ID_3 in the figure;
  • 3. Go to the primary key index, fetch the whole row by that id, take the values of order_num and id, and store them in the sort_buffer;
  • 4. From the city index, fetch the primary key id of the next record;
  • 5. Repeat steps 3 and 4 until the condition city='Guangzhou' is no longer met, i.e. ID_X in the figure;
  • 6. Sort the data in the sort_buffer by order_num;
  • 7. Go through the sorted result, take the first 1000 rows, go back to the table by id to fetch the three fields city, order_num and user_code, and return them to the client.
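Here is the same kind of toy Python model for the rowid flow, this time counting how many rows are read from the table. Everything in it (`TABLE`, `CITY_INDEX`, `rowid_sort`, the sample rows) is an assumption for illustration, not MySQL internals.

```python
import bisect
from operator import itemgetter

# Toy stand-in for the `order` table, keyed by primary key id (made-up data).
TABLE = {
    1: {"city": "Hangzhou", "order_num": 5, "user_code": "u1"},
    2: {"city": "Guangzhou", "order_num": 9, "user_code": "u2"},
    3: {"city": "Guangzhou", "order_num": 2, "user_code": "u3"},
    4: {"city": "Shenzhen", "order_num": 7, "user_code": "u4"},
    5: {"city": "Guangzhou", "order_num": 4, "user_code": "u5"},
}

# Toy secondary index on city: (city, id) pairs kept in sorted order.
CITY_INDEX = sorted((row["city"], pk) for pk, row in TABLE.items())


def rowid_sort(city, limit):
    """Sort only (order_num, id), then go back to the table for the final rows."""
    rows_read = 0
    sort_buffer = []
    pos = bisect.bisect_left(CITY_INDEX, (city,))
    while pos < len(CITY_INDEX) and CITY_INDEX[pos][0] == city:
        pk = CITY_INDEX[pos][1]
        rows_read += 1                                  # first table read: get order_num
        sort_buffer.append((TABLE[pk]["order_num"], pk))
        pos += 1
    sort_buffer.sort(key=itemgetter(0))                 # sort by order_num only

    result = []
    for _, pk in sort_buffer[:limit]:
        rows_read += 1                                  # second table read for the result
        row = TABLE[pk]
        result.append((row["city"], row["order_num"], row["user_code"]))
    return result, rows_read


result, rows_read = rowid_sort("Guangzhou", 2)
print(result, rows_read)
```

In this toy data three rows match and the limit is 2, so the table is read 3 + 2 = 5 times. The buffer holds less per row, and the price is the second round of table reads.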

As you can see from the figure, this method needs one more round of going back to the table, but it uses less of the sort_buffer.

At this point, execute the above detection method and find that the information in the OPTIMIZER_TRACE table has changed.

  • sort_mode becomes <sort_key, rowid>, indicating that only order_num and id take part in the sort.
  • number_of_tmp_files becomes 0. The number of rows to sort is still 6883, but each row is smaller, so the total amount of data to sort is smaller. The memory given by sort_buffer_size is enough for the sort, so temporary files are not needed.

The value of examined_rows is still 6883, indicating that 6883 rows are sorted. But the result of select @b-@a becomes 7884. In addition to the reads for the sort, MySQL has to go back to the table once more after sorting. Since the statement has LIMIT 1000, an extra 1000 rows are read.

3.1 Summary

With rowid sorting, more rows can be sorted at once, but MySQL needs to go back to the table to fetch the data.

If memory is large enough, MySQL puts all the required fields into the sort_buffer, so that after sorting the results are returned directly from memory without going back to the table.

This is part of MySQL’s design philosophy: if you have enough memory, use it as much as possible and minimize disk access.

For InnoDB tables, rowid sorting requires going back to the table, which causes more disk reads, so it is not the preferred choice.

In both cases, because the data itself is unordered, it has to be put into the sort_buffer, possibly with temporary files, before it can be sorted.

Is there any way to make the data itself orderly? Recall that we learned that indexes are ordered.

04 Index Optimization

In this case, if I create a composite index on city and order_num, will the data come out naturally ordered? For example:

alter table `order` add index city_order_num_index(city, order_num);

At this point, the index of the ORDER table looks like this:

Now execute the SQL statement from the beginning of the article again. The execution process looks like this:

  • 1. From the index (city, order_num), find the first primary key id that satisfies city='Guangzhou';
  • 2. Go back to the table, take the values of city, order_num, user_code, and return them as part of the result set;
  • 3. From the index (city, order_num), fetch the primary key id of the next record;
  • 4. Repeat steps 2 and 3 until the 1000th record is found or the condition city='Guangzhou' is no longer met, then end the loop.
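A small Python sketch of these four steps, again as a toy model under my own assumptions (the `TABLE` data, the `CITY_ORDER_NUM_INDEX` list and the function name are invented for illustration):

```python
import bisect

# Toy stand-in for the `order` table, keyed by primary key id (made-up data).
TABLE = {
    1: {"city": "Hangzhou", "order_num": 5, "user_code": "u1"},
    2: {"city": "Guangzhou", "order_num": 9, "user_code": "u2"},
    3: {"city": "Guangzhou", "order_num": 2, "user_code": "u3"},
    4: {"city": "Shenzhen", "order_num": 7, "user_code": "u4"},
    5: {"city": "Guangzhou", "order_num": 4, "user_code": "u5"},
}

# Toy composite index (city, order_num): entries are (city, order_num, id), kept sorted.
CITY_ORDER_NUM_INDEX = sorted(
    (row["city"], row["order_num"], pk) for pk, row in TABLE.items()
)


def index_ordered_scan(city, limit):
    """Walk the composite index in order and stop early; no sort step at all."""
    result = []
    pos = bisect.bisect_left(CITY_ORDER_NUM_INDEX, (city,))   # step 1
    while pos < len(CITY_ORDER_NUM_INDEX) and len(result) < limit:
        indexed_city, _, pk = CITY_ORDER_NUM_INDEX[pos]
        if indexed_city != city:                               # condition no longer met
            break
        row = TABLE[pk]                                        # step 2: go back to the table
        result.append((row["city"], row["order_num"], row["user_code"]))
        pos += 1                                               # step 3: next index entry
    return result


result = index_ordered_scan("Guangzhou", 2)
print(result)
```

Because the index entries for one city are already ordered by order_num, the rows come out sorted and the loop can stop as soon as the limit is reached.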

This process requires no sorting, let alone temporary tables. It only needs to go back to the table once per row:

You can see from the figure that there is no Using filesort in the Extra field, so no sort is required. And since the composite index (city, order_num) is itself ordered, the query can stop as soon as it has found the first 1000 records that meet the condition and gone back to the table for each of them. In other words, it takes only about 2000 scans.

So the question is, is there a better solution?

05 Ultimate Optimization

The method above still goes back to the table, mainly because user_code is not included in the index. Recall the SQL optimization we learned earlier: how do we avoid going back to the table?

The answer is a covering index: add user_code to the composite index as well, so that the index itself contains every field the query needs.

alter table `order` add index city_order_num_user_code_index(city, order_num, user_code);

At this point, the process looks like this. The data can be read straight from the index:

The EXPLAIN output:

From the figure, we can see Using index in the Extra field, which means a covering index is used. The query does not even need to go back to the table; it only needs to scan 1000 index entries.
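As a rough illustration, here is the covering-index case as a Python sketch. Note that there is no table structure in it at all: the made-up `COVERING_INDEX` list already carries all three fields. The names and data are assumptions for illustration only.

```python
import bisect

# Toy covering index (city, order_num, user_code), kept sorted (made-up data).
COVERING_INDEX = sorted([
    ("Hangzhou", 5, "u1"),
    ("Guangzhou", 9, "u2"),
    ("Guangzhou", 2, "u3"),
    ("Shenzhen", 7, "u4"),
    ("Guangzhou", 4, "u5"),
])


def covering_index_scan(city, limit):
    """Read the answer straight from index entries; no sort and no table lookup."""
    result = []
    pos = bisect.bisect_left(COVERING_INDEX, (city,))
    while pos < len(COVERING_INDEX) and len(result) < limit:
        entry = COVERING_INDEX[pos]
        if entry[0] != city:
            break
        result.append(entry)   # the index entry already holds every selected field
        pos += 1
    return result


result = covering_index_scan("Guangzhou", 2)
print(result)
```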

Perfect ~

5.1 Parameter tuning

In addition, you can optimize the execution of ORDER BY by tuning parameters. For example, increase sort_buffer_size: if the sort_buffer is too small for the amount of data being sorted, MySQL has to sort with temporary files on disk. If the MySQL server has plenty of memory, you can raise it somewhat.

Another example is increasing max_length_for_sort_data. If this value is too small, MySQL chooses rowid sorting more often, which increases the number of trips back to the table and degrades query performance.

06 ORDER BY interview questions

1. Is there a sort when the query uses an IN condition?

Assume there is a composite index (city, order_num, user_code), and execute the following SQL statement:

SELECT city, order_num, user_code FROM `order` WHERE city IN ('Guangzhou') ORDER BY order_num LIMIT 1000;

With a single value in IN, of course, no sort is needed. The EXPLAIN output:

However, with more than one value in IN there is a sort. For example, execute the following statement:

SELECT city, order_num, user_code FROM `order` WHERE city IN ('Guangzhou', 'Shenzhen') ORDER BY order_num LIMIT 1000;

Run EXPLAIN on it and you will see Using filesort at the end, which indicates a sort. Why is that?

Because in the composite index, order_num is ordered only within a single city value. It is ordered among the rows where city='Guangzhou', and it is also ordered among the rows where city='Shenzhen'. But the two sets together are not guaranteed to be ordered by order_num.
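A few lines of Python show the effect. The index entries below are made-up sample data, and `order_nums_for` is just a helper name I chose for the illustration.

```python
# Entries of a composite index (city, order_num), listed in index order (made-up data).
index_entries = [
    ("Guangzhou", 1), ("Guangzhou", 5), ("Guangzhou", 9),
    ("Shenzhen", 2), ("Shenzhen", 3), ("Shenzhen", 8),
]


def order_nums_for(cities):
    """Read order_num values in index order for the given set of cities."""
    return [num for city, num in index_entries if city in cities]


one_city = order_nums_for({"Guangzhou"})
two_cities = order_nums_for({"Guangzhou", "Shenzhen"})

print(one_city, one_city == sorted(one_city))        # already in order
print(two_cities, two_cities == sorted(two_cities))  # not in order, a sort is needed
```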

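2. The paging offset in LIMIT is too large, so a large amount of data has to be sorted. What can be done?

SELECT * FROM `user` ORDER BY age LIMIT 100000, 10;
  • You can record the last id of the previous page and add it to the condition when querying the next page, for example: WHERE id > (last id of the previous page) LIMIT 10.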
  • You can also limit the number of pages if the business permits.

3. The index storage order does not match the ORDER BY. How to optimize?

Suppose there is a composite index (age, name) and the requirement changes to: query the names and ages of the first 10 students, sorted by age from smallest to largest, and for students of the same age, by name in descending order. The corresponding SQL statement is:

select name, age from student order by age, name desc limit 10;

In the EXPLAIN output, the Extra value is Using filesort:

This is because the (age, name) index tree is sorted by age from smallest to largest and, for the same age, by name from smallest to largest. The ORDER BY, however, sorts by age from smallest to largest and, for the same age, by name from largest to smallest. That is, the index storage order is not the same as the ORDER BY order.

How do we optimize that? If you are running MySQL 8.0, which supports descending indexes, you can change the index like this:

CREATE TABLE `student` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT,
  `student_id` varchar(20) NOT NULL,
  `name` varchar(64) NOT NULL,
  `age` int(4) NOT NULL,
  `city` varchar(64) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_age_name` (`age`, `name` DESC) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT='student table';
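To see why the ascending index does not help while an index with name in descending order does, here is a small Python sketch. The student rows and variable names are made up for illustration; the lists only imitate the order in which index entries would be stored.

```python
# (name, age) rows, made-up sample data.
students = [("Ann", 18), ("Bob", 18), ("Cid", 17), ("Dee", 19), ("Eve", 17)]

# Storage order of an index on (age ASC, name ASC).
asc_index = sorted(students, key=lambda s: (s[1], s[0]))

# Order the query asks for: age ascending, then name descending.
# Two stable sorts: first by name descending, then by age ascending.
wanted = sorted(sorted(students, key=lambda s: s[0], reverse=True), key=lambda s: s[1])

# Storage order of an index on (age ASC, name DESC), built age group by age group.
desc_index = []
for age in sorted({a for _, a in students}):
    names = sorted((n for n, a in students if a == age), reverse=True)
    desc_index.extend((name, age) for name in names)

print(asc_index == wanted)   # the ascending index order differs, so a sort is needed
print(desc_index == wanted)  # the descending index order matches the query
```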

4. Does the ORDER BY field need an index when there is no WHERE condition?

In daily development, you may encounter an ORDER BY with no WHERE condition. In that case, do you need to index the field after ORDER BY? For example, in the following SQL, should create_time be indexed?

select * from student order by create_time;

For a query with no condition, the index on create_time will not be used even if it exists. The MySQL optimizer considers walking a normal secondary index and then going back to the table to be more expensive than a full table scan followed by a sort. So it chooses the full table scan, and then sorts with full-field sorting or rowid sorting.

What if the query is changed slightly:

select * from student order by create_time limit m;

For a query with no condition, the index can be used if m is small. The MySQL optimizer figures that by following the index order and going back to the table for each entry, it can stop as soon as it has m rows, which costs less than a full table scan, so it chooses the secondary index.

07 Summary

In this article, we talked about the execution process of ORDER BY, and the difference between full-field sorting and rowid sorting. We learned that MySQL prefers to trade memory for performance.

We can also reduce the number of trips back to the table by using a composite index as a covering index. When designing indexes in the future, if the business has fields involved in sorting, try to add them to the index, and add the remaining query fields (such as city and user_code in this article) to the composite index as well, so that the index covers the query.

Indexes have their drawbacks, of course. They take up space and have maintenance costs. So the design still has to be weighed against your own actual business.

Finally, I have discussed with you four classic interview questions about ORDER BY, I hope it is helpful for you.

08 Big-Company Interview Questions & E-books

If you like this article, please give it a like.

I did not know what to give you at our first meeting, so here are hundreds of e-books and the latest interview materials for 2021. Search for JavaFish on WeChat and reply "ebook" to get 1000+ programming e-books; reply "interview" to get a set of interview questions; reply "1024" to get a complete set of Java video tutorials.

The interview questions come with answers. If you need them, come and get them. They are absolutely free.