01 Preface

I recently changed jobs after spending two weeks preparing. I received five offers within three days and finally accepted one from an Internet unicorn in Guangzhou; I started yesterday. Over the past few days I have been sorting through the interesting questions I was asked in the interviews, and I'd like to take this opportunity to share them with you.

The first interviewer at this company was interesting: a young man about my age, and we talked for two hours (until my mouth was dry). The second round was a video interview with an architect from Alibaba. After I introduced myself, he opened by asking:

Are you familiar with MySQL?

I was stunned for a moment, then realized it was a trap. He clearly wanted to probe something specific, and I happened to have studied indexes, so I answered:

I'm familiar with indexes.

He asked:

How does order by do its sorting?

Fortunately I had reviewed this. Sort buffer, how to optimize and so on: my answers were all to the point. Today I'll go through order by with you, from how it works to the final optimization. I hope it will be helpful to you.

As usual, a mind map first.

1.1 An example

Suppose there is an order table, structured like this:

CREATE TABLE `order` (
id INT ( 11 ) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
user_code VARCHAR ( 16 ) NOT NULL COMMENT 'User number',
goods_name VARCHAR ( 64 ) NOT NULL COMMENT 'Trade Name',
order_date TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Order time',
city VARCHAR ( 16 ) DEFAULT NULL COMMENT 'Order City',
order_num INT ( 10 ) NOT NULL COMMENT 'Order quantity',
PRIMARY KEY ( `id` )
) ENGINE = INNODB AUTO_INCREMENT = 100 DEFAULT CHARSET = utf8 COMMENT = 'Merchandise Order Sheet';

Generate some data:

-- Step 1: create the stored procedure
DELIMITER //
DROP PROCEDURE IF EXISTS proc_buildata //
CREATE PROCEDURE proc_buildata ( IN loop_times INT )
BEGIN
	DECLARE var INT DEFAULT 0;
	WHILE var < loop_times DO
		SET var = var + 1;
		INSERT INTO `order` ( `id`, `user_code`, `goods_name`, `order_date`, `city`, `order_num` )
		VALUES ( var, var + 1, 'Wired headphones', '2021-06-20 16:46:00', 'hangzhou', 1 );
	END WHILE;
END //
DELIMITER ;

-- Step 2: call the procedure created above to insert data.
-- I suggest also generating some varied data, such as different cities and order quantities.
CALL proc_buildata(4000);

The data I generated looks like this:

Now the requirement: find the order quantity and user number of customers in Guangzhou during the 618 sale, sorted by order quantity in ascending order, returning only the first 1000 rows.

From this requirement we get the following SQL, which I'm sure you are all familiar with.

select city, order_num, user_code from `order` where city='guangzhou' order by order_num limit 1000;

So how does this statement work? Are there any parameters that affect its behavior?

02 Full-field sorting

My first response to this requirement is to add an index to the city field to avoid a full table scan:

ALTER TABLE `order` ADD INDEX city_index ( `city` );

Use EXPLAIN to look at the execution plan:

Note that the last field, Extra, shows Using filesort, which means a sort is required. MySQL allocates each thread a block of memory for sorting, called sort_buffer.

To get a better idea of how sorting works, I’ve drawn a rough representation of the city index:

As the figure shows, the rows that satisfy city='guangzhou' are the records from ID_3 to ID_X. The entire execution flow of the SQL looks like this:

  • 1. Initialize sort_buffer and determine that it will hold the three fields city, order_num and user_code;
  • 2. From the city index, find the first primary key id that satisfies city='guangzhou', i.e. ID_3 in the figure;
  • 3. Go to the primary key index to fetch the whole row, take the values of city, order_num and user_code, and store them in sort_buffer;
  • 4. Fetch the primary key id of the next record from the city index;
  • 5. Repeat steps 3 and 4 until the value of city no longer meets the query condition; the corresponding primary key id is ID_X in the figure;
  • 6. Quicksort the data in sort_buffer by order_num;
  • 7. Return the first 1000 rows of the sorted result to the client.

This process is called full-field sorting. Drawn as a diagram, it looks like this:
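To make the seven steps concrete, here is a minimal Python sketch of the same flow. It is only a toy model under my own assumptions, not MySQL's implementation: the dictionary `table` stands in for the primary key index, the sorted list `city_index` stands in for the secondary index on city, and the five sample rows are made up.

```python
import bisect

# Toy data: `table` plays the primary key index (id -> row) and
# `city_index` plays the secondary index on city, sorted by (city, id).
table = {
    1: {"city": "hangzhou", "order_num": 5, "user_code": "u1"},
    2: {"city": "shenzhen", "order_num": 1, "user_code": "u2"},
    3: {"city": "guangzhou", "order_num": 9, "user_code": "u3"},
    4: {"city": "guangzhou", "order_num": 2, "user_code": "u4"},
    5: {"city": "guangzhou", "order_num": 7, "user_code": "u5"},
}
city_index = sorted((row["city"], row_id) for row_id, row in table.items())


def full_field_sort(city, limit):
    sort_buffer = []  # step 1: will hold (city, order_num, user_code)
    pos = bisect.bisect_left(city_index, (city,))  # step 2: first matching entry
    while pos < len(city_index) and city_index[pos][0] == city:
        row = table[city_index[pos][1]]  # step 3: fetch the whole row by id
        sort_buffer.append((row["city"], row["order_num"], row["user_code"]))
        pos += 1  # step 4: next entry in the city index
    sort_buffer.sort(key=lambda r: r[1])  # step 6: sort by order_num
    return sort_buffer[:limit]  # step 7: first `limit` rows


result = full_field_sort("guangzhou", 2)
print(result)
```

Every queried field travels through the sort buffer, which is why the result can be returned straight from memory once the sort finishes.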

The sort by order_num may be done entirely in memory or may require an external sort, depending on how much memory the sort needs and on the parameter sort_buffer_size.

sort_buffer_size is the size of the memory (sort_buffer) that MySQL allocates for sorting. If the amount of data to sort is smaller than sort_buffer_size, the sort is done in memory. If the data is too large to fit in memory, MySQL has to use temporary files on disk to assist the sort.

In MySQL 5.7 and above you can use the following method (it will be useful later) to check whether a sort statement uses temporary files. PS: copy the statements into Navicat and execute them all together.

/* Open optimizer_trace, valid only for this thread */
SET optimizer_trace='enabled=on'; 

/* @a saves the initial value of Innodb_rows_read */
select VARIABLE_VALUE into @a from  performance_schema.session_status where variable_name = 'Innodb_rows_read';

/* Execute the statement */
select city, order_num, user_code from `order` where city='guangzhou' order by order_num limit 1000; 

/* View the OPTIMIZER_TRACE output */
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`;

/* @b saves the current value of Innodb_rows_read */
select VARIABLE_VALUE into @b from performance_schema.session_status where variable_name = 'Innodb_rows_read';

/* Calculate the Innodb_rows_read difference */
select @b-@a;

After executing, you get the following result from the TRACE field of the OPTIMIZER_TRACE table:

examined_rows is the number of rows that need to be sorted. sort_buffer_size is the size of the sort buffer; on my MySQL instance it is 256 KB.

In addition, the sort_mode value contains packed_additional_fields, which means the sort packs the data compactly so that it takes up as little memory as possible. For example, a field may be defined with length 16, but memory is allocated by actual length: if the value is only 2 characters long, only 2 characters' worth of memory is allocated.

number_of_tmp_files is the number of external files used to assist the sort; two are used here. When the data does not fit in memory, an external sort is needed, and external sorts generally use merge sort. MySQL splits the data to be sorted into two parts, sorts each part separately and stores it in a temporary file, then merges the two sorted files into one large sorted file.
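The split, sort and merge idea can be sketched in a few lines of Python. This is an illustration of external merge sort in general, not of MySQL's filesort code; the function name `external_sort` and the parameter `buffer_rows` (how many values fit in the "sort buffer") are hypothetical names I chose for the example.

```python
import heapq
import os
import tempfile


def external_sort(values, buffer_rows):
    """Sort integers while holding at most `buffer_rows` values per run."""
    if len(values) <= buffer_rows:
        return sorted(values), 0  # fits in the buffer: no temporary files

    tmp_paths = []
    for start in range(0, len(values), buffer_rows):
        run = sorted(values[start:start + buffer_rows])  # sort one run in memory
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)  # spill the sorted run to disk
        tmp_paths.append(path)

    files = [open(path) for path in tmp_paths]
    try:
        # Merge the sorted runs into one ordered sequence.
        merged = list(heapq.merge(*[(int(line) for line in f) for f in files]))
    finally:
        for f in files:
            f.close()
        for path in tmp_paths:
            os.remove(path)
    return merged, len(tmp_paths)


merged, number_of_tmp_files = external_sort([5, 3, 8, 1, 9, 2], buffer_rows=3)
print(merged, number_of_tmp_files)
```

With six values and room for three at a time, the sketch produces two sorted runs and merges them, which mirrors the two temporary files seen in the trace above.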

The final statement, select @b-@a, returns 6884 rather than 6883.

This is because querying the OPTIMIZER_TRACE table requires a temporary table. When that temporary table uses the InnoDB engine, fetching data from it increases the value of Innodb_rows_read by 1.

To avoid this interference, set internal_tmp_disk_storage_engine to MyISAM.

03 Rowid sorting

The full-field sort above has an obvious problem, as you may have noticed: sort_buffer holds every field the query returns. The more fields we query, the sooner sort_buffer fills up.

When that happens, many temporary files are used to assist sorting, and performance degrades.

So here's the question:

The natural idea is to reduce the length of a single row. Is there a way to do that?

There is. It is controlled by the parameter max_length_for_sort_data, which defaults to 1024.

show variables like 'max_length_for_sort_data';

In this example, the combined length of city, order_num and user_code is 16 + 4 + 16 = 36 < 1024, so full-field sorting is used. Let's change this parameter and make it smaller:

SET max_length_for_sort_data = 16;

When the length of a single row exceeds this value, MySQL considers the row too large and switches to a different algorithm. The three fields city, user_code and order_num take up 36, which obviously exceeds 16, so not all queried fields can be placed in sort_buffer. Instead, sort_buffer holds only order_num and id.
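The decision can be written down as a small Python sketch. It only reproduces the article's simplified arithmetic (16 + 4 + 16 = 36); real MySQL measures row length in bytes and takes the character set into account, and `FIELD_LENGTHS` and `choose_sort_mode` are hypothetical names used for illustration.

```python
# Simplified field lengths taken from the arithmetic above (an assumption,
# not the byte sizes MySQL actually computes).
FIELD_LENGTHS = {"city": 16, "order_num": 4, "user_code": 16}


def choose_sort_mode(fields, max_length_for_sort_data):
    row_length = sum(FIELD_LENGTHS[name] for name in fields)
    if row_length > max_length_for_sort_data:
        return "rowid", row_length  # row too long: keep only sort key + id
    return "full-field", row_length  # row fits: keep every queried field


fields = ["city", "order_num", "user_code"]
default_mode = choose_sort_mode(fields, 1024)
small_mode = choose_sort_mode(fields, 16)
print(default_mode, small_mode)
```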

The flow should look like this:

  • 1. Initialize sort_buffer and determine that it will hold two fields, order_num and id;
  • 2. From the city index, find the first primary key id that satisfies city='guangzhou', i.e. ID_3 in the figure;
  • 3. Go to the primary key index to fetch the whole row, take the order_num and id fields, and store them in sort_buffer;
  • 4. Fetch the primary key id of the next record from the city index;
  • 5. Repeat steps 3 and 4 until the condition city='guangzhou' is no longer met, that is, ID_X in the figure;
  • 6. Sort the data in sort_buffer by order_num;
  • 7. Iterate over the sorted result, take the first 1000 rows, go back to the table by id to fetch city, order_num and user_code, and return them to the client.
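Here is the same flow as a toy Python model, with a counter for how many rows are read from the table. As before, this is a sketch under my own assumptions rather than MySQL's code: `table`, `city_index` and the sample rows are made up, and `rows_read` only imitates what Innodb_rows_read would count.

```python
import bisect

# Toy data: id -> row, plus a secondary index on city sorted by (city, id).
table = {
    1: {"city": "guangzhou", "order_num": 9, "user_code": "u1"},
    2: {"city": "hangzhou", "order_num": 4, "user_code": "u2"},
    3: {"city": "guangzhou", "order_num": 2, "user_code": "u3"},
    4: {"city": "guangzhou", "order_num": 7, "user_code": "u4"},
    5: {"city": "shenzhen", "order_num": 1, "user_code": "u5"},
}
city_index = sorted((row["city"], row_id) for row_id, row in table.items())


def rowid_sort(city, limit):
    rows_read = 0
    sort_buffer = []  # holds only (order_num, id)
    pos = bisect.bisect_left(city_index, (city,))
    while pos < len(city_index) and city_index[pos][0] == city:
        row_id = city_index[pos][1]
        row = table[row_id]  # read the row once to get order_num
        rows_read += 1
        sort_buffer.append((row["order_num"], row_id))
        pos += 1
    sort_buffer.sort(key=lambda entry: entry[0])  # sort by order_num

    result = []
    for _, row_id in sort_buffer[:limit]:
        row = table[row_id]  # go back to the table for the other fields
        rows_read += 1
        result.append((row["city"], row["order_num"], row["user_code"]))
    return result, rows_read


result, rows_read = rowid_sort("guangzhou", 2)
print(result, rows_read)
```

In this toy run three rows are read for sorting and two more for the limited result, the same pattern as sorting 6883 rows and then reading 1000 more for LIMIT 1000.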

As the figure shows, this method needs one extra back-to-table operation, but it takes up less of sort_buffer.

At this point, execute the above detection method and you can see that the information in the OPTIMIZER_TRACE table has changed.

  • sort_mode becomes <sort_key, rowid>, indicating that only order_num and id take part in the sort.
  • number_of_tmp_files becomes 0: the number of rows involved in the sort is still 6883, but each row is smaller, so the total amount of data to sort is smaller. sort_buffer_size is now enough to sort in memory, so temporary files are not needed.

examined_rows is still 6883, meaning 6883 rows are sorted. But select @b-@a now returns 7884. This is because, in addition to the sort itself, the rows have to be fetched from the table again once the sort is finished. Since the statement has LIMIT 1000, 1000 more rows are read.

3.1 Summary

With rowid sorting, more rows can be sorted at a time, but the data has to be fetched from the table again afterwards.

If memory is large enough, MySQL prefers full-field sorting: it puts all the required fields into sort_buffer, and after sorting it returns the query result directly from memory without going back to the table.

This reflects one of MySQL’s design philosophies: if you have enough memory, use more memory to minimize disk access.

For InnoDB tables, rowid sorting requires going back to the table and causes additional disk reads, so it is not preferred.

In both cases, because the data itself is unordered, it has to be sorted in sort_buffer, possibly with the help of temporary files.

Is there a way to make the data itself ordered? Recall that indexes are ordered.

04 Index Optimization

If I create a composite index on city and order_num, will the result come out naturally ordered? For example:

alter table `order` add index city_order_num_index(city, order_num);

The index of the order table looks like this:

For the SQL statement at the beginning of this article, the execution process now looks like this:

  • 1. From the index (city, order_num), find the first primary key id that satisfies city='guangzhou';
  • 2. Go to the primary key index to fetch the whole row, take city, order_num and user_code, and return them directly as part of the result set;
  • 3. Fetch the primary key id of the next record from the index (city, order_num);
  • 4. Repeat steps 2 and 3 until the 1000th record is found or the condition city='guangzhou' is no longer met.

EXPLAIN shows that this process needs no sorting, let alone temporary files. Each matching row only needs one lookup back to the table:

As the figure shows, Extra no longer contains Using filesort. And since (city, order_num) is an ordered composite index, the scan can stop as soon as the first 1000 matching entries are found. That comes to about 2,000 reads: 1,000 index entries, each followed by one lookup back to the table.
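The early exit is easy to see in a toy Python model. This is a sketch under assumed data, not MySQL internals: `index` imitates the composite index (city, order_num) with the primary key id appended to each entry, and `table` imitates the primary key index.

```python
import bisect

rows = [
    (1, "guangzhou", 9, "u1"),
    (2, "hangzhou", 4, "u2"),
    (3, "guangzhou", 2, "u3"),
    (4, "guangzhou", 7, "u4"),
    (5, "shenzhen", 1, "u5"),
]
table = {row_id: (city, num, code) for row_id, city, num, code in rows}
# Entries are kept sorted by (city, order_num, id), like the composite index.
index = sorted((city, num, row_id) for row_id, city, num, _ in rows)


def ordered_index_scan(city, limit):
    result, table_lookups = [], 0
    pos = bisect.bisect_left(index, (city,))  # first entry for this city
    while pos < len(index) and index[pos][0] == city and len(result) < limit:
        row_id = index[pos][2]
        result.append(table[row_id])  # one lookup back to the table per row
        table_lookups += 1
        pos += 1  # entries already come out in order_num order
    return result, table_lookups


result, table_lookups = ordered_index_scan("guangzhou", 2)
print(result, table_lookups)
```

No sort step appears anywhere: the loop simply stops once `limit` rows have been collected.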

So the question is, is there a better solution?

05 Ultimate Optimization

The method above still goes back to the table, mainly because user_code is not included in the index. Recalling the SQL optimizations we learned earlier, how do we avoid going back to the table?

The answer is a covering index: add user_code to the composite index as well.

alter table `order` add index city_order_num_user_code_index(city, order_num, user_code);

In this case the process looks like this; the data is fetched directly from the index and that's it:

The EXPLAIN output:

From the figure, we can see that the Extra field now also shows Using index, which means a covering index is used. There is no need to go back to the table at all; only 1000 index entries are scanned.
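As a rough Python illustration (made-up data, not MySQL internals), a covering index means every queried field already sits in the index entry, so the scan never touches the table. The list `covering_index` below imitates the index (city, order_num, user_code) with the primary key id appended.

```python
import bisect

rows = [
    (1, "guangzhou", 9, "u1"),
    (2, "hangzhou", 4, "u2"),
    (3, "guangzhou", 2, "u3"),
    (4, "guangzhou", 7, "u4"),
    (5, "shenzhen", 1, "u5"),
]
# Sorted by (city, order_num, user_code, id); no separate table is needed.
covering_index = sorted((city, num, code, row_id) for row_id, city, num, code in rows)


def covering_scan(city, limit):
    result = []
    pos = bisect.bisect_left(covering_index, (city,))
    while (pos < len(covering_index)
           and covering_index[pos][0] == city
           and len(result) < limit):
        entry_city, num, code, _ = covering_index[pos]
        result.append((entry_city, num, code))  # answered from the index alone
        pos += 1
    return result


result = covering_scan("guangzhou", 2)
print(result)
```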

Perfect ~

5.1 Parameter Tuning

In addition, you can tune parameters to optimize the execution of order by. For example, increase sort_buffer_size where appropriate: if sort_buffer is too small for the data being sorted, the sort falls back to temporary files on disk. If the MySQL server has plenty of memory, the value can be set somewhat larger.

Likewise, you can increase max_length_for_sort_data. If the value is too small, rowid sorting is chosen more often, the number of back-to-table lookups increases and query performance deteriorates.

06 Order by

1. If the query uses IN with multiple values, is there a sorting step during execution?

Assume there is now a composite index (city, order_num, user_code), and execute the following SQL statement:

select city, order_num, user_code from `order` where city in ('guangzhou') order by order_num limit 1000

With a single value in IN, of course, no sorting is needed. The EXPLAIN output:

However, when IN has more than one value, there will be a sorting step, for example when executing the following statement:

select city, order_num, user_code from `order` where city in ('guangzhou', 'shenzhen') order by order_num limit 1000

EXPLAIN shows Using filesort, so there is a sorting step. Why is that?

Because in the composite index, order_num is ordered only within a single city value. For rows with city='guangzhou', order_num is ordered; for rows with city='shenzhen', it is also ordered. But the two sets put together are not guaranteed to be ordered by order_num.
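A tiny Python example makes this visible. The four index entries are made up; the list `index` imitates the leading (city, order_num) columns of the composite index.

```python
# Entries sorted by (city, order_num), as the composite index stores them.
index = sorted([
    ("guangzhou", 3),
    ("guangzhou", 8),
    ("shenzhen", 1),
    ("shenzhen", 5),
])


def order_nums_in_index_order(cities):
    # Read the index front to back, keeping entries for the wanted cities.
    return [num for city, num in index if city in cities]


def is_sorted(values):
    return all(a <= b for a, b in zip(values, values[1:]))


one_city = order_nums_in_index_order({"guangzhou"})
two_cities = order_nums_in_index_order({"guangzhou", "shenzhen"})
print(one_city, is_sorted(one_city))      # [3, 8] True
print(two_cities, is_sorted(two_cities))  # [3, 8, 1, 5] False
```

Each city's slice is ordered on its own, but reading the two slices one after the other is not, so an extra sort is needed.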

2. The LIMIT offset is very large, causing a lot of sorting. What can be done?

select * from `user` order by age limit 100000, 10

  • You can record the last id of the previous page. When querying the next page, put that id in the query condition, for example where id > (last id of the previous page) limit 10. For this to work, the result must be ordered by id, or by a key that includes id.
  • You can also limit the number of pages if your business allows.
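Here is a runnable sketch of the first idea, using Python's built-in sqlite3 module as a stand-in for MySQL. Because the example query sorts by age, I assume the cursor has to be the pair (age, id) rather than id alone; the index name `idx_age_id`, the function `next_page` and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `user` (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO `user` (id, age) VALUES (?, ?)",
    [(i, 20 + i % 5) for i in range(1, 101)],
)
conn.execute("CREATE INDEX idx_age_id ON `user` (age, id)")


def next_page(last_age, last_id, page_size=10):
    # Continue right after the last row of the previous page.
    return conn.execute(
        "SELECT id, age FROM `user` "
        "WHERE age > ? OR (age = ? AND id > ?) "
        "ORDER BY age, id LIMIT ?",
        (last_age, last_age, last_id, page_size),
    ).fetchall()


# Page 4 fetched with a large offset, for comparison.
offset_page = conn.execute(
    "SELECT id, age FROM `user` ORDER BY age, id LIMIT 10 OFFSET 30"
).fetchall()

# The same page fetched from the last row of the first 30 rows.
first_30 = conn.execute(
    "SELECT id, age FROM `user` ORDER BY age, id LIMIT 30"
).fetchall()
last_id, last_age = first_30[-1]
keyset_page = next_page(last_age, last_id)
print(keyset_page == offset_page)
```

Both queries return the same ten rows, but the second one does not have to walk past the skipped rows first.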

3. The index storage order is inconsistent with the order by. How to optimize?

Suppose there is a composite index (age, name), and the requirement is: query the name and age of the first 10 students, sorted by age from youngest to oldest, and by name in descending order when the age is the same. The corresponding SQL statement is:

select name, age from student order by age, name desc limit 10;

In the EXPLAIN output, Extra shows Using filesort.

This is because in the index tree (age, name), age is sorted from smallest to largest, and when age is the same, name is also sorted from smallest to largest. The order by, however, sorts by age from smallest to largest and, when age is the same, by name from largest to smallest. In other words, the index is stored in a different order than the order by requires.

How do we optimize? MySQL 8.0 supports Descending Indexes, so the table can be defined like this:

CREATE TABLE `student` (
  `id` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `student_id` varchar(20) NOT NULL COMMENT 'student id',
  `name` varchar(64) NOT NULL COMMENT 'name',
  `age` int(4) NOT NULL COMMENT 'age',
  `city` varchar(64) NOT NULL COMMENT 'city',
  PRIMARY KEY (`id`),
  KEY `idx_age_name` (`age`,`name` desc) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=utf8 COMMENT='Student List';

4. Does the order by field need an index when there is no WHERE condition?

In daily development you may run into an order by with no where condition. Do you need to add an index to the field after order by in that case? For example, in the following SQL, does create_time need an index?

select * from student order by create_time;

For an unconditional query like this, the index will not be used even if create_time has one. This is because the MySQL optimizer considers walking an ordinary secondary index and going back to the table for every row more expensive than a full table scan. So it chooses the full table scan, followed by full-field sorting or rowid sorting.

If the query SQL is modified:

select * from student order by create_time limit m;

Even for an unconditional query, if the value of m is small, the index can be used. The MySQL optimizer reckons that by walking the index in order and going back to the table, it can stop as soon as it has m rows, which costs less than a full table scan, so it chooses the secondary index.

07 Summary

This article covered the order by execution process and the difference between full-field sorting and rowid sorting. MySQL prefers to trade memory for performance.

At the same time, we can reduce the number of back-to-table lookups by using a composite index as a covering index. In future index design, if there are fields used for sorting, try to add them to the index, and add the other queried fields (such as city and user_code) to the composite index as well, so that the index covers the query.

Of course, indexes have drawbacks too: they take up space and have maintenance costs. So design them according to your actual business needs.

Finally, I also discussed four classic interview questions about order by with you, and I hope they help.

