Version: 5.7, from 8.2.1.14 ORDER BY Optimization

This section describes when MySQL can use indexes to satisfy the ORDER BY clause, use filesort when indexes are not available, and information about the ORDER BY execution plan in the optimizer.

An ORDER BY statement may be executed differently with or without limit. For details, see 8.2.1.17 LIMIT Query Optimization.

Order BY is implemented using an index

In some cases, MySQL may use indexes to satisfy an ORDER BY clause and avoid the extra sorting involved when performing filesort operations.

Although the ORDER BY does not exactly match the index, the index is still used, as long as in the WHERE clause all unused portions of the index (in the case of multiple fields in an index – the union index) and all ORDER BY fields are constants, going to the index instead of filesort.

So here we have a table tx_order,

CREATE TABLE `tx_order` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT ,
  `serial_number` varchar(255) NOT NULL ,
  `order_status` int unsigned DEFAULT 0 NOT NULL ,
  `market_id` varchar(10) DEFAULT NULL ,
  `market_name` varchar(255) DEFAULT NULL ,
  `shop_id` varchar(50) DEFAULT NULL ,
  `shop_name` varchar(100) DEFAULT NULL ,
  `mobile` varchar(64) DEFAULT NULL ,
  `create_date` datetime DEFAULT NULL ,
  PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2333702 DEFAULT CHARSET=utf8;
Copy the code

And add an index

alter table tx_order add index idx_market_date(market_id,create_date);
Copy the code

Analyze the use of the order by index in the following SQL. Whether the MySQL optimizer actually performs SQL using an index or a table scan depends on the efficiency of both.

  • In the following SQL, the optimizer uses the idx_market_date index to avoid table scans.
desc select market_id,create_date from tx_order.tx_order order by market_id,create_date;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index

Copy the code

However, the query fields in this SQL statement are in the index, if the query fields are not included in the index, such as “SELECT market_id,create_date,market_name”. In this case, scanning the entire index and looking up table rows for columns that are not in the index can be more expensive than a table scan, and the optimizer may not use the index.

desc select market_id,create_date,market_name from tx_order.tx_order order by market_id,create_date;

1	SIMPLE	tx_order		ALL					1671956	100	Using filesort

Copy the code

In InnoDB, we know that the primary key (clustered index) is itself part of the index, which is used in the following query.

desc select id,market_id,create_date from tx_order.tx_order order by market_id,create_date;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index

Copy the code
  • In cases where a field in the index is a constant in the WHERE condition, and where substatements produce a range index that performs much better than a table scan, such a query will choose the index over the table scan.
desc select market_id,create_date from tx_order.tx_order where  market_id = '1009' order by create_date;

1	SIMPLE	tx_order		ref	idx_market_date	idx_market_date	33	const	170398	100	Using where; Using index

Copy the code
  • Order by… order by… Asc statement. Looking at the following results we can think about why. Alter table tx_ORDER add index idx_market_date(market_id asc,create_date desc); However, the current version of MySQL does not do any logic support, is a unified installation of the default ascending order. In a federated index, queries are sorted by the fields in the index, and if the sorting is inconsistent, the optimizer will still partially scan the table.
desc select market_id,create_date from tx_order.tx_order order by market_id desc ,create_date desc ;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index

desc select market_id,create_date from tx_order.tx_order order by market_id asc ,create_date desc ;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index; Using filesort

Copy the code
  • In the following query WHERE substatement, the range index is superior to the table scan, and the optimizer chooses index resolution order BY.
desc select market_id,create_date from tx_order.tx_order where market_id > '1009' order by market_id asc;

1	SIMPLE	tx_order		range	idx_market_date	idx_market_date	33		835978	100	Using where; Using index

desc select market_id,create_date from tx_order.tx_order where market_id < '1009' order by market_id desc;

1	SIMPLE	tx_order		range	idx_market_date	idx_market_date	33		230966	100	Using where; Using index
Copy the code
  • In the following query, order by is no longer “market_id”, but the row “market_id” in all queries is a constant, so the parse order BY of the index is still used.
desc select market_id,create_date from tx_order.tx_order where market_id = '1009' and create_date>'2018-01-01' order by create_date desc;

1	SIMPLE	tx_order		range	idx_market_date	idx_market_date	39		94002	100	Using where; Using index
Copy the code

In some cases, MySQL does not use an index to parse an order by, although it does use an index for where conditions, as shown in the following example.

  • MySQL > select idx_market_id(market_id,order_status,create_date) from idx_market_id(order_status,create_date);
desc select market_id,create_date from tx_order.tx_order where  market_id='1009' order by market_id ,create_date ;

1	SIMPLE	tx_order		ref	idx_market_id,idx_market_type_create_date	idx_market_id	33	const	138084	100	Using where; Using index; Using filesort
Copy the code
  • Mixed sort ASC, DESC
desc select market_id,create_date from tx_order.tx_order order by market_id asc ,create_date desc;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index; Using filesort
Copy the code
  • The order BY field uses a function that the optimizer resolves to abandon the order by index
desc select mobile from tx_order.tx_order order by  abs(mobile);

1	SIMPLE	tx_order		index		idx_mobile	768		1671956	100	Using index; Using filesort

Copy the code
  • In a multi-table associative query, and the columns in ORDER BY are not all from the first extraordinary table used to search for rows. This is the first table in EXPLAIN output that does not have a const join type.
desc select a.market_id from tx_order.tx_order a ,tx_order_item b where a.id = b.order_id and a.market_id = '1009'order by a.market_id,b.sku; 1 SIMPLE b ALL idx_order_create 1 100 Using filesort 1 SIMPLE a eq_ref PRIMARY,idx_market_date PRIMARY 8 10.19 Using tx_order. B.o rder_id 1where

Copy the code
  • There are different ORDER BY and GROUP BY expressions.
desc select market_id,create_date from tx_order.tx_order   group by market_id,create_date order by create_date;

1	SIMPLE	tx_order		index	idx_market_date	idx_market_date	39		1671956	100	Using index; Using temporary; Using filesort

Copy the code
  • For indexes that specify the sort index length. In this case, the index cannot fully resolve the sort order, and you need to use filesort for sorting. Alter table TX_ORDER add index idx_mobile(mobile(5)); However, mobile varchar (64).
desc select mobile from tx_order.tx_order order by mobile desc ;

1	SIMPLE	tx_order		ALL					1671956	100	Using filesort

Copy the code
  • In some cases, the type of table index used cannot hold rows in order. This is the case, for example, for HASH indexes of the HEAP table.

  • The availability of sorted indexes can be affected by the use of column aliases.

In the following statement, sorting is affected and indexes are not used.

desc select abs(market_id) as aa from tx_order.tx_order order by aa;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index; Using filesort

Copy the code

However, in the following statement, the query field is aliased, but the actual sort field is the field in the index, so the sort is still indexed.

desc select abs(market_id) as aa from tx_order.tx_order order by market_id;

1	SIMPLE	tx_order		index		idx_market_date	39		1671956	100	Using index

Copy the code

By default, for “group by col2,col2…” MySQL will also include “order by col2,col2…” Equivalent to the acceleration you show “Order by col2,col2…” Sort, in which case the optimizer handles with no performance penalty.

For this default, if you want to avoid the default sort, you can use order by NULL to avoid it, for example:

desc select market_id,count(market_id) from tx_order.tx_order group by market_id order by null ;
Copy the code

The optimizer may still choose to use sorting for grouping operations. ORDER BY NULL disallows sorting of results other than previous sorting BY grouping operations to determine the results.

Pay attention to

GROUP BY is implicitly sorted BY default (that is, GROUP BY in the absence of column ASC or column DESC indicator). However, it is not recommended to rely on implicit GROUP BY sorting (that is, sorting without ASC or DESC indicators) or explicit GROUP BY sorting (that is, BY using explicit ASC or DESC indicators on columns). To generate the given sort ORDER, it is necessary to supply an ORDER BY clause.

Use filesort for sorting

When index sort is not available, MySQL uses Filesort to scan tables to sort result sets, and the corresponding filesort generates additional sorting phases throughout the query process.

To support filesort, the optimizer implementation allocates a certain amount of memory sort_BUFFer_SIZE, which is session-exclusive and can be changed.

If the Filesort dataset is too large to sort in memory, the optimizer uses a disk as a temporary file for sorting. Some queries are particularly well suited to the memory sort implementation of filesort. For example, the optimizer can effectively utilize memory sort without the need for temporary file implementation. For example,

desc select * from tx_order.tx_order order by market_name desc limit 10;

1	SIMPLE	tx_order		ALL					1671956	100	Using filesort

Copy the code

Using temporary examples

desc select market_name from tx_order.tx_order order by RAND() desc limit 10;

1	SIMPLE	tx_order		ALL					1671956	100	Using temporary; Using filesort

Copy the code

Affects order by optimization

For slow queries in filesort, modify the max_LENGTH_FOR_SORt_data scalar, and lower the max_LENGTH_FOR_sort_data value to control the trigger point of the filesort selection algorithm. If max_LENGTH_FOR_sort_data is increased and disk usage increases, CPU usage decreases. For more information, see Mysql Sorting Optimization and Index Usage.

To speed up ORDER BY, check if you can have MySQL use indexes instead of an extra sort phase. If this is not possible, try the following strategies:

  • Increase the sort_buffer_SIZE variable value. Ideally, the value should be large enough to fit the entire result set into the sort buffer (to avoid writes to disk and merge passes), but at a minimum the value must be large enough to hold 15 tuples. (Up to 15 temporary disk files can be merged, and at least one tuple of each file must have space in memory.)

    Consider that the size of the column values stored in the sort buffer is affected by the value of the max_sort_length system variable. For example, if the tuple stores the value of a long string column and you increase the value max_sort_length, the size of the sort buffer tuple also increases, and you may need to increase sort_buffer_size. For column values computed as string expressions (such as those that call string-valued functions), the Filesort algorithm cannot resolve the maximum length of expression values, so max_sort_length must be allocated the number of bytes per tuple.

    To monitor the number of merge passes (merge temporary files), check the Sort_merge_passes status variable.

  • Increment the read_rnd_BUFFer_SIZE variable to read more rows at once.

  • Change the tmpdir system variable to point to a dedicated file system with a large amount of free space. Variable values can list several paths used in a circular fashion; You can use this feature to spread the load across multiple directories. : On Unix, the colon character () separates paths,; On Windows, separate paths with the semicolon character (). Paths should be named directories in file systems on different physical disks, not different partitions on the same disk.

View SQL parsing through execution plans

Using EXPLAIN (see 8.8.1 Optimizing Queries with EXPLAIN), you can check whether MySQL can parse the ORDER BY clause using indexes.

  • If the output Extra column EXPLAIN does not contain Using filesort, the index is used and filesort is not executed.
  • If the output Extra column EXPLAIN contains Using filesort, the index is not used and filesort is executed.

In addition, the optimizer trace can output filesort_summary information faster when filesort is executed. Such as:

"filesort_summary": {
  "rows": 100,
  "examined_rows": 100,
  "number_of_tmp_files": 0."sort_buffer_size": 25192,
  "sort_mode": "<sort_key, packed_additional_fields>"
}
Copy the code

For details about MySQL trace, see Chapter 8 Tracing the Optimizer.

conclusion

To write a reliable and efficient sort query, you need to understand the implementation of order BY. Here’s How MySQL executes Order BY.

When writing SQL statements and using order BY, the first consideration is to satisfy the index condition, if not, then meet the in-memory filesort, the worst case is temporary files, of course, this situation is the last we want to see.

Meanwhile, here is my personal experience:

  1. Joint Indexes are a good thing and can be applied to many usage scenarios in a project, as detailed Optimization can be seen in 8.3 Optimization and Indexes.
  2. SQL rewrite, complex single SQL can be rewritten into two or three, using the index.
  3. Set up a good table structure and assign fields the types and lengths that best fit them.

Open process to ponder SQL, more to see the execution plan, effectively avoid slow query, improve the performance of the service.

reference

  • How MySQL executes ORDER BY
  • Mysql > Select * from Mysql;