Sorting the results can be expensive, so you can optimize query performance by avoiding sorting or having fewer rows of data participating in the sorting.

When MySQL cannot use indexes to produce ordered results, it must sort rows of data. This may be done in memory or on disk, but MySQL always calls the process filesort, even though it doesn’t actually use a file.

If the values used for sorting can be put into the sort cache at one time, MySQL can sort in memory using the quicksort algorithm. If MySQL cannot sort in memory, it will sort block by block on disk. It uses a quicksort algorithm for each block and then merges the sorted blocks into the result.

There are two filesort algorithms:

  • Two passes: Reads the row pointer to the ORDER BY column, sorts it, passes the sorted list and re-reads the desired rows for output. Because rows are read twice from the table, the double traversal algorithm can be very expensive. And the second read results in a lot of random I/O accesses, which can have a particular impact on the performance of the MyISAM engine (which uses system calls to fetch every row since MyISAM relies on the operating system cache to store data). On the other hand, the least amount of data is stored during sorting, so if the rows to be sorted are all in memory, it will be cheaper to store fewer rows and read the final result twice.
  • Single traversal: reads all columns to be queried, sorts them according to the columns specified BY ORDER BY, then traverses the sorted list and outputs the specified data columns. This algorithm is only supported in MySQL 4.1 and later. Because it avoids secondary reads of data table rows and converts more random I/O to sequential I/O, this approach provides better performance, especially for queries on data sets with large I/O spans. However, this means that it can take up more memory space, because it holds the columns that need to be read for each row, not just the columns that need to be sorted. This means less utilization for the sort buffer, and file sorting needs to handle more sort merges.

It’s hard to say which algorithm works better, there are best and worst cases for every algorithm. MySQL uses the single-pass algorithm when all columns in the table plus the size of the column used for sorting do not exceed max_LENGTH_FOR_sort_DATA. You can modify this parameter to influence the selection of the sorting algorithm.

Note that MySQL’s Filesort uses more temporary storage than you might expect, because it allocates a fixed amount of storage for each sorted element. These storage Spaces should be large enough to hold the largest elements, and fields such as VARCHAR use the corresponding maximum length. Also, if you are using the UTF-8 character set, MuSQL allocates 3 bytes per character. As a result, we find that poorly optimized queries can result in temporary storage on disk that is several times larger than the storage space of the data table itself.

When sorting federated queries, MySQL may perform file sorting twice during query execution. If the ORDER BY clause only refers to the first table of the union query, MySQL can file sort the table before processing the union query. If this is the case, “Using Filesort” will be displayed in the Extra field in EXPLAIN. For other sorts — for example, if the ORDER BY column is not the first table, or if the ORDER BY column is used for more than one table — MySQL must use temporary tables to cache the query results, and then file sorts the temporary tables after the joint query is complete. In this case, EXPLAIN shows “Using temorary; Using filesort “. If the LIMIT constraint is included, it occurs after the file sort, so the storage space for temporary tables and file sorts can be very large.

MySQL 5.6 introduces a major improvement when you only need to sort a subset of data rows, such as LIMIT. Instead of sorting the entire result set and returning a portion of the data, MySQL sometimes throws away rows that are not needed to improve sorting efficiency. However, sorting also needs to be used with care and is likely to lead to a surge in storage usage that will eventually overload the system.