Problem description

I got to the office early today and, as usual, opened the Alibaba Cloud DMS console. The SQL analysis page showed a red slow-SQL warning, as shown below:

So I clicked through to the slow SQL records. Some of the statements had been executed many times, for example the one below (real table name hidden):

SELECT
    a.* 
FROM
    `some_table` a 
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0' 
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0' 
ORDER BY
    a.id DESC 
    LIMIT 0, 10;

The statement looks simple and unremarkable, but the query takes about 6 s to execute, and the table has about 5 million rows.

That is the background. Now on to the investigation.

Troubleshooting

My first suspicion was that the estimate_deliver_date column was not indexed, so the query was doing a full table scan (type ALL). So I checked the table's index definitions:

 PRIMARY KEY (`id`)
 KEY `idx_statement_date` (`statement_date`) USING BTREE
 KEY `idx_estimate_deliver_date` (`estimate_deliver_date`) USING BTREE
 ......

As you can see, the index idx_estimate_deliver_date covers the estimate_deliver_date column, and our query filters on estimate_deliver_date, so in theory the query should use that index.

Whether idx_estimate_deliver_date is actually used can be checked with EXPLAIN. MySQL provides EXPLAIN to analyze the execution plan of a SQL statement. We focus on the following key columns of the execution plan:

  • select_type: the query type, such as SIMPLE and DERIVED
  • type: the access type, such as ALL, index, range, and ref
  • possible_keys: the indexes that could be used
  • key: the index finally chosen by the MySQL optimizer
  • key_len: the length in bytes of the index columns actually used by the chosen index
  • rows: the number of rows this query is expected to scan; this is an estimate
  • Extra: additional information, such as Using where, Using index, Using filesort, etc.
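
To produce the columns listed above, prefix the statement with EXPLAIN. A minimal sketch for the slow query in this article, using the placeholder table name `some_table` from the original statement:

```sql
-- Show the execution plan instead of running the query.
-- Columns to look at in the output: type, possible_keys, key, rows, Extra.
EXPLAIN
SELECT
    a.*
FROM
    `some_table` a
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0'
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0'
ORDER BY
    a.id DESC
    LIMIT 0, 10;
```

EXPLAIN only asks the optimizer for its plan, so it returns quickly even when the real query is slow.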

We run EXPLAIN on the statement and get the following execution plan:

The value of type is index, which means the query does a full index scan rather than a full table scan, but a full index scan is also inefficient. possible_keys lists idx_estimate_deliver_date, but key shows PRIMARY, meaning the query actually went through the primary key index.

Since idx_estimate_deliver_date is not being used, we can force the query to use it with FORCE INDEX(idx_estimate_deliver_date). The query now looks like this:

SELECT
    a.* 
FROM
    `some_table` a 
    FORCE INDEX(idx_estimate_deliver_date)
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0' 
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0' 
ORDER BY
    a.id DESC 
    LIMIT 0, 10;

The statement is executed very quickly, in milliseconds, as shown below:

Let’s see what the result of the statement’s execution plan is:

As can be seen from the figure above, after adding FORCE INDEX(idx_estimate_deliver_date) the query does use the idx_estimate_deliver_date index, and execution is much faster.

The problem is found. In summary: the MySQL optimizer decided that, for this query, walking the primary key index would find the data faster. Using idx_estimate_deliver_date would require sorting the rows after the index scan, which adds sorting overhead, whereas the primary key index is naturally ordered by id. So the optimizer chose the primary key index.

How MySQL selects an index

MySQL optimizer index selection criteria

First of all, it is the MySQL optimizer’s job to select indexes.

The purpose of the optimizer’s index selection is to find an optimal execution plan and to execute the statement with the minimum cost. In a database, the number of rows scanned is one of the factors affecting the execution cost. Fewer rows scanned means less disk data is accessed and less CPU resources are consumed.

Of course, the number of rows scanned is not the only criterion. The optimizer also takes into account factors such as whether to use temporary tables, whether to sort, and so on.

Let’s compare the results of the previous two explain executions:

With the primary key index, rows is 227, while with idx_estimate_deliver_date rows is 357, and its Extra column shows Using filesort, meaning an additional sort is required. So without a forced index, the optimizer chose the primary key index because it estimated that it would scan fewer rows and need no extra sort, since the primary key index is naturally ordered.
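
To see more of the optimizer's reasoning than EXPLAIN shows, MySQL (5.6 and later) offers the optimizer trace. The following is a sketch, assuming a session with sufficient privileges and the placeholder table `some_table`. The trace output is a large JSON document and its exact contents vary by version.

```sql
-- Enable the optimizer trace for the current session only.
SET optimizer_trace = 'enabled=on';

-- Run the statement to be analyzed.
SELECT
    a.*
FROM
    `some_table` a
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0'
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0'
ORDER BY
    a.id DESC
    LIMIT 0, 10;

-- Read the trace of the last traced statement (JSON text).
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

-- Turn tracing off again to avoid the extra overhead.
SET optimizer_trace = 'enabled=off';
```

In the JSON, sections such as rows_estimation and considered_execution_plans typically show the cost the optimizer assigned to each candidate index.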

How is the number of rows estimated

I won't go into detail here. The optimizer uses an estimation algorithm based on index statistics, and you can consult the relevant material for the specifics.
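
As a rough pointer, the statistics behind the row estimate can be inspected and refreshed with standard MySQL statements. This is a sketch on the placeholder table `some_table`; whether refreshing statistics changes the plan for this particular query is not something the article tested.

```sql
-- The Cardinality column is the estimated number of distinct values per index.
-- For InnoDB it is derived from sampled index pages, so it is approximate.
SHOW INDEX FROM `some_table`;

-- Recalculate the index statistics if they look stale.
ANALYZE TABLE `some_table`;
```

If Cardinality is far from reality, the rows estimate in EXPLAIN can be off as well, which is one common reason for a poor index choice.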

Does the index take the ORDER BY column into account

If the chosen index does not cover the ORDER BY column, the MySQL server layer may have to sort the rows returned by the storage engine, which adds performance overhead. This shows up as Using filesort in the Extra column of the execution plan. For details, see the official documentation on ORDER BY optimization: dev.mysql.com/doc/refman/…
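
The effect can be observed by comparing two plans. This is an illustrative sketch only: the second statement sorts by a different column and therefore returns rows in a different order, so it is not a drop-in replacement for the article's query.

```sql
-- Filter through idx_estimate_deliver_date but sort by id:
-- the index order does not match ORDER BY, so Extra is expected
-- to show "Using filesort".
EXPLAIN
SELECT a.*
FROM `some_table` a FORCE INDEX (idx_estimate_deliver_date)
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0'
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0'
ORDER BY a.id DESC
LIMIT 0, 10;

-- Filter and sort by the same indexed column:
-- the index can usually deliver rows already in order,
-- so no filesort should be needed.
EXPLAIN
SELECT a.*
FROM `some_table` a
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0'
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0'
ORDER BY a.estimate_deliver_date DESC
LIMIT 0, 10;
```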

Can changing LIMIT fix the problem

We can run an experiment here. If we increase the limit to 20, the statement becomes:

SELECT
    a.* 
FROM
    `some_table` a 
    FORCE INDEX(idx_estimate_deliver_date)
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0' 
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0' 
ORDER BY
    a.id DESC 
    LIMIT 0, 20;

The explain execution plan for this modified statement is:

As you can see from the execution plan, the query uses the idx_estimate_deliver_date index. As for why, my guess is that the estimated number of rows via the primary key index now exceeds the estimate via idx_estimate_deliver_date, and that, combined with other factors, the optimizer would rather pay for the extra sort and use idx_estimate_deliver_date.

The solution

Now that we know how the MySQL optimizer selects an index, there are broadly two ways to solve the problem:

  • Force the index
  • Influence the optimizer's index choice

Forcing the index: FORCE INDEX

As shown earlier, we can use FORCE INDEX directly to make the query use the index, so that the optimizer cannot pick the wrong index and cause performance problems.

SELECT
    a.* 
FROM
    `some_table` a 
    FORCE INDEX(idx_estimate_deliver_date)
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0' 
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0' 
ORDER BY
    a.id DESC 
    LIMIT 0, 10;

This is the quickest way to solve the problem. FORCE INDEX prevents the optimizer from choosing the wrong index, but it has drawbacks: the index name is hard-coded into the application code, which creates tight coupling and makes maintenance harder. If the index is renamed later, the code has to change with it. For that reason this approach is not recommended.

Influencing the optimizer's index choice: increase LIMIT

Increasing LIMIT raises the number of rows the query is expected to scan, which can nudge the optimizer into selecting the target index:

SELECT
    a.* 
FROM
    `some_table` a 
WHERE
    a.estimate_deliver_date >= '2021-04-01 00:00:00.0' 
    AND a.estimate_deliver_date < '2021-04-09 23:59:59.0' 
ORDER BY
    a.id DESC 
    LIMIT 0, 100;

This can get the query onto the index, but it feels wrong: there is no guarantee that the optimizer will pick the target index, and even if it does, the extra rows have to be filtered out in business code, which wastes computing resources.

Influencing the optimizer's index choice: use a subquery

Besides the methods above, another option is to use a subquery. The SQL is as follows:

SELECT
    a.* 
FROM
    `some_table` a 
WHERE
    a.id IN (
        SELECT b.id FROM (
            SELECT c.id 
            FROM `some_table` c 
            WHERE
                c.estimate_deliver_date >= '2021-04-01 00:00:00.0' 
                AND c.estimate_deliver_date < '2021-04-09 23:59:59.0' 
            ORDER BY c.id DESC 
            LIMIT 0, 10 
        ) b
    ) 
ORDER BY
    a.id DESC;

However, combining subqueries with ORDER BY may produce temporary tables and filesort, so this approach needs thorough testing. Otherwise it may introduce additional performance problems.

Other solutions

SQL optimization is a large and tedious topic, and there are many other ways to solve slow queries. I won't go through them all here.