I received a sentry alarm this week, the following SQL query timed out.

select * from order_info where uid = 5837661 order by id asc limit 1
Copy the code

Run the show create table order_info command to find that the table is actually indexed

CREATE TABLE `order_info` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `uid` int(11) unsigned, `order_status` tinyint(3) DEFAULT NULL, ... PRIMARY KEY (' id '), KEY 'idx_uid_stat' (' uid ', 'order_status'),) ENGINE=InnoDB DEFAULT CHARSET=utf8Copy the code

In theory, executing the above SQL will hit the index idx_uID_stat, but in practice it will execute explain

explain select * from order_info where uid = 5837661 order by id asc limit 1
Copy the code

You can see that possible_keys (the possible index to this SQL) is idx_uid_stat, but it actually uses a full table scan

We know that MySQL uses cost to select whether to perform a final execution plan based on a full table scan or an index. So it looks like the full table scan cost less than the idx_uID_stat index. If only one of the statements (uid = 5837661) is selected, only one record will be returned to the table. This cost is negligible.

To see why the MySQL optimizer chose full table scan, I turned on Optimizer_Trace to find out

Voiceover: In MySQL 5.6 and later, you can use the Optimizer trace feature to see how the optimizer generates an execution plan

The process for using Optimizer_Trace is as follows

SET optimizer_trace="enabled=on"; Optimizer_trace SELECT * FROM order_info WHERE uid = 5837661 ORDER by id ASC LIMIT 1 SELECT * FROM optimizer_trace information_schema.OPTIMIZER_TRACE; SET optimizer_trace="enabled=off"; / / close optimizer_traceCopy the code

The MySQL optimizer will first calculate the cost of a full table scan, then select all indexes that the SQL may involve and calculate the cost of the index, and then select the one with the least cost to execute

{ "rows_estimation": [ { "table": "`rebate_order_info`", "range_analysis": { "table_scan": { "rows": 21155996, "cost": "Analyzing_range_alternatives ": {"range_scan_alternatives": [{"index": "idx_uid_stat", "ranges": [ "5837661 <= uid <= 5837661" ], "index_dives_for_eq_ranges": true, "rowid_ordered": False, "usING_MRR ": false," index_ONLY ": false, "rows": 255918, "cost": 307103, // Index cost chosen using idx_uid_stat: True}], "chosen_range_access_summary": {"range_access_plan": {"type": "range_scan", "index": "Idx_uid_stat ", // You can see that the index idx_uid_stat is finally selected to execute "rows": 255918, "ranges": [ "58376617 <= uid <= 58376617" ] }, "rows_for_plan": 255918, "cost_for_plan": 307103, "chosen": true } } ...Copy the code

You can see that the cost of a full table scan is 4.45e6, while the cost of selecting index IDx_uID_stat is 307103, which is much smaller than the cost of a full table scan, and from the final selection result (chosen_range_access_summary), SQL > select idx_uid_stat (PRIMARY, full table scan);

Taking a closer look at the execution plan, I notice that one of the options on the execution plan has caught my attention

{ "reconsidering_access_paths_for_index_ordering": { "clause": "ORDER BY", "index_order_summary": { "table": "`rebate_order_info`", "index_provides_order": true, "order_direction": "asc", "index": "Plan_changed ": true, "access_type": "index_scan"}}}Copy the code

This selection indicates that the index selection optimization is performed again for sorting reasons. Since our SQL uses id sort (order by ID ASC limit 1), the optimizer finally selects PRIMARY (full table scan). This option ignores the previous option based on the index cost. The main reasons for this option are as follows:

The short explanation is that The optimizer thinks — or should I say hopes — that scanning The whole table (which is already sorted by the id field) will find the limited rows quick enough, and that this will avoid a sort operation. So by trying to avoid a sort, the optimizer ends-up losing time scanning the table.

In order to avoid sorting, the optimizer considers sorting to be an expensive operation. In order to avoid sorting, the optimizer thinks that n of limit n can be performed quickly even with full table scan if it is small. This avoids sorting of IDS by using a full table scan (a full table scan is a scan of clustered indexes based on the ID primary key, which is itself sorted by ID)

That would be fine if it was the right choice, but the optimization is actually buggy! The actual idx_uID_stat execution is much faster (only 28 ms)! SQL > order by ID asC limit n; if n is small, it is very likely that the whole table will be scanned; if n is large, it will select the correct index.

This bug dates back to 2014, and many people have called on the official authorities to fix this bug in time. It may be difficult to implement, but MySQL 5.7 and 8.0 have not been solved yet, so we should try to avoid this writing method before the official fix. So how to avoid it

  1. Use force index to force the specified index as follows:
select * from order_info force index(idx_uid_stat) where uid = 5837661 order by id asc limit 1
Copy the code

This is ok, but not elegant. What if the index is deprecated? Hence the second, more elegant solution

  1. Use the order by (id+0) scheme as follows
select * from order_info where uid = 5837661 order by (id+0) asc limit 1
Copy the code

This scheme also allows the optimizer to select the correct index, which is recommended!

Shoulders of giants

  • Mysql optimizer bug 4zsw5.cn/L1zEi

Finally, seeking attention, the original is not easy, need some positive feedback, welcome everyone to pay attention to my public number “code sea”, together to advance, together with niubi