preface

This article includes Mysql Where query execution process, range query to stop joint index matching, back table operation analysis, common index failure scenarios, Extra analysis and other knowledge.

background

A full query appears in a table with 60 million data, and the SQL statement is repeated to find out the cause of index failure.

# sqlStatements EXPLAINSELECT count(*) FROM order_recipient_extend_tab WHERE start_date>'1628442000' and start_date<'1631120399' and station_id='1809' and status='2';
Copy the code

Recipient_recipient_extend_tab = start_date, station_id, and status; recipient_recipient_extend_tab = start_date, station_id, and status; recipient_recipient_extend_tab = recipient_recipient_extend_tab;

Joint index Field 1 Field 2 Field 3
idx_date_station_driver start_date station_id driver_id

Where Conditional query execution process

Understanding how Mysql performs WHERE conditional queries can help you quickly and clearly see why indexes fail. The index with a high matching degree in this slow query is IDx_date_station_driver. Analyze the execution process of the WHERE condition query in this slow query.

Mysql can mainly summarize the where criteria extraction rules into three categories: Index Key (First Key & Last Key), Index Filter, and Table Filter.

Index Key

Index Key Is used to determine the range of the SQL query in the Index tree. The Index First Key is used to locate the start range of an Index query, and the Index Last Key is used to locate the end range of an Index query.

  • Index First Key

    Extraction rule: Starting from the First field of the Index, check whether the field exists in the WHERE condition. If so, add the corresponding condition to the Index First Key, and continue to read the next field of the Index. If the Index First Key exists and the condition is >, the corresponding condition is added to the Index First Key, and the Index First Key extraction is terminated. If the Index First Key does not exist, the Index First Key extraction is terminated.

  • Index Last Key

    In contrast to the Index First Key, the extraction rule is as follows: Start from the First field of the Index, check whether it exists in the WHERE condition. If it exists and the condition is =, <=, add the corresponding condition to the Index Last Key, and continue to extract the next field of the Index. If it exists and the condition is <, add the condition to the Index Last Key and terminate the extraction. If the Index Last Key does not exist, the Index Last Key extraction is terminated.

According to the Index Key extraction rules, the Index Last Key extracted in this slow query is start_date>’1628442000′, and the Index Last Key is start_date<‘1631120399’.

The Index First Key is only used to locate the initial range of the Index, using the Index First Key condition, starting from the root node of the Index B+ tree, using the binary search method to quickly Index to the correct leaf node location. The Index First Key is checked only once during the Where query.

Index Last Key is used to locate the Index end range. Therefore, for each Index record read after the start range, check whether the Index record exceeds the Index Last Key range. If the Index record exceeds the Index Last Key range, the current query ends.

Index Filter

In the Index range specified by the Index Key, not all Index records meet the query criteria. For example, not all Index records in the Index Last Key and Index Last Key range satisfy station_id = ‘1809’. This is where the Index Filter comes in.

Index Filter, also known as Index push down, is used to Filter the records that do not meet the search conditions in the Index query range. Each record in the Index range must be compared with the Index Filter. If the Index Filter does not meet the requirement, the system discards the record and continues to read the next record in the Index.

The extraction rule of Index Filter is as follows: Start from the first field of the Index and check whether it exists in the WHERE condition. If it exists and the condition is only =, skip the first field and continue to check the next field of the Index. The next Index column adopts the same extraction rule (explanation: the fields whose condition is = have been filtered out of the Index Key). If the Index Filter exists and the conditions are >=, >, <, or <=, the Index Filter skips the current Index field and adds all index-related fields in other WHERE conditions to the Index Filter.

Based on the Index Filter extraction rule, the Index Filter extracted in this slow query is station_id=’1809′. In the range of Index query determined by Index Key, station_id=’1809′ should be compared when traversing Index records. If this condition is not met, it will be lost directly and continue to read the next Index record.

Table Filter

Table Filter Is used to Filter out the data that cannot be filtered by the index. After querying the entire row in the secondary index through the primary key back Table, the system checks whether the record meets the Table Filter condition. If the record does not meet the Table Filter condition, the system loses the record and continues to judge the next record.

The extraction rule is simple: All query conditions that do not belong to the index field are classified as Table Filter. According to the Table Filter extraction rule, the Table Filter is status= ‘2’.

Summary and supplement

Index Key Is used to determine the range of Index scanning. Index Filter Is used to Filter indexes. Table Filter You need to return the Table and Filter it on the Mysql server.

Index Key and Index Filter occur in InnoDB storage layer, Table Filter occurs in Mysql Server layer.

Before MySQL5.6, Index Filter and Table Filter are not distinguished, all Index records in the range of Index First Key and Index Last Key are read back to the Table, and then returned to MySQL Server layer for filtering.

In MySQL 5.6 and later, Index Filter is separated from Table Filter, and Index Filter is filtered by InnoDB storage engine layer, which reduces the record interaction overhead of returning to Table and returning to MySQL Server layer, and improves the execution efficiency of SQL.

Analyze the cause of index failure

The first is count(), where the wildcard * is optimized not to expand all columns, but to actually count rows by ignoring all columns. So if you want to collect only the number of rows, it’s best to use count().

Next, analyze the WHERE statement. Select idx_date_station_driver, start_date>’1628442000′, idx_date_station_driver, idx_date_station_driver, idx_date_station_driver, idx_date_station_driver, idx_date_station_driver, idx_date_station_driver, IDx_date_station_driver, IDx_date_station_driver, IDx_date_station_driver, IDx_date_station_driver, IDx_date_station_driver Start_date <‘1631120399′, Index Filter is station_id=’1809’, Table Filter is status= ‘2’.

The Index matching process is to extract the Index First Key and locate the initial Index range on the Index B+ tree. The binary search method is used to quickly locate the initial leaf nodes that meet the query conditions on the Index B+ tree. Through the above Where condition query execution process, Start_date >’1628442000′ and start_date<‘1631120399′ and status=’2′ and station_id=’1809’ Idx_date_station_driver (start_date, station_id, driver_id) matches only the first field of idx_date_station_driver(start_date, driver_id). Station_id =’1809 ‘The exact query does not match the Index, but rather plays a role in Index Filter push-down. In fact, this is because the range query stops the union index from matching.

A range query causes the federated index to stop matching

Why does a range query stop a federated index from matching? This involves the principle of left-most prefix matching. Suppose a joint index index(a, b) is created, sorting A first and sorting B if A is equal, as shown in the figure below. In the index tree, A is globally ordered, while B is globally unordered and locally ordered. Globally, the value of b is 1, 2, 1, 4, 1, and 2. Only b=2 cannot use the index directly. Locally, when a is determined, b is ordered, and a= 2&&b =4 can use the index. Therefore, the fundamental reason why the range query stops the matching of the joint index is that the ordered state of the non-first field in the index tree depends on the equality of the previous field, while the range query destroys the local ordered state of the next index field, leading to the index stop matching.

Range queries stop union indexes from matching and do not filter out data whose STATION_ID does not equal ‘1809’ when the index matches. The Index First Key and Index Last Key scan ranges are determined by the start_timestamp_of_date time. Start_timestamp_of_date range queries filter 73% of the data volume, while station_id=’1809′ accurate queries filter 99% of the data volume.

Query conditions The amount of data Accounted for
All the data 63.67 million 100%
start_timestamp_of_date>’1628442000′ and start_timestamp_of_date<‘1631120399’ 17.42 million 27.35%
station_id=’1809′ 80000 0.16%

Cost of back table operations

Because the status field is not in the idx_date_station_driver field, you need to query the data filtered by the index back to the table and check whether the data meets the query conditions at the Mysql service layer.

The Mysql optimizer will estimate the cost of a high-matching index when executing a SQL statement. If the cost of a high-matching index is greater than the cost of a full table lookup, Mysql will select a full table scan. This conclusion may counter common sense, we are under the impression that indexes are used to improve query efficiency. There are two main factors involved:

  1. If the query condition or the field to be searched is not in the field of the secondary index, the system performs the following operations: secondary index + primary key index.
  2. The performance of random DISK I/ OS is lower than that of sequential DISK I/ OS. A back-table query is random I/O on a primary key index, and a full table scan is sequential I/O on a primary key index.

Do experiments to see if the overhead of back-table operations is the direct cause of index failure?

Remove the status=’0′ query condition and explain whether the query uses the index IDx_date_station_driver. The result is shown below, with less overhead of back table operations and no index invalidation.

conclusion

Combined with the above analysis, the reasons for index failure are summarized as follows: the range query stops the matching of the joint index, and the data filtered by index matching is not enough. As a result, the Mysql optimizer estimates that the cost of Table Filter operation is higher than that of the full Table query, so the full Table query is selected. The index failure is caused by a range query that causes the union index to stop matching, while the overhead of back table operations is the direct cause of index failure.

Optimize the index

The culprit for this slow query index failure is that the range query stops matching the federated index, simply by adjusting the field of the range query after the field of the precise query, that is

Change idx_date_station_driver(start_date, station_id, driver_id) to idx_station_date_driver(station_id, start_date, driver_id) Driver_id). The optimized results are shown in the figure below.

expand

Common scenarios of index failure

  1. The leftmost prefix matching rule is violated. For example, index(a,b) exists, but the query condition is only B.
  2. Perform any operation on an index column, including calculations, functions, type conversions, and so on.
  3. A range query stops the union index from matching.
  4. Reduce the use of select*. Avoid unnecessary back table operation overhead and use overwrite indexes whenever possible.
  5. Use does not equal (! =, <>), and use the or operation.
  6. The index of a string without single quotation marks is invalid.
  7. Like begins with a wildcard ‘% ABC ‘. Note that like ‘ABC %’ can be indexed.
  8. Order by violates the leftmost matching principle, including non-indexed field sorting, resulting in file sorting.
  9. Group by violates the leftmost matching rule and contains groups of non-index fields, resulting in temporary tables.

The Explain analysis

Slow query analysis is inseparable from mysql’s Explain statement, which focuses on two fields Type and Extra.

Type indicates the method of accessing data, and Extra indicates the method of filtering and collating data. Listed here for easy search.

Type Extra
ALL A full table scan Using index With overwrite indexes, there is no need for back tables and no need for Mysql service layer filtering
index Index tree full scan Using where Get the data from the storage engine layer and filter the data in the Mysql service layer with where query criteria.
range Index tree range scan Using where; Using index Index range scan. An index scan is similar to a full table scan, but at a different level.
ref Nonunique index scans, such as nonunique indexes and nonunique prefixes for unique indexes Using index condition Use index push-down to make full use of query index fields to filter data at the storage engine layer
eq_ref Unique index scanning, such as unique index, primary key index Using temporary Temporary tables store results for sorting and grouping queries
const Convert the query to a constant Using filesort File sort, for sorting
NULL No tables or indexes are accessed NULL Back to the table

The resources

  1. SQL where conditions, in the database extraction and application analysis
  2. MySQL index invalid