preface

Mysql as the current relatively popular relational database, in the form of various evolution exists in the it company, on the basis of B + tree storage structure can bring high query efficiency, but also with the single table storage increasing number of lines and increase the storage space, subject to the B + tree layer and the cached page space, query efficiency will drop sharply. Therefore, data archiving is a very normal technical requirement. Archiving generally requires full table scanning to filter the data. Today, the scenario I want to record is some SQL query primary key out of order problem during data archiving.

The body of the

background

Here is our goods_info table structure (with a secondary index of goods_id created). We are going to archive this table with a full table scan. Select id from goods_info where ID > {index} limit {num}; select id from goods_info where id > {index} limit {num};

column type
id bigint
store_id bigint
goods_id bigint
goods_name varchar
create_time timestamp
update_time timestamp

Then, a problem was found when using the test environment. Some data would often be missed when querying first and then filtering and archiving, resulting in some data could not be archived properly. At the beginning, I thought it was the configuration problem of archive conditions, so I checked in the wrong direction. Finally, I found that the ID list queried by SQL statement was not in order, which caused us to miss some data.

What we think is the first id query list:

id
1
3
5
8
14

Real first query ID list:

id
56
3
5
1
22

Not only is the result returned out of order, it also causes some data to be missed in our archiving rules.

Problem orientation

Select id from goods_info where id > -1 limit 100 and index = -1 to start with the first data (because all ids in our table are greater than 0). In my subjective understanding of Mysql, if there is a field filter attached to the WHERE condition, the query must be optimized with that field, or a federated index prefixed with that field, to avoid full table scans. SQL > select * from ‘where’ where ‘id’; select * from ‘where’ where ‘id’;

Sql > select * from ‘goods_id’ where ‘goods_id’ = ‘id’; select * from ‘goods_id’ where ‘id’ = ‘id’;

The analysis reason

Mysql > select * from secondary index goods_id->id; Select id from goods_info WHERE id > -1 LIMIT 100

  1. The first query data only id, secondary index can complete index coverage, meet the requirements;
  2. The minimum value in the entire table is greater than 0. All data in the table satisfy the condition. Where is no longer bindingSelective disregardGiven this where condition;
  3. The last limit of limit is to batch query specific data.

Mysql selects the secondary index goods_id-> ID during query optimization to traverse the first SQL data and return the corresponding results.

To solve the problem

Since we chose the wrong index, we will force the SQL to go to the primary key index.

  1. Order by ID (order by preferred index when optimizing queries)
  2. Force {index} (mandatory index, needless to say)
  3. 3. Select more columns so that secondary indexes can not take advantage of index coverage.

conclusion

Through this problem location, in the future use should try to change the stereotype of some Mysql query optimization, learn to analyze the underlying cause.