Completing requirements is not only about speed; quality matters too. If you have time, think about how to optimize your implementation in terms of performance, code readability, and scalability. This is how progress accumulates.

There was a slight error in the article I wrote yesterday, which I correct here. Compared with simply wrapping the original SQL as `select count(1) from (original SQL) as total`, the optimized count SQL, which keeps only the WHERE conditions, performs better.

However, the LIMIT problem is not solved by the MyBatis-Plus optimizer. If the table's data volume is very large, we need to optimize the LIMIT clause in addition to optimizing the count SQL.

I've seen many articles about using ID-based optimization to improve LIMIT performance, and the examples they give are single-table queries. The reason they can query a single table is that the table design has already eliminated multi-table join queries through redundant fields, which is also a recommended and mainstream way to optimize queries on large tables.

The following is an example of using ID-based optimization to improve LIMIT performance, where the table design adds redundant columns to eliminate multi-table join queries:

```sql
SELECT a.field
FROM table a
RIGHT JOIN
(
    SELECT id  -- fetch only the id column
    FROM table
    WHERE (condition on a non-clustered index)
    LIMIT 1000000, 20
) AS b ON b.id = a.id
```

The original SQL is:

```sql
SELECT (fields that need to be obtained)
FROM table
WHERE (condition on a non-clustered index)
LIMIT 1000000, 20
```

Since the primary key id is stored on the leaf nodes of the non-clustered (secondary) index, if the SELECT fetches only the primary key, there is no need to go back to the clustered index to fetch the record by primary key id. However, if the SELECT needs any fields other than the primary key, it must fetch the record from the clustered index.

LIMIT works by skipping a given number of records in the query result, but to skip a record the engine still has to read it. For example, with LIMIT 1000000, 20, the engine has to go back to the clustered index by primary key 1,000,020 times. Selecting only the id avoids those 1,000,020 lookups, because the primary key id is already stored in the non-clustered index. The premise is that the WHERE condition must hit an index; under a full table scan, any such optimization is futile.
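The deferred-join rewrite above can be demonstrated on any SQL engine. Here is a minimal sketch using Python's built-in sqlite3 (the table name `t` and its columns are made up for the demo; sqlite3 stands in for MySQL, so I use a plain JOIN instead of RIGHT JOIN), showing that the id-subquery join returns exactly the same page as the plain LIMIT query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, category TEXT, payload TEXT)")
conn.execute("CREATE INDEX idx_cat ON t (category)")  # non-clustered (secondary) index
conn.executemany(
    "INSERT INTO t (id, category, payload) VALUES (?, ?, ?)",
    [(i, "A" if i % 2 == 0 else "B", f"row-{i}") for i in range(1, 10001)],
)

# Original form: the engine must fetch the full row for every scanned record.
original = conn.execute(
    "SELECT id, payload FROM t WHERE category = 'A' ORDER BY id LIMIT 1000, 20"
).fetchall()

# Deferred join: first page over ids only, then join back for the other columns.
optimized = conn.execute(
    """
    SELECT a.id, a.payload
    FROM t AS a
    JOIN (SELECT id FROM t WHERE category = 'A' ORDER BY id LIMIT 1000, 20) AS b
      ON b.id = a.id
    ORDER BY a.id
    """
).fetchall()

assert original == optimized  # same page, fewer clustered-index lookups inside the scan
```

The equality check only shows correctness; the performance win comes from the inner query touching nothing but the secondary index while skipping the first 1,000 rows.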

With this in mind, I optimized a paging query I wrote recently for a requirement. Although it requires a multi-table join, some of the WHERE conditions filter on the main table (or can be made to filter only on the main table through optimization), so this scheme is also suitable for ID-based LIMIT performance optimization.

The transformation splits the original SQL into two parts. The first part removes the SELECT list, keeping only the id, and removes the multi-table joins; what it mainly keeps is the WHERE conditions. It gets the primary key ids of the records on the current page based on the query conditions. Paging is completed in this part.

First, complete the paging query according to the filter conditions to obtain the ids of the current page:

```sql
SELECT `id` FROM (main table) WHERE (query conditions) LIMIT 90, 10
```

Paging is done through MyBatis-Plus's paging plugin: paging is completed on the mapper method that queries the ids, which also yields the total count. Below is the count SQL generated by the MyBatis-Plus paging plugin.

```sql
-- SQL printed by the MyBatis-Plus paging plugin
SELECT COUNT(1) FROM (main table) WHERE (query conditions)
```
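The difference between the generic count wrapper and the stripped-down count a paging plugin aims for can be checked quickly. A small sqlite3 sketch (hypothetical table and columns) showing that both forms return the same total, while the stripped form avoids evaluating the inner select list and ORDER BY:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i, "open" if i % 4 else "closed", "x" * 100) for i in range(1, 1001)],
)

original_sql = "SELECT id, payload FROM t WHERE status = 'open' ORDER BY id"

# Generic wrapper: always correct, but carries the inner select list / ordering.
wrapped = conn.execute(f"SELECT COUNT(1) FROM ({original_sql}) AS total").fetchone()[0]

# What the paging plugin generates: same WHERE, nothing else to evaluate.
stripped = conn.execute("SELECT COUNT(1) FROM t WHERE status = 'open'").fetchone()[0]

assert wrapped == stripped == 750
```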

The second part queries the required fields (including subqueries, join queries, and so on) according to the record ids obtained for the current page, filtering with `id IN (...)`:

```sql
SELECT (fields, subqueries)
FROM (main table) JOIN (other tables)
WHERE (main table).id IN (ids obtained in the previous query)
```
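The two-step scheme can be sketched end to end in application code. A minimal demonstration using Python's sqlite3 (the `main_t`/`detail_t` schema and the `status` filter are hypothetical stand-ins): step one pages over the main table fetching only ids, step two fetches the full rows, joins included, with `id IN (...)`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_t (id INTEGER PRIMARY KEY, status TEXT, name TEXT);
CREATE TABLE detail_t (main_id INTEGER, info TEXT);
""")
conn.executemany("INSERT INTO main_t VALUES (?, ?, ?)",
                 [(i, "open" if i % 3 else "closed", f"name-{i}") for i in range(1, 501)])
conn.executemany("INSERT INTO detail_t VALUES (?, ?)",
                 [(i, f"info-{i}") for i in range(1, 501)])

page, size = 10, 10
offset = (page - 1) * size

# Step 1: page over the main table only, fetching just the primary key ids.
ids = [r[0] for r in conn.execute(
    "SELECT id FROM main_t WHERE status = 'open' ORDER BY id LIMIT ?, ?",
    (offset, size))]

# Step 2: fetch the full rows (joins allowed now) for just those ids.
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT m.id, m.name, d.info FROM main_t m "
    f"LEFT JOIN detail_t d ON d.main_id = m.id "
    f"WHERE m.id IN ({placeholders}) ORDER BY m.id", ids).fetchall()
```

In the article's setup, step 1 is the mapper method the MyBatis-Plus paging plugin wraps; step 2 is an ordinary (non-paged) mapper method.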

With this optimization, paging performance can be improved.

Some articles also introduce other ID-based optimizations, such as:

```sql
... WHERE id >= (page * pageSize) LIMIT pageSize;
```
In my opinion, this scheme is impractical. Even if the table's ids are continuous and no records have ever been deleted, are the ids of the records that match the filter condition also continuous? The paging results of this scheme are bound to be inaccurate.
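The inaccuracy is easy to show concretely. A sketch with sqlite3 (made-up table and filter): once a WHERE condition filters out rows, `id >= page * pageSize` no longer lines up with the real page boundaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "open" if i % 2 else "closed") for i in range(1, 101)])

page, size = 3, 10  # we want rows 21..30 of the filtered result

# Ground truth: real offset paging over the filtered rows.
correct = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE status = 'open' ORDER BY id LIMIT ?, ?",
    ((page - 1) * size, size))]

# The id >= page*pageSize shortcut assumes filtered ids are contiguous -- they are not.
shortcut = [r[0] for r in conn.execute(
    "SELECT id FROM t WHERE status = 'open' AND id >= ? ORDER BY id LIMIT ?",
    ((page - 1) * size, size))]

assert correct != shortcut  # the shortcut returns the wrong page
```

Here half the rows are filtered out, so the shortcut lands roughly one "half page-count" too early; with real-world filters the drift is arbitrary.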

So how do we optimize away the performance cost of LIMIT paging when multiple tables are joined and the query condition must filter on a field of a joined table?

```sql
SELECT a.*, b.* FROM a LEFT JOIN b ON a.b_id = b.id WHERE a.xx = ? AND b.xx = ?
```

This is frankly not a good table design, and I really cannot think of a good optimization scheme for it.

One idea: while paging through results, the query conditions cannot change, and normally will not; if they do change, the page number should restart from 1. So we can cache the largest id of the previous query in memory, keyed by user + interface. When the next page is requested, we skip past the cached max id and simply take the next pageSize records. The drawbacks: you cannot jump to an arbitrary page (the front end can only go next page by next page), memory is wasted caching the query state, and the cache must be cleared whenever the query conditions change.
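This cached-max-id idea is essentially keyset (cursor) pagination. A minimal sqlite3 sketch (hypothetical table; in a real service `last_id` would live in a per-user, per-interface cache such as Redis or a local map rather than a loop variable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, "open" if i % 2 else "closed") for i in range(1, 101)])

def next_page(last_id, size=10):
    """Fetch the page after the cached max id; no OFFSET scan needed."""
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM t WHERE status = 'open' AND id > ? ORDER BY id LIMIT ?",
        (last_id, size))]
    return ids, (ids[-1] if ids else last_id)  # new cursor value to cache

last_id, seen = 0, []
while True:
    ids, last_id = next_page(last_id)
    if not ids:
        break
    seen.extend(ids)

# Walking page by page covers exactly the filtered result set, in order.
assert seen == list(range(1, 100, 2))
```

Each page costs one index range scan of pageSize rows regardless of how deep the user has paged, which is exactly the cost LIMIT offset fails to bound.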

How did you address the problem of LIMIT paging in your project? Please leave a comment.