By youyoui

Source: https://www.cnblogs.com/youyoui/p/7851007.html

The copyright of this article belongs to the author

When a table being queried has tens of thousands of records, fetching all the results at once is slow, and it only gets slower as the data volume grows, so paging queries are needed.

For database paging queries there are many approaches and optimization points; here are some of the ones I know.

Preparation

To test some of the optimizations listed below, an existing table is described here.

  • Table name: orders_history

  • Description: Order history table for a business

  • Main fields: id is unsigned int, type is tinyint(4)

  • The table has 37 fields and contains no large columns such as text; the largest is varchar(500). The id field is an auto-increment primary key index.

  • Row count: 5709294

  • A test table with millions of rows is not easy to come by offline; if you need to test yourself, you can write a shell script to insert test data. All the SQL statements below were executed in the same unchanged environment. Basic test result:

select count(*) from orders_history;

Result: 5709294

The three query times are as follows:

  • 8903 ms

  • 8323 ms

  • 8401 ms

General paging query

General paging queries can be implemented using a simple limit clause, whose syntax is as follows:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset

The LIMIT clause can be used to specify the number of records returned by the SELECT statement. Note the following points:

  • The first argument specifies the offset of the first return row, starting at 0

  • The second parameter specifies the maximum number of rows to return

  • If only one argument is given: it indicates the maximum number of rows to return

  • Some older references say that a second argument of -1 retrieves all rows from the offset to the end of the result set, but MySQL does not actually accept -1 here; the manual recommends passing a very large number as the second parameter instead

  • The offset of the initial row is 0 (not 1)
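The points above can be checked with a quick script. The sketch below uses an in-memory SQLite table as a stand-in for MySQL (an assumption for illustration; SQLite accepts both LIMIT forms, as MySQL does):

```python
import sqlite3

# In-memory SQLite table standing in for the MySQL table (illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("create table orders_history (id integer primary key, type integer)")
conn.executemany("insert into orders_history values (?, ?)",
                 [(i, 8) for i in range(1, 101)])

# LIMIT offset, rows: skip 10 rows, return the next 5 (ids 11..15)
a = conn.execute("select id from orders_history order by id limit 10, 5").fetchall()

# LIMIT rows OFFSET offset: equivalent syntax, same result
b = conn.execute("select id from orders_history order by id limit 5 offset 10").fetchall()

# A single argument is the maximum row count; the offset defaults to 0
c = conn.execute("select id from orders_history order by id limit 3").fetchall()
```

Both forms return the same page; which one you use is a matter of style.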

Here is an example application:

select * from orders_history where type=8 limit 1000,10;

This statement queries the 10 records after offset 1000 in the table orders_history, i.e. records 1001-1010 (1001 <= id <= 1010).

By default, records in a table are sorted by primary key (usually ID). The result above is equivalent to:

select * from orders_history where type=8 order by id limit 1000,10;

The three query times are as follows:

  • 3040 ms

  • 3063 ms

  • 3018 ms

For this type of query, the following tests the impact of query record volume on time:

select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;

The three query times are as follows:

  • Query 1 record: 3072ms 3092ms 3002ms

  • Query 10 records: 3081ms 3077ms 3032ms

  • Query 100 records: 3118ms 3200ms 3128ms

  • Query 1000 records: 3412ms 3468ms 3394ms

  • Query 10000 records: 3749ms 3802ms 3696ms

In addition, I ran more than ten further queries. Judging from the times, when fewer than 100 records are fetched the query time is essentially unchanged; as the number of records fetched grows, the time increases.

Tests against query offsets:

select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;

The three query times are as follows:

  • Query 100 offset: 25ms 24ms 24ms

  • Query 1000 offset: 78ms 76ms 77ms

  • Query 10000 offset: 3092ms 3212ms 3128ms

  • Query 100000 offset: 3878ms 3812ms 3798ms

  • Query 1000000 offset: 14608ms 14062ms 14700ms

As the query offset increases, especially beyond 100,000, the query time rises sharply.

This kind of paged query scans from the first record, so the further back the page, the slower the query; and the more rows fetched, the slower the overall query.

Use subquery optimization

This approach first locates the id at the offset position, then queries from there onward. It is suitable when ids are monotonically increasing.

select * from orders_history where type=8 limit 100000,1;
select id from orders_history where type=8 limit 100000,1;
select * from orders_history where type=8 and id>=(select id from orders_history where type=8 limit 100000,1) limit 100;
select * from orders_history where type=8 limit 100000,100;

The query time of the four statements is as follows:

  • Statement 1: 3674ms

  • Statement 2: 1315ms

  • Statement 3: 1327ms

  • Statement 4: 3710ms

Note for the above query:

  • Comparing statement 1 with statement 2: using select id instead of select * is about 3 times faster

  • Comparing statement 2 with statement 3: the speed differs by only tens of milliseconds

  • Comparing statement 3 with statement 4: statement 3 is about 3 times faster, thanks to the fast select id lookup

This method will be several times faster than the original general query method.
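The subquery trick can be sanity-checked end to end. The sketch below again uses SQLite in place of MySQL (table and values are made up) and verifies that the plain offset query and the subquery-optimized query return the same page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders_history (id integer primary key, type integer)")
conn.executemany("insert into orders_history values (?, ?)",
                 [(i, 8) for i in range(1, 10001)])

# Plain offset paging: the engine walks past 5000 rows before returning 100
plain = conn.execute(
    "select id from orders_history where type = 8 "
    "order by id limit 5000, 100").fetchall()

# Subquery optimization: first locate the boundary id (a cheap lookup that
# only touches the id column), then range-scan forward from it
sub = conn.execute(
    "select id from orders_history where type = 8 and id >= "
    "(select id from orders_history where type = 8 order by id limit 5000, 1) "
    "order by id limit 100").fetchall()
```

Both queries yield ids 5001 through 5100 here; the speed difference only shows up on a large MySQL table, as in the timings above.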

Use id range optimization

This method assumes that the table's ids are continuously increasing; we can then calculate the id range of a page from the page number and page size, and query it with id between ... and ...:

select * from orders_history where type=2 and id between 1000000 and 1000100 limit 100;

Query time: 15ms 12ms 9ms

This query method can greatly improve speed: it basically completes within tens of milliseconds. The limitation is that it can only be used when the id range is clearly known; however, an auto-increment id field is generally added when a table is created, which makes this kind of paging query much easier.
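The page-number-to-id-range calculation can be sketched as a small helper (the function name is made up, and it assumes ids start at 1 and increase without gaps):

```python
def page_id_range(page: int, page_size: int) -> tuple[int, int]:
    """Map a 1-based page number to an inclusive [start, end] id range,
    assuming ids start at 1 and increase without gaps."""
    start = (page - 1) * page_size + 1
    end = page * page_size
    return start, end

# Page 10001 at 100 rows per page covers ids 1000001..1000100
start, end = page_id_range(10001, 100)
sql = f"select * from orders_history where id between {start} and {end};"
```

With this mapping, the between query above no longer needs an offset at all.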

There’s another way to write it:

select * from orders_history where id >= 1000001 limit 100;

There is also a variant using an in query on ids, often used for paging across related tables, taking the id set queried from another table:

select * from orders_history where id in (select order_id from trade_2 where goods = 'pen') limit 100;

Note about this in-query approach: some MySQL versions do not support limit inside an in subquery.

Use temporary table optimization

Strictly speaking, this approach is no longer query optimization; it is mentioned here in passing.

The id-qualification optimization above requires ids to be sequentially increasing, but in some scenarios, such as history tables or tables where rows have been deleted, that assumption does not hold. In those cases, consider using a temporary table to record the paging ids and then using those ids for in queries.

This can greatly improve the speed of traditional paging queries, especially when the data volume is in the tens of millions.
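A minimal sketch of the temporary-table idea, again with SQLite standing in for MySQL and made-up names: the qualifying ids are numbered once into a temp table, and each page is then fetched with an in query on its slice of ids. This tolerates gaps in the id sequence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders_history (id integer primary key, type integer)")
# Non-contiguous ids (1, 4, 7, ...) simulate deleted rows in a history table
conn.executemany("insert into orders_history values (?, ?)",
                 [(i, 8) for i in range(1, 1001, 3)])

# One-off snapshot: number the qualifying ids so each page maps to a row range
# (row_number() needs SQLite >= 3.25)
conn.execute("""create temp table page_ids as
                select row_number() over (order by id) as rn, id
                from orders_history where type = 8""")

def fetch_page(page, size):
    """Fetch one page by reading its ids from the temp table,
    then querying the base table with an in (...) list."""
    lo, hi = (page - 1) * size + 1, page * size
    ids = [r[0] for r in conn.execute(
        "select id from page_ids where rn between ? and ?", (lo, hi))]
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    return conn.execute(
        f"select id from orders_history where id in ({marks}) order by id",
        ids).fetchall()
```

The snapshot must of course be refreshed when the underlying data changes, which is why this fits read-mostly history tables best.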

A note on data-table ids

In general, when creating a table in a database, an auto-increment id field is added to each table to make queries easier.

If the data volume is very large, as with an order database, the data is generally sharded across multiple databases and tables. In that case the per-database id should not be used as the unique identifier; instead, use a distributed, high-concurrency unique id generator, and store the generated id in an extra field of the table.
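As a rough illustration of such a generator, here is a toy Snowflake-style sketch (the layout, a 41-bit millisecond timestamp, 10-bit worker id, and 12-bit sequence, follows the well-known Twitter scheme; the class and parameter names are made up, and a real deployment would also need clock-rollback handling):

```python
import threading
import time

class SnowflakeLike:
    """Toy Snowflake-style id generator: 41-bit millisecond timestamp,
    10-bit worker id, 12-bit per-millisecond sequence. Illustrative only."""
    def __init__(self, worker_id, epoch_ms=1288834974657):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.epoch_ms = epoch_ms
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:          # sequence exhausted: wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.seq

gen = SnowflakeLike(worker_id=1)
ids = [gen.next_id() for _ in range(1000)]
```

Because the timestamp occupies the high bits, ids from one generator are strictly increasing, which keeps them friendly to the range-based paging techniques above.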

Using a range query to locate the id (via the index) first, and then using the index to locate the data, can speed up queries several times: that is, first select id, then select *.


End

