
Preface

Pagination is an extremely common feature: almost every backend developer ends up writing it. So why paginate at all?

  • From a business point of view, even if the system returned all the data at once, in most cases the user would never look at most of it.
  • Technically, fetching everything has a cost: the target server's disk, memory, and network bandwidth, plus whether the requester itself can handle such a large volume of data.

MySQL pagination syntax

select * from table_name limit 0, 20;

Think about it: with pagination, can the costs in the second point above really be reduced?

Create the table and generate data

CREATE TABLE t1 (
    id BIGINT NOT NULL AUTO_INCREMENT COMMENT 'primary key',
    m_id BIGINT NOT NULL COMMENT 'other id',
    name VARCHAR(255) COMMENT 'name',
    identity_no VARCHAR(30) COMMENT 'identity no',
    address VARCHAR(255) COMMENT 'address',
    create_time TIMESTAMP NOT NULL COMMENT 'create time',
    modify_time TIMESTAMP NOT NULL COMMENT 'modify time',
    PRIMARY KEY (id)
) ENGINE = INNODB DEFAULT CHARSET = utf8 COMMENT 'deep paging test table';

INSERT INTO t1 VALUES (1, 1, '1', '100000000000000000', '1', '2010-01-01 00:00:00', '2010-01-01 00:00:00');

SET @i = 1;

-- run this INSERT ... SELECT repeatedly; each run doubles the row count
-- (data-generation method from the web, source: https://blog.csdn.net/mysqltop/article/details/105230327)
INSERT INTO t1 (m_id, name, identity_no, address, create_time, modify_time)
SELECT @i := @i + 1 AS m_id,
       concat('name', @i),
       concat('100000000000000000', @i),
       concat('address', @i),
       date_add(create_time, INTERVAL @i * cast(rand() * 100 AS signed) SECOND),
       date_add(date_add(create_time, INTERVAL @i * cast(rand() * 100 AS signed) SECOND),
                INTERVAL cast(rand() * 1000000 AS signed) SECOND)
FROM t1;

SELECT count(1) FROM t1;

Total data: a little over 4 million rows (400W+):

1: No query conditions, no sorting

select id,m_id, name, identity_no, address, create_time, modify_time 
from t1 limit 1000000, 20;

Fetching rows after offset 1,000,000, elapsed time:

0.613 s elapsed

Sort by primary key

select id,m_id, name, identity_no, address, create_time, modify_time 
from t1 order by id limit 1000000, 20;

Time: Reduced

0.417 s elapsed

Comparison of execution plans:

(Execution plan screenshots 1 and 2 omitted: 1 = without ORDER BY, 2 = with ORDER BY id.)

You can see that sorting by the primary key uses the primary key index and reads only the first N rows it needs, so it is fast.
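Since the screenshots are not included, here is a rough sketch of what the two plans would look like via EXPLAIN; the type/key values below are illustrative of the general pattern, not copied from the article:

EXPLAIN select * from t1 limit 1000000, 20;
-- type: ALL, key: NULL       (full table scan; illustrative output)

EXPLAIN select * from t1 order by id limit 1000000, 20;
-- type: index, key: PRIMARY  (ordered scan of the primary key index; illustrative output)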

So, conclusion 1: add ORDER BY on the primary key even if the business has no filter conditions yet and does not seem to need sorting.

Here's another question: if you specify no sort condition, what order does MySQL return rows in by default?

Without an ORDER BY, rows come back in whatever order the storage engine reads them, which is roughly the physical order on disk, and physical order can differ from logical (primary key) order. For example, after deleting rows and then inserting new rows that reuse the old IDs, the new rows may land on different pages, so the two orders diverge. In that case the table can be tidied up with OPTIMIZE TABLE: optimize table table_name.
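A minimal sketch, assuming the t1 table above:

-- rebuild the table so rows are re-clustered in primary key order
OPTIMIZE TABLE t1;
-- for InnoDB this is mapped to a rebuild; MySQL reports
-- "Table does not support optimize, doing recreate + analyze instead"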

2: Sorted, but the sort field has no index

select id,m_id, name, identity_no, address, create_time, modify_time 
from t1 
order by create_time desc 
limit 10000, 20;

Execution time:

2.015 s elapsed

Now create a table t2 that is identical to t1 except that create_time is indexed, load it with the same data, and run the same query against t2:
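The article does not show how t2 was built; a sketch under that assumption (the index name idx_create_time is taken from the force index hint used later):

CREATE TABLE t2 LIKE t1;
ALTER TABLE t2 ADD INDEX idx_create_time (create_time);
INSERT INTO t2 SELECT * FROM t1;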

select id,m_id, name, identity_no, address, create_time, modify_time
from t2
order by create_time desc
limit 10000, 20;

Execution time:

0.937 s elapsed

Comparison of execution plans:

(Execution plan screenshots 1 and 2 omitted: 1 = t1 without the index, 2 = t2 with the index.)

You can see that with the index in place, MySQL can walk the index directly to fetch the first N rows, with no full table scan and no filesort.

Conclusion 2: index commonly used fields, including the fields you sort by.

New questions:

These two scenarios seem to solve most of the paging problems, but:

  1. Does indexing the sort field always make it fast? Offset 10,000 was fast, but what about data after offset 1,000,000?
  2. What if the current table already has multiple indexes and is not suitable for adding indexes?

3: The sort field has an index, but the page is a little deeper: take 20 rows starting at offset 1,000,000

select id,m_id, name, identity_no, address, create_time, modify_time 
from t2 
order by create_time desc 
limit 1000000, 20;

Time: Very slow

18.350 s elapsed

Execution plan:

The execution plan shows that the index is not used. Why not?

Because the MySQL optimizer estimates that the query needs to touch more than a certain proportion of the table's rows (often quoted as about 30%, though it is not an exact, documented threshold), it decides a full table scan is cheaper and skips the index.
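One way to confirm this is to EXPLAIN the query; the output sketched below is illustrative, not copied from the article:

EXPLAIN select id, m_id, name, identity_no, address, create_time, modify_time
from t2 order by create_time desc limit 1000000, 20;
-- type: ALL, key: NULL, Extra: Using filesort  (index skipped; full scan plus sort)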

Can we force it to use the index? Yes, with force index(idx_create_time).

4: Forcing the index

select id,m_id, name, identity_no, address, create_time, modify_time 
from t2 
force index(idx_create_time) 
order by create_time desc 
limit 1000000, 20;

Execution plan after forcing the index:

Look at the execution time:

15.197 s elapsed

It helps, but not by much. Even with the forced index, MySQL still has to fetch 1,000,000+ complete rows, which is very resource-intensive: it reads a large number of index pages and does frequent back-to-table lookups, all random IO.

Conclusion 3: even with an index, deep pages are problematic and should be avoided.

None of the above attempts really solves the deep-paging performance problem. Is there a better solution?

There is!

5: Carry the last_* values of the previous page as the query condition

select id, m_id, name, identity_no, address, create_time, modify_time
from t2
-- #{last_create_time} / #{last_id} come from the last row of the previous page;
-- with DESC ordering, the next page is every row "smaller" than that one
where create_time < #{last_create_time}
   or (create_time = #{last_create_time} and id < #{last_id})
order by create_time desc, id desc
limit 0, 20;

Performance matches ordinary shallow pagination, but only if the last_* fields are indexed.

At the same time, this scheme is limited by its usage scenarios: with page jumping, multiple sort fields, and so on, the last_* values cannot be used.

Recommended application scenarios: pages without page numbers, such as "slide for the next page" feeds or a single "next page" button.
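A usage sketch with made-up values (the id and timestamp below are hypothetical, just to show how one page feeds the next):

-- page 1: ordinary shallow pagination
select id, create_time from t2 order by create_time desc, id desc limit 20;
-- suppose the last row returned is (id = 3999980, create_time = '2020-06-01 08:00:00')

-- page 2: seek from that row instead of using an offset
select id, m_id, name, identity_no, address, create_time, modify_time
from t2
where create_time < '2020-06-01 08:00:00'
   or (create_time = '2020-06-01 08:00:00' and id < 3999980)
order by create_time desc, id desc
limit 20;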

6: Join on a subquery

Change the forced-index SQL from scenario 4 into a subquery join. First test on t2, where the sort field is indexed.

select id, m_id, name, identity_no, address, create_time, modify_time
from t2 force index(idx_create_time)
order by create_time desc
limit 1000000, 20;

-- changed to:

SELECT id, m_id, name, identity_no, address, create_time, modify_time
FROM t2
JOIN (
    SELECT id
    FROM t2
    ORDER BY create_time DESC
    LIMIT 1000000, 20
) x USING (id);

Elapsed time: 0.742 s

The effect is obvious. (original SQL execution duration: 15s+)

Now test on t1, where create_time is not indexed.

-- execute on t1:
SELECT id, m_id, name, identity_no, address, create_time, modify_time
FROM t1
JOIN (
    SELECT id
    FROM t1
    ORDER BY create_time DESC
    LIMIT 1000000, 20
) x USING (id);

Elapsed time: 2.866 s

The effect is obvious. (Original SQL execution time: 18 s+)

Switching to a subquery join saves a lot of time, with or without an index. Here's why.

Execution plan:

The difference between the two join queries' execution plans is whether the subquery sorts via the index; plan 1 uses the index, so it is faster.

Comparing subquery and non-subquery execution plans:

The difference:

So what is the difference between the direct forced-index query and the subquery version?

It looks as if all we gained is a "Using index". So what is Using index?

In short, Using index means all the columns the (sub)query needs can be read directly from the index tree (a covering index). The subquery selects only id, which the secondary index already contains, so it skips the first 1,000,000 rows without any back-to-table lookups; only the final 20 ids are joined back to the table. That large reduction in back-to-table IO is why it is more efficient.
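As an illustrative check (not from the article), EXPLAIN on the subquery alone would show the covering-index effect, since an InnoDB secondary index stores the primary key alongside the indexed column:

EXPLAIN SELECT id FROM t2 ORDER BY create_time DESC LIMIT 1000000, 20;
-- key: idx_create_time, Extra: Using index  (no back-to-table lookups while skipping rows)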

T1 without index:

The difference:

At first glance there is no difference between the two plans. Not only that: the subquery version is more complex than the direct query, yet it is faster. Why?

The key here is actually Using filesort.

When Using filesort appears, MySQL has two sorting strategies.

Single-way sort

  1. All queried columns of the matching rows are fetched into the sort buffer.
  2. When the buffer fills up, sort its contents by the sort field (quicksort) and write the sorted run to a temporary file.
  3. After all data has been fetched and sorted, merge the temporary files in order (merge sort), writing the results back to file, until everything is merged.
  4. Read the rows that satisfy the paging condition from the temporary file and return them; if the page can be produced from the first batch (shallow paging), return it directly.

Dual-way sort

  1. Only the row_id and the sort field of the matching rows are fetched into the sort buffer (difference 1).
  2. When the buffer fills up, sort by the sort field (quicksort) and write the sorted run to a temporary file.
  3. After all data has been fetched and sorted, merge the temporary files in order (merge sort), writing back to file, until everything is merged.
  4. Read the row_ids that satisfy the paging condition from the temporary file, then fetch the corresponding rows by row_id (difference 2).

Which strategy does MySQL choose? It depends on the max_length_for_sort_data parameter: if the total length of the queried columns exceeds max_length_for_sort_data, MySQL uses the dual-way sort; otherwise it uses the single-way sort.

The subquery puts only create_time + id into the sort buffer. Compared with the direct query, it omits all the other, larger columns, which shrinks the sorted data and saves a large number of temporary-file IO operations; that is where the speedup comes from.
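To inspect the threshold on your own server (a sketch; the 1024-byte default is from the MySQL 5.7 documentation, so verify on your version):

SHOW VARIABLES LIKE 'max_length_for_sort_data';
-- if the total length of the queried columns exceeds this value,
-- MySQL switches from single-way to dual-way (rowid) sort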

Another approach is to resize sort_buffer_size and compare timings before and after.

After adjusting it, my tests on a personal machine showed no significant effect. Online sources report improvements, but this method is only icing on the cake, not a deep-paging optimization scheme by itself.
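For completeness, a sketch of how one might try it; the 4 MB value is arbitrary:

SHOW VARIABLES LIKE 'sort_buffer_size';
-- enlarge the per-session sort buffer, rerun the query, and compare timings
SET SESSION sort_buffer_size = 4 * 1024 * 1024;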

Conclusion

Comparison

Summarizing the timings measured above (t1 has no index on create_time, t2 does):

  • No conditions, no sort, offset 1,000,000: 0.613 s
  • ORDER BY id (primary key), offset 1,000,000: 0.417 s
  • ORDER BY create_time, no index, offset 10,000: 2.015 s
  • ORDER BY create_time, with index, offset 10,000: 0.937 s
  • ORDER BY create_time, with index, offset 1,000,000: 18.350 s
  • Same, with force index: 15.197 s
  • Subquery join on t2 (indexed): 0.742 s
  • Subquery join on t1 (no index): 2.866 s

Business direction

  • Follow the example of Google/Baidu search results: you can only jump within 10 pages of the current page, so reaching a deep page takes patience, and the offset stays bounded.
  • If the front end has no page numbers and does not support jumping to an arbitrary page, use the last_* scheme.

Technical direction

  • Add primary key sorting to paging queries that have no sort criteria
  • Index the fields you sort by whenever possible
  • Once the page depth passes a threshold, switch to the two-step lookup (a subquery join, or two queries in application code), with or without an index
  • Increase sort_buffer_size appropriately
  • With composite indexes, follow the leftmost-prefix rule and don't skip leading columns (see the sketch below)
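On that last point, a hypothetical sketch (idx_m_create and the m_id filter are not from the article):

-- composite index: leftmost column first
ALTER TABLE t2 ADD INDEX idx_m_create (m_id, create_time);

-- can use the index: the condition starts at the leftmost column
SELECT id FROM t2 WHERE m_id = 5 ORDER BY create_time DESC LIMIT 20;

-- cannot use it efficiently: the leading column m_id is skipped
SELECT id FROM t2 WHERE create_time > '2020-01-01' LIMIT 20;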

Text / Jacy
