preface

In back-end development, paging is often required to avoid the memory and disk I/O overhead of loading too much data at once, and that is where MySQL's LIMIT keyword comes in. But if you think LIMIT takes care of paging entirely: too young, too simple. With large data volumes, LIMIT runs into the deep paging problem.

case

As an example, I create a new table to display the details of an e-commerce order:

CREATE TABLE `cps_user_order_detail` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `user_id` varchar(32) NOT NULL DEFAULT '' COMMENT 'user ID',
  `order_id` bigint(20) DEFAULT NULL COMMENT 'order id',
  `sku_id` bigint(20) unsigned NOT NULL COMMENT 'commodity ID',
  `order_time` datetime DEFAULT NULL COMMENT 'Order time, format YYYY-MM-DD HH:MM:SS',
  PRIMARY KEY (`id`),
  KEY `idx_time_user` (`order_time`,`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='User Order Details';

Then I manually insert 120 million rows of data. Now there is a requirement: display the user's order details in pages, in reverse order of order time. The table structure is lean and the requirement is simple, so the code gets written in a rush, submitted for testing, and released. Everything is fine in the early stage, but as the order volume grows, the system gets slower and slower, and slow queries are reported from time to time. Digging in, the culprit turns out to be the LIMIT offset. It's not that your SQL is inelegant; it's a problem with how MySQL itself works. Here I'll demonstrate with two simple SQL statements, shown in the figure below, paged with offsets of 100 and 1,000,000 respectively. You can see the time difference is quite large. Leaving aside other data computation and processing, a single SQL query taking more than a second is intolerable for a user-facing feature (e-commerce commonly requires an interface RT of no more than 200 ms).
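The two statements from that screenshot are not reproduced here, so below is a sketch of what they looked like; the WHERE value follows the optimized query shown later in this post, and only the offsets differ:

-- Shallow page: offset 100, returns quickly
SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-8-5 00:00:00'
ORDER BY order_time
LIMIT 100, 6;

-- Deep page: offset 1,000,000, takes over a second on this data set
SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-8-5 00:00:00'
ORDER BY order_time
LIMIT 1000000, 6;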

Here we take a look at the execution plan, as shown in the figure below:
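The plan screenshot is likewise not reproduced, but it can be regenerated with EXPLAIN; for example, against the deep-offset query above:

EXPLAIN SELECT * FROM cps_user_order_detail
WHERE order_time > '2020-8-5 00:00:00'
ORDER BY order_time
LIMIT 1000000, 6;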

Here are the possible values and meanings of the Extra column of the execution plan:

  1. Using where: indicates that the server filters rows after reading them; in this context it means the optimizer has to go back to the table through the index to fetch the data.
  2. Using index: a covering index, meaning the index alone already contains all the data the query needs, with no back-to-table lookup. This is usually achieved by building a composite index on the queried fields (see the sketch after this list).
  3. Using index condition: index condition pushdown (ICP), introduced in MySQL 5.6; a major optimization that reduces the number of back-to-table lookups.
  4. Using filesort: MySQL has to pull the data into memory (spilling to temporary files if needed) to sort it, which consumes considerably more resources.
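As a quick illustration of item 2: this table's composite index idx_time_user already covers (order_time, user_id), so a query that touches only those columns can be answered from the index alone. A minimal sketch:

-- Everything selected lives in idx_time_user, so Extra should show
-- "Using index" and no back-to-table lookup is needed:
EXPLAIN SELECT order_time, user_id
FROM cps_user_order_detail
WHERE order_time > '2020-8-5 00:00:00';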

Looking at the figure above, the same statement produces wildly different execution plans just because of the offset (exaggerating a little). With LIMIT 100,6 the type is range, a range scan: the WHERE condition on order_time uses the index, and thanks to index condition pushdown the ORDER BY is satisfied by the index as well, with sorting effectively done while the WHERE condition is filtered. With LIMIT 1000000,6 the type is ALL, a full table scan: Using where means a back-to-table lookup occurred, and Using filesort means the ORDER BY triggered a file sort. So the query is slow for two reasons: first, the file sort itself takes too long; second, after filtering by the condition, MySQL still has to fetch offset + 6 rows back from the table and throw the first million away. Either effect is the consequence of a large LIMIT offset, which is why real production environments often impose a rule that non-statistical tables must not exceed a million rows.

optimization

So how can LIMIT deep paging be optimized in practice? Here I offer two approaches. The first is optimization through the primary key index. What does that mean? Change the statement above to:

SELECT * FROM cps_user_order_detail d WHERE d.id > #{maxId} AND d.order_time > '2020-8-5 00:00:00' ORDER BY d.order_time LIMIT 6;

As shown in the code above, this is still paging, but now with a constraint on maxId. What does that mean? maxId is the largest primary key id from the previous page. The preconditions for this approach are: 1) the primary key must be auto-increment, not a UUID, and besides the basic paging parameters pageNo and pageSize, the front end must also pass back the largest id from each previous page; 2) this approach does not support jumping to an arbitrary page, only next/previous navigation. The picture below shows an actual page from a well-known e-commerce company.
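To make the flow concrete, here is a minimal usage sketch under the premises above (the maxId value 4189 is made up for illustration):

-- Page 1: no maxId yet, so the plain query is used:
SELECT * FROM cps_user_order_detail d
WHERE d.order_time > '2020-8-5 00:00:00'
ORDER BY d.order_time
LIMIT 6;

-- Page 2: the front end passes back the largest id from page 1
-- (say 4189), so the scan starts right after it on the primary key:
SELECT * FROM cps_user_order_detail d
WHERE d.id > 4189 AND d.order_time > '2020-8-5 00:00:00'
ORDER BY d.order_time
LIMIT 6;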

The second approach is Elasticsearch (based on inverted indexes). E-commerce companies like Taobao basically put all their products into the ES search engine (it's impossible to hold that much data in MySQL, and impractical in Redis). But even with ES, deep paging problems can still occur; what then? The answer is the scroll cursor. I won't go deeper into that here; interested readers can research it on their own.

summary

I am writing this post because I genuinely ran into this problem during development some time ago, and I have also discussed it with an interviewer before. Knowing the limits of LIMIT and how to optimize around them is a plus in an interview, but MySQL optimization is not just indexes and SQL tweaks (in fact, those two have very limited effect in real development). After all, if MySQL optimization alone were that powerful, there wouldn't be so much middleware. I am Xiaoxia Lufei. I love technology and sharing.