In day-to-day development, paging queries are everywhere; it is fair to say that most back-end work boils down to CRUD, so everyone has implemented paging at some point. How do you do it? A while ago a colleague came to me for help: an SQL query he had written was too slow, and he asked me to help optimize it. After some work we got the original 8-second query down to under a second. Knowledge worth having is worth sharing, so I am writing up the optimization process here so that more programmers can see it.

Why optimize paging queries

Colleague: "Hey, got a minute? Let's talk about our ideals." The moment I heard that I knew something was off. I walked over, and he told me that the log table he maintains had grown past 500W (5 million) rows, most likely because stress testing had pushed the number of log records into the millions overnight. Now his paging queries were getting slower and slower: the further back the requested page, the longer the query took. Sure enough, a bug soon showed up in our bug tracker: beyond page 1000 the log pages returned extremely slowly. We were forced to optimize, embarrassing as it was.

How to optimize

So the problem is clear: once the table holds a very large number of records, queries for pages further back get slower and slower. But why?

(This article uses MySQL; SQL Server and Oracle users can borrow the same ideas.) Back to the question: exactly which queries are slow? First we need to prepare a table and some records.

Create table

DROP TABLE IF EXISTS `user`;

CREATE TABLE `user` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `u_name` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'Username',
  `u_password` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL COMMENT 'password',
  `u_mail` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'email',
  `u_phone` varchar(11) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'mobile phone',
  `u_sex` tinyint(1) DEFAULT NULL COMMENT 'gender',
  `u_headImg` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'avatar',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4762599 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Add data

Using a stored procedure, I added 100W (1,000,000) rows each for Zhang San, Li Si and Wang Wu, plus 200W rows for Zhao Liu, so the table holds 500W (5 million) rows in total. (Here "W" is the Chinese unit 万, i.e. 10,000.)

BEGIN
	declare i int;
    set i=1;
    while i <= 2000000 do
       insert into `user`(u_name,u_password,u_mail,u_phone,u_sex,u_headImg) values('Daisy','000000','[email protected]','18800000000',0,'oss.file.com/images/zhaoliu.png');
        set i=i+1;
    end while;
END
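For completeness, the snippet above is only the procedure body. A minimal sketch of how it would be wrapped and executed in MySQL (the procedure name add_users is my own, not from the article):

DELIMITER $$
CREATE PROCEDURE add_users()
BEGIN
    -- the DECLARE / WHILE / INSERT body shown above goes here
END $$
DELIMITER ;

CALL add_users();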

The cause of the slow query

Now that the table and records are ready, we can look into why the larger the page number, the slower the query. Let's start with a common paging query:

select * from user  order by id desc  limit 100,10;

This query fetches 10 records starting from offset 100, and I believe this is how most people write their paging queries. In `limit 100,10`, 100 is the offset, i.e. the position from which rows start to be returned, and 10 is the number of records to return. This is MySQL's paging syntax. Can you spot a problem with this SQL? At first glance nothing seems wrong, but is that really the case? Let's look at a few examples.

1. Offset = 0

select * from user  order by id desc  limit 0,10;

The query time

2. Offset =1000

select * from user  order by id desc  limit 1000,10;

The query time

3. Offset =10000

select * from user  order by id desc  limit 10000,10;

The query results

4. Offset = 400W

select * from user  order by id desc  limit 4000000,10;

The query result (screenshot omitted)

Limit paging principle

Why is it slow? Let's take a guess: things like functions in the condition or scanning too many rows can all slow a query down. We clearly are not using any functions here, so could it be that too many rows are scanned? That points at limit itself. Do you know how limit actually performs pagination? Let's use the explain keyword to print the query plan for offset = 0, 1000, 10000 and 400W respectively.

1. Offset = 0

2. Offset =1000

3. Offset =10000

4. Offset = 400W

The columns in the explain output mean the following:
id: statement identifier
select_type: query type
table: the table being accessed
partitions: matching partitions
type: join type
possible_keys: indexes that could be used
key: index actually used
key_len: length of the index used
ref: columns or constants compared against the index
rows: number of rows scanned
filtered: percentage of rows filtered by the table condition
Extra: additional information about how the query is executed

Comparing the four plans, only one field differs significantly in value: rows, the number of rows scanned. The larger the limit offset, the more rows are scanned, and that is exactly why the further back the page, the slower the query.

For example, limit 1000000,10 scans 1,000,010 rows, throws away the first 1,000,000 and returns only the last 10 to us.
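The explain screenshots are not reproduced here; as a minimal sketch, these are the kinds of statements that were compared, and rows is the value that changes:

explain select * from user  order by id desc  limit 0,10;
explain select * from user  order by id desc  limit 4000000,10;
-- in the comparison above, only the rows column differs noticeably:
-- the larger the offset, the larger the rows estimate (roughly offset + 10)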

1. Maximum id query method

So the goal is to avoid scanning the rows that get thrown away. The idea: if the first page is fetched with limit 0,10 and the last id returned on that page is 10, then the next page only needs to fetch the 10 records whose id is greater than 10. For example:

explain select * from user where id > 4000000  limit 10;
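A hedged follow-up: in practice you usually keep the largest id returned by the previous page and add an explicit ORDER BY, so the "next page" query looks roughly like this (10 here is just the last id from the first-page example above):

-- next page: fetch the 10 records after the largest id seen so far
select * from user where id > 10 order by id limit 10;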

2. BETWEEN … AND

select * from user where id BETWEEN 4000000 and 4000010
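One small caveat, not spelled out in the original: BETWEEN is inclusive on both ends, so the statement above returns 11 rows. If exactly 10 are wanted, a half-open range would be:

select * from user where id > 4000000 and id <= 4000010;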


3. Locate the starting id with a subquery

select * from user where id > (select id from user   limit 4000000,1) limit 10;

4. Deferred join (personal recommendation)

The idea of a deferred join is to let MySQL scan as few rows as possible: use a covering index to locate just the ids of the records we need, then join back to the original table on that id column to fetch all the required columns.

select  * from user INNER JOIN( select id from  user  limit 4000000,10 ) as a USING(id)
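As a sketch (my own adjustment, not from the original), adding an ORDER BY inside the subquery makes the pages deterministic while keeping the scan on the id index:

select * from user
INNER JOIN (select id from user order by id limit 4000000,10) as a USING(id);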

5. Split the table

A common rule of thumb is that a single MySQL table should not hold much more than 500W rows; below that, with the techniques above, queries at offsets around 400W can generally finish in under a second. If you need to go further, I recommend splitting the table. There are two cases: horizontal splitting and vertical splitting.

Horizontal splitting

If a table originally holds 1000W rows, we can spread them across three tables of roughly 300W rows each, so each query has far less data to work through and stays fast. The question then becomes how to implement horizontal splitting. You can use middleware such as MyCat, Alibaba Cloud also offers table-splitting support in its database services, or you can roll your own; but if you do it yourself, you need to take care of id uniqueness and of deciding which table a given id lives in.
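As a minimal do-it-yourself sketch (the table names user_0/user_1/user_2 and the modulo-3 routing rule are my own assumptions, not from the article), rows can be routed by id:

-- three tables with the same structure as `user`
CREATE TABLE user_0 LIKE `user`;
CREATE TABLE user_1 LIKE `user`;
CREATE TABLE user_2 LIKE `user`;

-- a row with a given id lives in user_(id % 3); for example id = 4000001 maps to user_2
SELECT * FROM user_2 WHERE id = 4000001;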

Vertical splitting

If a table holds only 100W rows, queries should normally not be all that slow; but if the table has a large number of columns, including several large text columns, we can put the small, frequently used columns in one table and the space-hungry columns in another, with a one-to-one relationship between the two. Queries against the smaller table then become much faster.
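A minimal sketch under assumptions of my own (the user_base/user_ext names and the u_profile text column are hypothetical): keep the small, hot columns in one table and the large column in the other, joined one-to-one on id:

CREATE TABLE user_base (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  u_name varchar(50) NOT NULL,
  u_phone varchar(11) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE user_ext (
  id bigint(20) NOT NULL,      -- same id as user_base
  u_profile text,              -- hypothetical large column
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- fetch the large column only when it is actually needed
SELECT b.u_name, e.u_profile FROM user_base b JOIN user_ext e ON b.id = e.id WHERE b.id = 100;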

Hot and cold table

Here is another table-splitting idea you can borrow: hot and cold tables.

When do hot and cold tables make sense? If you use a banking app, you may have noticed that you can only query the last few months of statements online; anything older has to be requested at the counter. That is exactly the hot/cold table design.

We create two tables with the same structure. Table A stores the records of the last three months (the exact window depends on your situation, it does not have to be three months), and table B stores everything older. New records always go into table A. A scheduled job, run in the early hours of every day, scans table A and migrates any record older than three months into table B. Users are sensitive to query speed over the last three months and rarely query anything older, so this split works very well.
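A minimal sketch of the nightly migration, assuming the log table has a create_time column and hot/cold tables named log_a and log_b (all assumptions of mine; the article does not show this table's definition):

-- run by a scheduled job in the early morning; in practice wrap both statements in a transaction
INSERT INTO log_b
SELECT * FROM log_a WHERE create_time < DATE_SUB(NOW(), INTERVAL 3 MONTH);

DELETE FROM log_a WHERE create_time < DATE_SUB(NOW(), INTERVAL 3 MONTH);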

Indexes

As everyone knows, adding indexes improves query efficiency. When a paging query has filter conditions, we can index the condition columns; the database maintains the index structure, looks the condition up in the index first, and then fetches the matching rows from the table, which again cuts down the number of rows scanned. Be careful, though: an index is not used in every situation. The cases below can stop it from taking effect (a small sketch of combining an index with id-based paging follows the list).

1. Using IS NOT NULL (or IS NULL) in the query condition.
2. LIKE patterns that start with %, such as like '%notebook': the index is not used; % must not be the first character.
3. If either side of an OR condition is not indexed, the whole table is scanned and the index is not used.
4. Composite index: the condition must include the leftmost column of the composite index, otherwise the composite index does not take effect.
5. Comparison operators such as <, <> may prevent index use.
6. Comparing a string column without single quotes (implicit type conversion).
7. And so on.
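As a sketch of combining an index with the id-based paging above (the index name idx_u_name and this particular query are my own, not from the article), suppose we often page through one user's records by name:

ALTER TABLE `user` ADD INDEX idx_u_name (`u_name`);

-- the u_name filter can use the new index, and the id condition keeps the limit scan small
select * from user where u_name = 'Daisy' and id > 4000000 order by id limit 10;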

Conclusion

To sum up, the focus of paging-query optimization is to scan as few rows as possible while still returning the required results. That sounds easy, but doing it well is not. For back-end programmers, SQL is home cooking, an indispensable dish, because how well you write it directly determines how much load your application can take. You might say you can use caching to reduce database access, but that only treats the symptom, not the root cause. One more thing: paging usually also needs a total record count, typically via count(*), and on a large table that count can be slow too. Optimizing count(*) is much simpler than optimizing limit, so I will not cover it here.