Background

Riding the subway home after work, happily thinking about how to spend the weekend.

Suddenly the phone rang. It was one of our developers, and I immediately tensed up: this week's version had just been released, and a call at this hour usually means a production problem.

Sure enough, the situation was that a data query interface in production was being called wildly and irrationally, which directly slowed down the production MySQL cluster.

Well, this was serious. I got off the subway, hurried home, opened my computer, and went through the slow query logs in Pinpoint with my colleagues. We found a very strange query, as follows:

POST domain/v1.0/module/method?order=condition&orderType=desc&offset=1800000&limit=500

Domain, module, and method are pseudonyms standing for the domain, module, and instance method name of the interface; offset and limit are the paging offset and the number of rows per page. In other words, the caller was turning to page 1800000/500 + 1 = 3601. A preliminary search of the logs found more than 8000 such calls.

This was astonishing: our page size in the UI is 25 per page, not 500. This was definitely not a human turning pages on the functional page; someone was scraping the data (note: our production environment has 100 million+ rows). A detailed comparison of the logs showed that many of the paging requests overlapped in time, so the other side was most likely making multithreaded calls.

By analysing the authentication token, we found that the requests came from a client program called ApiAutotest, and that the account that generated the token belonged to a QA colleague. I immediately called the colleague to discuss and deal with it.

Analysis

Actually, the MySQL query itself is reasonably efficient: the join has been optimized, the short queries are fine, and the key filter and sort fields are indexed. The problem is that it pages through the data page by page, and the further back the requested page is, the more rows have to be scanned and the slower the query gets.

When we look at the first few pages, the query is very fast; something like limit 200,25 comes back in a flash. But the further back you go, the slower it gets, and past a million rows it simply gets stuck. How does that work? Let's look at what the query looks like when we turn to the very back pages:

select * from t_name where c_name1 = 'xxx' order by c_name2 limit 2000000,25;

This query is slow because the offset after the limit is too large. limit 2000000,25 means the database scans 2000025 rows, throws away the first 2000000, and returns the remaining 25 rows to the user. That is obviously unreasonable.

If you look at High Performance MySQL, chapter 6: Query Performance Optimization, it points out that paging is usually done with a LIMIT plus an offset, together with an appropriate ORDER BY clause. But this has a common problem: when the offset is very large, it makes MySQL scan a large number of rows that are not needed and then discard them.

Data simulation

Now that we know how the problem arises, let's try to solve it. Since the production data is sensitive, we simulate the situation here by constructing some data for testing.

1. Create two tables: an employee table and a department table

/* Department table */
drop table if exists dep;
create table dep(
    id int unsigned primary key auto_increment,
    depno mediumint unsigned not null default 0,
    depname varchar(20) not null default "",
    memo varchar(200) not null default ""
);

/* Employee table */
drop table if exists emp;
create table emp(
    id int unsigned primary key auto_increment,
    empno mediumint unsigned not null default 0,
    empname varchar(20) not null default "",
    job varchar(9) not null default "",
    mgr mediumint unsigned not null default 0,
    hiredate datetime not null,
    sal decimal(7,2) not null,
    comn decimal(7,2) not null,
    depno mediumint unsigned not null default 0
);

2. Create two functions: one that generates a random string and one that generates a random number

/* Function that generates a random string of length n */
DELIMITER $
drop FUNCTION if EXISTS rand_string;
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
    DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
        SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1+RAND()*52), 1));
        SET i = i+1;
    END WHILE;
    RETURN return_str;
END $
DELIMITER ;

/* Function that generates a random department number */
DELIMITER $
drop FUNCTION if EXISTS rand_num;
CREATE FUNCTION rand_num() RETURNS INT(5)
BEGIN
    DECLARE i INT DEFAULT 0;
    SET i = FLOOR(100+RAND()*10);
    RETURN i;
END $
DELIMITER ;

3. Write a stored procedure to simulate 5,000,000 rows of employee data

DELIMITER $
DROP PROCEDURE if EXISTS insert_emp;
CREATE PROCEDURE insert_emp(IN START INT(10), IN max_num INT(10))
BEGIN
    DECLARE i INT DEFAULT 0;
    /* set autocommit = 0 so the inserts are committed in one batch, which speeds up data generation */
    SET autocommit = 0;
    REPEAT
        SET i = i + 1;
        INSERT INTO emp(empno, empname, job, mgr, hiredate, sal, comn, depno)
        VALUES ((START + i), rand_string(6), 'SALESMAN', 0001, now(), 2000, 400, rand_num());
    UNTIL i = max_num END REPEAT;
    COMMIT;
END $
DELIMITER ;

/* Insert 5,000,000 rows of employee data */
call insert_emp(0, 5000000);

4. Write a stored procedure to simulate 120 rows of department data

DELIMITER $
DROP PROCEDURE if EXISTS insert_dept;
CREATE PROCEDURE insert_dept(IN START INT(10), IN max_num INT(10))
BEGIN
    DECLARE i INT DEFAULT 0;
    SET autocommit = 0;
    REPEAT
        SET i = i+1;
        INSERT INTO dep(depno, depname, memo)
        VALUES ((START+i), rand_string(10), rand_string(8));
    UNTIL i = max_num END REPEAT;
    COMMIT;
END $
DELIMITER ;

/* Insert 120 rows of department data */
call insert_dept(1, 120);

5. Create indexes on the key fields. Here the indexes are created after the data is generated: building the indexes takes longer this way, but generating the data is faster.

/* Create indexes on the key fields */
CREATE INDEX idx_emp_id ON emp(id);
CREATE INDEX idx_emp_depno ON emp(depno);
CREATE INDEX idx_dep_depno ON dep(depno);

Test the test data

/* Offset 100, take 25 rows */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
from emp a left join dep b on a.depno = b.depno
order by a.id desc limit 100,25;

/* Offset 4800000, take 25 rows */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
from emp a left join dep b on a.depno = b.depno
order by a.id desc limit 4800000,25;

Execution results

[SQL] SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname from emp a left join dep b on a.depno = b.depno order by a.id desc limit 100,25;
Affected rows: 0
Time: 0.001s

[SQL] SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname from emp a left join dep b on a.depno = b.depno order by a.id desc limit 4800000,25;
Affected rows: 0
Time: 12.275s

Because of the difference in the amount of data scanned, the two queries are clearly not in the same order of magnitude in terms of time.

Solutions

1. Subquery optimization: since we have the primary key id and an index on it, we can first locate the id at the starting position in the index tree, and then fetch the row data based on that id.

/* Use a subquery to find the id at offset 100, then take the 25 rows starting from that position */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
from emp a left join dep b on a.depno = b.depno
where a.id >= (select id from emp order by id limit 100,1)
order by a.id limit 25;

/* Use a subquery to find the id at offset 4800000, then take the 25 rows starting from that position */
SELECT a.empno, a.empname, a.job, a.sal, b.depno, b.depname
from emp a left join dep b on a.depno = b.depno
where a.id >= (select id from emp order by id limit 4800000,1)
order by a.id limit 25;

Execution efficiency is greatly improved compared with before:

[SQL] SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname from emp a left join dep b on a.depno = b.depno where a.id >= (select id from emp order by id limit 100,1) order by a.id limit 25;
Affected rows: 0
Time: 0.106s

[SQL] SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname from emp a left join dep b on a.depno = b.depno where a.id >= (select id from emp order by id limit 4800000,1) order by a.id limit 25;
Affected rows: 0
Time: 1.541s

2. Redefine the starting position: remember the primary key of the last row returned, and avoid using an offset at all.

/* The last row of the previous page had id 100, so skip past it and start scanning from 101 */
SELECT a.id, a.empno, a.empname, a.job, a.sal, b.depno, b.depname
from emp a left join dep b on a.depno = b.depno
where a.id > 100 order by a.id limit 25;

/* The last row of the previous page had id 4800000, so skip past it and start scanning from 4800001 */
SELECT a.id, a.empno, a.empname, a.job, a.sal, b.depno, b.depname
from emp a left join dep b on a.depno = b.depno
where a.id > 4800000 order by a.id limit 25;

Execution results

[SQL] SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname from emp a left join dep b on a.depno = b.depno where a.id > 100 order by a.id limit 25;
Affected rows: 0
Time: 0.001s

[SQL] SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname from emp a left join dep b on a.depno = b.depno where a.id > 4800000 order by a.id limit 25;
Affected rows: 0
Time: 0.000s

This is the most efficient approach: no matter how deep the paging goes, the time is basically the same, because after the condition is applied only 25 rows need to be scanned.

The catch is that it only works when paging sequentially, so that the last id of the previous page can be remembered. If the user skips pages, say, jumps straight from page 25 to page 35, the data will be wrong.

The suitable scenario is something like Baidu search or the Tencent News feed, where you keep pulling down and loading more. This kind of lazy loading ensures that the data does not jump around.
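As a rough illustration of this cursor-style paging at the application layer, here is a minimal sketch assuming plain JDBC; the class name and selected columns are hypothetical, only the emp/dep tables come from the test data above. The client keeps the id of the last row it received and sends it back instead of a page number:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class EmpKeysetDao {

    // Fetch the page that follows lastId; pass lastId = 0 for the first page.
    // Rows come back ordered by id, so the caller remembers the id of the last
    // element and passes it in for the next page ("where a.id > ?" instead of an offset).
    public List<String> nextPage(Connection conn, long lastId, int pageSize) throws SQLException {
        String sql = "select a.id, a.empname, b.depname "
                   + "from emp a left join dep b on a.depno = b.depno "
                   + "where a.id > ? order by a.id limit ?";
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, lastId);
            ps.setInt(2, pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getLong("id") + " " + rs.getString("empname") + " " + rs.getString("depname"));
                }
            }
        }
        return rows;
    }
}

This is exactly the "where a.id > 4800000 ... limit 25" query from the test above, just with the boundary id supplied by the caller.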

3. Degrade the request: cap the limit offset and the number of rows per page at a maximum value. If the maximum is exceeded, return empty data.

The reasoning is that beyond that point you are no longer paging, you are scraping data; if you are genuinely looking for something, you should enter appropriate conditions to narrow the scope rather than page through it page by page.

This is roughly the same as my colleague's idea: if the offset exceeds a certain value, the request should simply return a 4xx error.
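A minimal sketch of that guard follows; the threshold values and names are hypothetical, and the choice between returning an empty page and throwing is exactly the difference between the two ideas above:

import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

public class PagingGuard {

    // Hypothetical caps: beyond these values the caller is treated as scraping, not paging.
    private static final int MAX_OFFSET = 100_000;
    private static final int MAX_LIMIT = 100;

    // Variant A: degrade quietly and return an empty page beyond the cap.
    public static <T> List<T> fetchOrEmpty(int offset, int limit,
                                           BiFunction<Integer, Integer, List<T>> fetcher) {
        if (offset > MAX_OFFSET || limit > MAX_LIMIT) {
            return Collections.emptyList();
        }
        return fetcher.apply(offset, limit);
    }

    // Variant B: my colleague's idea, reject the request so the web layer maps it to a 4xx.
    public static void validateOrThrow(int offset, int limit) {
        if (offset > MAX_OFFSET || limit > MAX_LIMIT) {
            throw new IllegalArgumentException(
                "offset/limit too large; narrow the query conditions instead of deep paging");
        }
    }
}

Either way, the check sits in the service layer before any SQL is built, so a runaway caller never reaches the database.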

That night we applied the third scheme above, capping the offset and returning an empty result above a certain value. The next day we further optimized the program and the database scripts using the first and second solutions.

It makes sense to design any feature with the extreme cases in mind, and to let the designed capacity cover testing at those extreme boundaries.

In addition, rate limiting and degradation should also be taken into account. For example, when a tool makes 8000 multithreaded calls in a short time, a counting service can detect that the caller is calling too frequently, report it back, and simply cut the caller off.
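Here is a minimal sketch of such a counting check, assuming a single-node, in-memory fixed-window counter keyed by caller; the names and thresholds are hypothetical, and a real deployment would more likely keep the counter in a shared store:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SimpleCallCounter {

    private final long windowMillis;  // length of the counting window
    private final int maxCalls;       // calls allowed per caller within one window
    private final ConcurrentHashMap<String, Window> windows = new ConcurrentHashMap<>();

    public SimpleCallCounter(long windowMillis, int maxCalls) {
        this.windowMillis = windowMillis;
        this.maxCalls = maxCalls;
    }

    // Returns true if the call is allowed; false means the caller is too frequent and should be cut off.
    public boolean tryAcquire(String callerId) {
        long now = System.currentTimeMillis();
        Window w = windows.compute(callerId, (id, old) ->
                (old == null || now - old.startMillis >= windowMillis) ? new Window(now) : old);
        return w.count.incrementAndGet() <= maxCalls;
    }

    private static final class Window {
        final long startMillis;
        final AtomicInteger count = new AtomicInteger();
        Window(long startMillis) { this.startMillis = startMillis; }
    }
}

With a policy along the lines of new SimpleCallCounter(60_000, 1000), an ApiAutotest-style burst of 8000 calls would start being rejected long before it could drag the database down.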

Ah, careless. Up half the night because a QA colleague didn't play fair. Still, it was a memorable experience.