1, the LIMIT statement

Paging queries are one of the most common scenarios, but they are also often the most problematic. For example, for simple statements like the following, the typical DBA solution is to add a composite index to the type, name, and create_time fields. In this way, conditional sorting can effectively use the index, and the performance is rapidly improved.

SELECT * 
FROM   operation 
WHERE  type = 'SQLStats' 
       AND name = 'SlowLog' 
ORDER  BY create_time 
LIMIT  1000, 10;

Copy the code
Well, maybe 90% or more dbAs solve this problem and that’s it. But when the LIMIT clause becomes “LIMIT 100000010”, programmers still complain: why is it slow when I only fetch 10 records?

The database does not know where the 1000,000th record starts, and even if it has an index it needs to be evaluated from scratch. When performance problems like this occur, most of the time the programmer is lazy.

The maximum value of the previous page can be used as the query condition in scenarios such as front-end data browsing and page turning or big data export in batches. SQL was redesigned as follows:

SELECT   * 
FROM     operation 
WHERE    type = 'SQLStats' 
AND      name = 'SlowLog' 
AND      create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;

Copy the code
Under the new design, the query time is basically fixed and will not change with the increase of data volume.

2. Implicit conversion

Mismatching of query variable and field definition types in SQL statements is another common error. Such as the following statement:

mysql> explain extended SELECT * 
     > FROM   my_balance b 
     > WHERE  b.bpn = 14000000123 
     >       AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

Copy the code
Where field BPN is defined as vARCHar (20), MySQL’s policy is to convert strings to numbers before comparison. Function on table field, index invalidated.

These may be parameters that the application framework automatically fills in, rather than what the programmer intended. Now a lot of application framework is very complex, easy to use at the same time it may be careful to dig their own holes.

3. Associated update and deletion

While MySQL5.6 introduces materialization, it’s important to note that it’s currently only optimized for query statements. For updates or deletes, you need to rewrite the JOIN manually.

For example, in the UPDATE statement below, MySQL actually executes a DEPENDENT SUBQUERY, which takes as long as you can imagine.

UPDATE operation o 
SET    status = 'applying' 
WHERE  o.id IN (SELECT id 
                FROM   (SELECT o.id, 
                               o.status 
                        FROM   operation o 
                        WHERE  o.group = 123 
                               AND o.status NOT IN ( 'done' ) 
                        ORDER  BY o.parent, 
                                  o.id 
                        LIMIT  1) t);

Copy the code
Execution Plan:

+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-------------------------- ---------------------------+ | id | select_type | table |type| possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-------------------------- ---------------------------+ | 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Usingwhere; Using temporary                        |
| 2  | DEPENDENT SUBQUERY |       |       |               |         |         |       |      | Impossible WHERE noticed after reading const tables |
| 3  | DERIVED            | o     | ref   | idx_2,idx_5   | idx_5   | 8       | const | 1    | Using where; Using filesort | +----+--------------------+-------+-------+---------------+---------+---------+-------+------+-------------------------- ---------------------------+Copy the code
When rewritten as JOIN, the SUBQUERY selection mode changed from DEPENDENT SUBQUERY to DERIVED query, and the execution speed was greatly reduced from 7 seconds to 2 milliseconds.

UPDATE operation o 
       JOIN  (SELECT o.id, 
                            o.status 
                     FROM   operation o 
                     WHERE  o.group = 123 
                            AND o.status NOT IN ( 'done' ) 
                     ORDER  BY o.parent, 
                               o.id 
                     LIMIT  1) t
         ON o.id = t.id 
SET    status = 'applying' 

Copy the code
The implementation plan is simplified as:

+----+-------------+-------+------+---------------+-------+---------+-------+------+------------------------------------ -----------------+ | id | select_type | table |type| possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+------------------------------------ -----------------+ | 1 | PRIMARY | | | | | | | | Impossible WHERE noticed after reading const tables | | 2 | DERIVED | o  | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Usingwhere; Using filesort | +----+-------------+-------+------+---------------+-------+---------+-------+------+------------------------------------ -----------------+Copy the code

4. Mix sort

MySQL cannot use indexes for mixed sorting. But in some scenarios, there are opportunities to use special methods to improve performance.

SELECT * 
FROM   my_order o 
       INNER JOIN my_appraise a ON a.orderid = o.id 
ORDER  BY a.is_reply ASC, 
          a.appraise_time DESC 
LIMIT  0, 20 

Copy the code
The execution plan is displayed as full table scan:

+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+ | id | select_type |  table |type   | possible_keys     | key     | key_len | ref      | rows    | Extra    
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
|  1 | SIMPLE      | a     | ALL    | idx_orderid | NULL    | NULL    | NULL    | 1967647 | Using filesort |
|  1 | SIMPLE      | o     | eq_ref | PRIMARY     | PRIMARY | 122     | a.orderid |       1 | NULL           |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

Copy the code
Since IS_Reply has only 0 and 1 states, we rewrote it as follows, reducing the execution time from 1.58 seconds to 2 milliseconds.

SELECT * 
FROM   ((SELECT *
         FROM   my_order o 
                INNER JOIN my_appraise a 
                        ON a.orderid = o.id 
                           AND is_reply = 0 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20) 
        UNION ALL 
        (SELECT *
         FROM   my_order o 
                INNER JOIN my_appraise a 
                        ON a.orderid = o.id 
                           AND is_reply = 1 
         ORDER  BY appraise_time DESC 
         LIMIT  0, 20)) t 
ORDER  BY  is_reply ASC, 
          appraisetime DESC 
LIMIT  20;

Copy the code

5, EXISTS statement

MySQL still uses nested subqueries for the EXISTS clause. SQL statement as follows:

SELECT *
FROM   my_neighbor n 
       LEFT JOIN my_neighbor_apply sra 
              ON n.id = sra.neighbor_id 
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND EXISTS(SELECT 1 
                  FROM   message_info m 
                  WHERE  n.id = m.neighbor_id 
                         AND m.inuser = 'xxx') 
       AND n.topic_type <> 5 

Copy the code
The implementation plan is:

+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type        | table | type | possible_keys     | key   | key_len | ref   | rows    | Extra   |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
|  1 | PRIMARY            | n     | ALL  |  | NULL     | NULL    | NULL  | 1086041 | Using where                   |
|  1 | PRIMARY            | sra   | ref  |  | idx_user_id | 123     | const |       1 | Using where          |
|  2 | DEPENDENT SUBQUERY | m     | ref  |  | idx_message_info   | 122     | const |       1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+

Copy the code
Changing exists to JOIN can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 ms.

SELECT *
FROM   my_neighbor n 
       INNER JOIN message_info m 
               ON n.id = m.neighbor_id 
                  AND m.inuser = 'xxx' 
       LEFT JOIN my_neighbor_apply sra 
              ON n.id = sra.neighbor_id 
                 AND sra.user_id = 'xxx' 
WHERE  n.topic_status < 4 
       AND n.topic_type <> 5 

Copy the code
New implementation plan:

+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+ |  id | select_type | table |type| possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+ | 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition | | 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1  | Usingwhere      |
|  1 | SIMPLE      | sra   | ref    | | idx_user_id | 123     | const     |    1 | Using where           |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+

Copy the code

6. Push under conditions

External query criteria cannot be pushed down to complex views or subqueries:

  • Aggregate subquery;
  • Subquery with LIMIT;
  • UNION or UNION ALL subquery;
  • Subqueries in output fields;
The following statement, which you can see from the execution plan, applies after the aggregate subquery:

SELECT * 
FROM   (SELECT target, 
               Count(*) 
        FROM   operation 
        GROUP  BY target) t 
WHERE  target = 'rm-xxxx' 

Copy the code


+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table      | type| possible_keys | key | key_len | ref | rows | Extra | + - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 514 | const | 2 | Usingwhere |
|  2 | DERIVED     | operation  | index | idx_4         | idx_4       | 519     | NULL  |   20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+

Copy the code
Make sure that the query conditions can be pushed down semantically and rewritten as follows:

SELECT target, 
       Count(*) 
FROM   operation 
WHERE  target = 'rm-xxxx' 
GROUP  BY target

Copy the code
The execution plan becomes:

+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type| possible_keys | key | key_len | ref | rows | Extra | + - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Usingwhere; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+

Copy the code

7. Narrow down ahead of time

Start with initial SQL statement:

SELECT * 
FROM   my_order o 
       LEFT JOIN my_userinfo u 
              ON o.uid = u.uid
       LEFT JOIN my_productinfo p 
              ON o.pid = p.pid 
WHERE  ( o.display = 0 ) 
       AND ( o.ostaus = 1 ) 
ORDER  BY o.selltime DESC 
LIMIT  0, 15 

Copy the code
Do a series of left joins, then sort the first 15 records. As can be seen from the execution plan, the last step estimated the number of sorting records to be 900,000, and the time consumption is 12 seconds.

+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+-------------------- --------------------------------+ | id | select_type | table |type| possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+-------------------- --------------------------------+ | 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 909119 | Usingwhere; Using temporary; Using filesort | | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | | 1 | SIMPLE | p | ALL | PRIMARY  | NULL | NULL | NULL | 6 | Usingwhere; Using join buffer (Block Nested Loop) | +----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+-------------------- --------------------------------+Copy the code
Since the last WHERE condition and the sort are on the leftmost main table, we can reduce the amount of data in my_order before we do the left join. The SQL is rewritten as follows, and the execution time is reduced to about 1 ms.

SELECT * 
FROM (
SELECT * 
FROM   my_order o 
WHERE  ( o.display = 0 ) 
       AND ( o.ostaus = 1 ) 
ORDER  BY o.selltime DESC 
LIMIT  0, 15
) o 
     LEFT JOIN my_userinfo u 
              ON o.uid = u.uid 
     LEFT JOIN my_productinfo p 
              ON o.pid = p.pid 
ORDER BY  o.selltime DESC
limit 0, 15

Copy the code
Check the execution plan again: subquery materialized (select_type=DERIVED) joins. Although the estimated row scan is still 900,000, the actual execution time becomes very small with the use of indexes and the LIMIT clause.

+----+-------------+------------+--------+---------------+---------+---------+-------+--------+------------------------- ---------------------------+ | id | select_type | table |type| possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+--------+---------------+---------+---------+-------+--------+------------------------- ---------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 15 | Using temporary; Using filesort | | 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | | 1 | PRIMARY | p | ALL | PRIMARY | NULL | NULL | NULL | 6 | Usingwhere; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | o          | index  | NULL          | idx_1   | 5       | NULL  | 909112 | Using where| +----+-------------+------------+--------+---------------+---------+---------+-------+--------+------------------------- ---------------------------+Copy the code

8. Push down the intermediate result set

Consider the following example, which has been preliminarily optimized (primary table priority function query condition in left join) :

SELECT    a.*, 
          c.allocated 
FROM      ( 
              SELECT   resourceid 
              FROM     my_distribute d 
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
LEFT JOIN 
          ( 
              SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
              FROM     my_resources 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

Copy the code
So are there any other problems with this statement? It is not difficult to see that subquery C is a full table aggregation query, which will lead to the performance of the whole statement in the case of a particularly large number of tables.

In fact, for subquery C, the left join result set only cares about the data that can match the main table resourceID. So we can rewrite the statement as follows to reduce the execution time from 2 seconds to 2 milliseconds.

SELECT    a.*, 
          c.allocated 
FROM      ( 
                   SELECT   resourceid 
                   FROM     my_distribute d 
                   WHERE    isdelete = 0 
                   AND      cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
LEFT JOIN 
          ( 
                   SELECT   resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
                   FROM     my_resources r, 
                            ( 
                                     SELECT   resourceid 
                                     FROM     my_distribute d 
                                     WHERE    isdelete = 0 
                                     AND      cusmanagercode = '1234567' 
                                     ORDER BY salecode limit 20) a 
                   WHERE    r.resourcesid = a.resourcesid 
                   GROUP BY resourcesid) c 
ON        a.resourceid = c.resourcesid

Copy the code
But subquery A appears multiple times in our SQL statement. Not only does this have extra overhead, it also makes the entire statement seem cumbersome. Rewrite again using the WITH statement:

WITH a AS 
( 
         SELECT   resourceid 
         FROM     my_distribute d 
         WHERE    isdelete = 0 
         AND      cusmanagercode = '1234567' 
         ORDER BY salecode limit20) SELECT a.*, c. located FROM a LEFT JOIN (SELECT resourcesID,  sum(ifnull(allocation, 0) * 12345) allocated FROM my_resources r, a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesidCopy the code

conclusion

The database compiler generates the execution plan, which determines how the SQL is actually executed. But compilers do their best, and all database compilers are not perfect.

Most of the scenarios mentioned above have performance issues in other databases as well. Understand the characteristics of the database compiler, to avoid its shortcomings, write high-performance SQL statements.

Programmers bring algorithmic ideas or awareness to the design of data models and to the writing of SQL statements.

Get into the habit of using the WITH statement when writing complex SQL statements. Concise and clear SQL statements can also reduce the burden on the database.



The last

Welcome to pay attention to my public number [programmer chasing wind], the article will be updated in it, sorting out the data will be placed in it.