Is MySQL LIMIT so bad

Hashtag: “We are all little frogs” public account article


I’m going to give you a brief description of this question.

The problem

In order for the story to work, we need to have a table:

CREATE TABLE t (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    key1 VARCHAR(100),
    common_field VARCHAR(100),
    PRIMARY KEY (id),
    KEY idx_key1 (key1)
) Engine=InnoDB CHARSET=utf8;
Copy the code

Table T contains three columns, id column is the primary key and KEY1 column is the secondary index column. The table contains 10,000 records.

When we execute the following statement, we use the secondary index idx_KEY1:

mysql>  EXPLAIN SELECT * FROM t ORDER BY key1 LIMIT 1;
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-------+
|  1 | SIMPLE      | t     | NULL       | index | NULL          | idx_key1 | 303     | NULL |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
Copy the code

This makes sense because in the secondary index IDx_KEY1, the KEY1 column is ordered. Select * from idx_KEY1; select * from idx_key1; select * from idx_key1;

LIMIT 1 = 5000, 1; LIMIT 5000, 1;

mysql> EXPLAIN SELECT * FROM t ORDER BY key1 LIMIT 5000, 1; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------- --+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------- -- -- + | | 1 SIMPLE | t | NULL | | NULL ALL | NULL | NULL | NULL | 9966 | | 100.00 Using filesort | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------- --+ 1 row in set, 1 warning (0.00 SEC)Copy the code

LIMIT 5000 (idx_key1) LIMIT 5000 (idx_key1) LIMIT 5000 (idx_key1) LIMIT 5000 (idx_key1) LIMIT 5000 (idx_key1) LIMIT 5000 (idx_key1) LIMIT 5000 (idx_key1)

Unfortunately, due to a bug in MySQL’s implementation, there is no such thing as a full table scan +filesort.

Server layer and storage engine layer

MySQL is divided into server layer and storage engine layer:

  • The Server layer takes care of general things like connection management, SQL parsing, analysis execution plans, and so on

  • The storage engine layer is responsible for the specific data storage, such as whether the data is stored in files or in memory, and what the specific storage format is. We basically use the InnoDB storage engine right now, there are very few other storage engines, so we don’t have any other storage engines.

The execution of an SQL statement in MySQL is achieved through multiple interactions between the server layer and the storage engine layer. For example, the following query:

SELECT * FROM t WHERE key1 > 'a' AND key1 < 'b' AND common_field ! = 'a';Copy the code

The server layer analyzes that the above statement can be executed using the following two schemes:

  • Solution 1: Use full table scan

  • Scheme 2: Use the secondary index IDx_KEY1. In this case, scan all the secondary index records between (‘ A ‘and ‘b’) in the KEY1 column, and perform table-back operations on each secondary index record.

The Server layer analyzes which of the two options is less costly and selects the one with the lower cost as the execution plan. The interface provided by the storage engine is then invoked to actually execute the query.

The assumption here is that scheme two uses the secondary index IDx_KEY1 to perform the above query. The dialog between the server layer and the storage engine layer can look like this:

Select * from idx_KEY1 (‘a’, ‘b’); select * from idx_KEY1 (‘a’, ‘b’);

InnoDB: “received, go to check”, then InnoDB through idx_KEY1 secondary index corresponding B+ tree, quickly locate the scan interval (‘a’, ‘B ‘) of the first secondary index record, and then back to the table, get the full cluster index record back to the server layer.

After the server layer receives the complete cluster index record, it continues to judge common_field! = whether the condition ‘a’ is true. If not, discard the record; otherwise, send the record to the client. And say to the storage engine, “Please give me the next record.”

Tip: The record sent to the client is actually sent to the local network buffer, which is controlled by net_buffer_length and is 16KB by default. Wait until the buffer is full to actually send the network packet to the client.

InnoDB: “Received, I will check”. InnoDB finds the next secondary index record of idx_KEY1 (‘ A ‘, ‘b’) based on the record’s next_record property, and then performs table-back operations to return the complete cluster index record to the server layer.

Tip: Both clustered index records and secondary index records contain a property called next_RECORD. Each record is linked into a linked list based on next_RECORD, and the records in the linked list are sorted by key value (for clustered indexes, the key value is the value of the primary key, for secondary index records, The key value is the value of the secondary index column.

After the server layer receives the complete cluster index record, it continues to judge common_field! = whether the condition ‘a’ is true. If not, discard the record; otherwise, send the record to the client. And say to the storage engine, “Please give me the next record.”

. The process is then repeated over and over.

Until:

That is, until InnoDB finds that the next secondary index record obtained by next_record is not in the (‘ A ‘, ‘b’) interval, it tells the server layer: “Ok, there is no next record in the (‘ A ‘, ‘B ‘) interval.”

The server layer receives a message from InnoDB saying there is no next record and terminates the query.

Now you know the basic interaction between the Server layer and the storage engine layer.

What the hell is LIMIT?

It may surprise you to learn that MySQL processes the contents of the LIMIT clause only when the server layer is ready to send records to the client. Take the edge off statement as an example:

SELECT * FROM t ORDER BY key1 LIMIT 5000, 1;
Copy the code

If the above query is executed with idx_KEY1, MySQL will do this:

  • The server layer asks InnoDB for the first record. InnoDB retrieves the first secondary index record from IDx_KEY1, and then performs table-back operations to obtain the complete clustered index record, which is then returned to the Server layer. The server layer is ready to send it to the client, but there is also a requirement of LIMIT 5000, 1, which means that 5001 of the eligible records can be sent to the client. Let’s assume that the Server layer maintains a variable called limit_count to count how many records have been skipped, so limit_count should be set to 1.

  • The server layer asks InnoDB for the next record, InnoDB finds the next secondary index record according to the next_record property of the secondary index record, and returns the complete clustered index record to the server layer. When sending the limit_count packet to the client, the server layer finds that the limit_count packet is 1. Therefore, the server layer gives up sending the limit_count packet to the client and increases the limit_count packet by 1. In this case, the limit_count packet becomes 2.

  • . Repeat the above steps

  • The server layer doesn’t actually send the full cluster index records InnoDB returns to the client until limit_count equals 5000.

As you can see from the above procedure, since MySQL does not determine whether the LIMIT clause is valid before actually sending records to the client, using secondary indexes to perform the above query would mean 5001 table back operations. Server layer in the execution plan analysis will feel that the cost of executing so many times back to the table is too large, it is not as fast as direct full table scan +filesort, so we choose the latter to execute the query.

How to do?

Due to the limitations of MySQL’s LIMIT clause, there is no way to use secondary indexes to speed up queries on statements such as LIMIT 5000, 1. Not really, just rewrite the above statement as:

SELECT * FROM t, (SELECT id FROM t ORDER BY key1 LIMIT 5000, 1) AS d
    WHERE t.id = d.id;
Copy the code

SELECT id FROM t ORDER BY key1 LIMIT 5000, 1 as a subquery; SELECT id FROM t ORDER BY key1 LIMIT 5000, 1 as a subquery; Table T is then searched based on the primary key obtained in the sub-query.

This eliminates the need for the first 5000 records to return to the table, thus greatly improving query efficiency!

Vomit a slot

When will the guys who designed MySQL change this stupid implementation of the LIMIT clause? The user has to manually trick the optimizer to improve query efficiency

This article was first published on the public account “we are all little frog”, long click to pay attention to the little frog, are dry goods oh