How to scientifically crack slow SQL

Today and you talk about a common problem: slow SQL.

Here’s what you’ll learn in this article:

Hazards of slow SQL
The execution process of SQL statements
Storage engines and indexes
Slow SQL solution

The following are all MySQL default storage engine InnoDB as an example to expand, no more words, open!

1. Hazards of slow SQL

Slow SQL is an SQL statement that runs very slowly, you might ask what’s wrong with slow SQL?

Consider a scenario:

Baymax and Black Dragon Boat Festival out to play, the ticket is too expensive so they bought a high-speed train, the train station is really wuyang Wuyang. Immediately check-in, big white and small black ready to go to the toilet to clean up the inventory, pit is not much, the queue is really many people. Hei found that passengers in 3 pits were slow, and the other 2 pits had several waves of people, but the 3 pit owners did not come out. Everyone waiting outside, the heart is very uncomfortable, long-term occupation of public resources, the people behind can not use. Small black wry smile way: this is toilet version of slow SQL!

This is a real life example, and it’s the same when you switch to a MySQL server. After all, technology comes from life.

MySQL server resources (CPU, IO, memory, etc.) are limited, especially in high-concurrency scenarios, requests need to be processed quickly. Otherwise, slow SQL will block many normal requests, resulting in a large number of failures or timeouts.

2.SQL statement execution process

Introduction to the interaction between the client and MySQL server:

The client sends an SQL statement to the server. The connector on the server authenticates the account, password, and permission first. If any exception occurs, the server rejects the request.
The server queries the cache and returns the result if the SQL statement hits the cache, otherwise it continues processing.
The server performs lexical parsing, syntax parsing and preprocessing to check the validity of SQL statements.
The server optimizes the previously generated parse tree through the optimizer to generate the optimal physical execution plan.
The generated physical execution plan invokes the relevant interface of the storage engine for data query and processing.
The result is returned to the client when the processing is complete.

Interaction between the client and MySQL server:

As the saying goes, “all roads lead to Rome”, and the optimizer’s job is to find the best one out of many.

The storage engine is the core component that determines SQL execution, and it is useful to understand how this works.

3. Storage engines and indexes

3.1 Storage Engine

InnoDB Storage Engine is the default for MySQL, so it’s typical.

The main function of a storage engine is to access and retrieve data, and it is the component that actually executes SQL statements.

InnoDB’s overall architecture is divided into two parts: memory architecture and disk architecture, as shown in the figure:

Storage engines cover a lot more than one article can cover, but we only need to understand the general composition of memory and disk architectures here.

InnoDB engine is row oriented, data is stored in disk data pages, which store each row in a fixed row format.

The row format is classified into four types: Compact, Redundant, Dynamic, and Compressed. The default format is Compact.

Disk prefetch mechanism and locality principle

When a computer accesses a data page, it not only loads the data page where the current data resides, but also loads adjacent data pages to the memory. The disk prefetch length is generally an integer multiple of the page, effectively reducing the number of DISK I/OS.

Disk and memory interaction

MySQL disk data needs to be swapped to memory to complete an SQL interaction, as shown in the following figure:

A sector is the basic unit of disk read and write. The size of each sector is 512 BYTES
Disk block the smallest unit of read and write data of a file system. Adjacent sectors are grouped together to form a block, usually 4KB
Pages are the smallest storage unit of memory and are typically 2^ N times the size of a disk block
The default size of InnoDB pages is 16KB, which is several times the size of operating system pages

Random disk IO

MySQL data is stored in rows on disk, and the data is not physically contiguous, so it is impossible to find data without randomly reading and writing data to disk.

For MySQL, when there is a lot of random DISK I/O, most of the time is wasted on the seek path, the disk grates, just doesn’t transfer much data.

A disk access consists of three actions:

Seek: Move the head to locate the specified track
Rotate: Waits for the specified sector to rotate past under the head
Data transfer: The actual transfer of data between disk and memory

How to effectively reduce random IO is a very important problem for storage engines.

3.2 the index

There are many data structures that can be added, deleted, modified and searched, including hash table, binary search tree, AVL, red-black tree, B tree, B+ tree, etc., which are all candidate data structures that can be used as indexes.

Considering the reality of MySQL: disk and memory interaction, random disk I/O, sorting and range lookup, add, delete, and change complexity, the B+ tree stands out.

As a multi-fork balanced tree, B+ trees support both range lookup and sorting well, and are shorter and fatter. The average number of disk I/OS to access data depends on the height of the tree, so B+ trees can reduce the number of disk lookups.

In InnoDB, the height of the B+ tree is usually 2 or 4 levels, and the root node is resident in memory, which means that it only takes 1 or 3 disk I/O operations at most to find a row record of a value.

MyISAM stores data and indexes separately. InnoDB storage engine does not store data and indexes separately. This is why some people say that InnoDB index is data and data is index.

When it comes to InnoDB data and index storage, there is a term: clustered index.

Clustered index

A clustered index is a special index that every Innodb table has. It is typically a B+ tree built with the primary key of the table.

Select * from student; select * from student; select * from student; select * from student;

Non-leaf nodes store no data, only primary keys and related Pointers
Leaf nodes contain primary keys, row data, and Pointers
Bidirectional Pointers are connected in series between leaf nodes to form an ordered bidirectional linked list, and the interior of leaf nodes is also ordered

Clustered indexes are created according to the following rules:

InnoDB uses primary keys when there are primary keys
Without a primary key, InnoDB selects a non-empty unique index to create
InnoDB implicitly creates an increment column for non-null unique indexes with no primary key

If we want to find data with id=10, the general process is as follows:

The root of the index is in memory, 10>9 so find P3 pointer
The data pointed to by P3 is not in memory, so 1 disk I/O reads disk block 3 into memory
A binary search is performed on disk block 3 in memory to find all values of ID=9

Nonclustered index

Leaf nodes of non-clustered indexes store secondary index values and primary key values. Neither leaf nor leaf nodes store entire rows of data values.

If we have a student table with name as the secondary index, then the B+ tree structure of the non-clustered index is shown in the following figure:

Since the leaf node of the non-clustered index does not store row data, if the non-clustered index is used to find the non-secondary index value, it needs to be divided into two steps:

First: the primary key corresponding to the data row is determined by the leaf node of the non-clustered index
Second: the corresponding row is queried in the clustered index by the corresponding primary key value

We call the process of finding the primary key from a non-clustered index and then finding the correct row from the clustered index based on the primary key: back table query.

In other words: Select * from student where name = ‘Bob’ will result in a table-back query because there are no other values in the leaf node of the name index, which can only be obtained from the clustered index.

So if the search field can be done in the non-clustered index, you can avoid a return to the table process, this is called: overwrite index, so select * is not a good habit, just take what you need.

If we want to find all the values of the record name=Tom, the process is as follows:

Starting with the non-clustered index, the root node finds the P3 pointer in memory in lexicographical order of name
The disk block to which the P3 pointer points is not in the memory, causing one disk I/O load to the memory
An in-memory search of the data on disk block 3 yields a primary key of 4 for the record name= Tom
Get the P2 pointer from the root node of the clustered index based on the primary key value 4
The disk block to which the P2 pointer points is not in the memory, causing the second disk I/O load to the memory
The data obtained in the previous step is searched in memory to obtain all row data

The above query contains a return to the table process, so the performance is twice as slow as the primary key query, so try to use the primary key query, a complete.

4. Slow SQL solution

There are many reasons for slow SQL, we put aside the single table hundreds of millions of records and no index of the special case, to discuss some of the more general meaning of slow SQL causes and solutions.

We will elaborate from two aspects:

The database table index is incorrectly set
The SQL statement is faulty and needs to be optimized

4.1 Index Setting Rules

The programmer’s perspective is different from the storage engine’s. Indexes are written well, SQL runs fast.

Index differentiation is low

If there are 1000W records in the table, and the status field represents the status, perhaps 90% of the data status=1, status can not be used as the index, because the distinction between data records is very low.

Do not create too many indexes

Each index consumes disk space and is updated when table data is modified. The more indexes there are, the more complex the update will be.

Ibd files need to maintain an additional B+Tree index Tree for each index. If a table has 10 indexes, 10 B+ trees need to be maintained, which reduces write efficiency and wastes disk space.

Common query fields are indexed

If a field is frequently used for query conditions, the query speed of this field affects the query speed of the entire table. Therefore, it is necessary to create an index for this field.

Always sort/group/de-index fields

For fields that require frequent use of operations such as ORDER BY, GROUP BY, DISTINCT, and UNION, you can effectively use the B+ tree feature to speed up execution.

Build indexes for primary and foreign keys

Primary keys can be used to create clustered indexes, and foreign keys are also unique and commonly used for table association fields that need to be indexed to improve performance.

4.2 SQL optimization

If the index of a database table is set properly, improper SQL statement writing may cause index failure or even full table scan, rapidly reducing the performance.

Index of the failure

Index invalidation occurs in some cases when writing SQL:

Use functions for indexes

select id from std upper(name) = ‘JIM’;

Perform an operation on an index

select id from std where id+1=10;

Use <>, not in, not exist,! For indexes =

select id from std where name ! = ‘jim’;

Perform a leading fuzzy query on an index

select id from std name like ‘%jim’;

An implicit conversion will result in an index not being walked

Select id from STD name = 100; Keep variable types consistent with field types

Or joins for non-indexed fields

Not all ORs will invalidate the index. If all fields connected to an OR are indexed, the index will be removed. If a field does not have an index, a full table scan will be performed.

The union index contains only composite index non-leading columns

The SQL statement does not use key1. The SQL statement does not use key1. The SQL statement does not use key1. select name from table where key2=1 and key3=2;

Good advice

Use joins instead of subqueries

For the database, in most cases, the connection is faster than the subquery, use the connection way of MySQL optimizer generally can generate better execution plans, more efficient query processing And the subquery often need to run the repetitive queries, sub queries generated temporary table no index, so the efficiency will be lower.

Optimization of excessive LIMIT offset

The offset of paging query is too large, for example, limit 100000,10

Use overwrite indexes to reduce SELECT * With overwrite indexes, reduce the number of queries back to the table.
In associated query with multiple tables, the small table comes first and the large table comes last

In MySQL, the associated query of tables after the execution of from is performed from left to right. The first table will involve a full table scan, so the scan of small tables first will be faster and more efficient. After the scan of large tables, maybe only the first 100 rows of large tables will meet the return condition and return.

Adjust the join order in the Where sentence

MySQL parses where clauses from left to right to narrow the result set as quickly as possible.

Use small scale transactions instead of large scale transactions
Follow the leftmost matching rule
Use federated indexes instead of creating multiple individual indexes

4.3 Analysis of slow SQL

SQL > select * from ‘MySQL’;

Enable slow SQL logging
Example Set the execution time threshold for slow SQL

SET GLOBAL slow_query_log = 1; Open: SHOW VARIABLES LIKE ‘%slow_query_log%’; SET GLOBAL long_QUERy_time =3; Check thresholds: SHOW GLOBAL VARIABLES LIKE ‘long_query_time%’;

Analysis the explain SQL

Explain commands need only be added before select, for example:

explain select * from std where id < 100;

This command displays the detailed execution process of SQL statements to help us locate problems. There are many usages and explanations about explain on the Internet. This article will not expand.

5. Summary

This paper expounds the harm of slow SQL, Innodb storage engine, clustered index, non-clustered index, index failure, SQL optimization, slow SQL analysis and so on.

MySQL many knowledge points are very complex, not one or two articles can explain clearly, so this article is very thin in many places, fortunately there is a lot of information online.

If this article can enlighten the reader in some way, it will be enough.