Understand MySQL optimization recommendations in principle

An overview of the

Since learning MySQL, we have heard and seen a lot of optimization advice, such as do not use select * query, use the same field query; It is recommended to use auto-increment primary keys for tables, and so on. These suggestions sound like common sense in MySQL development, but with these optimization suggestions, have we ever wondered why we do them? In this blog post we will start from the principles of MySQL to explain why there are these optimization recommendations.

u

The experimental environment of this article is MySQL 5.7.25

“

Preliminary knowledge

B + tree index

MySQL’s default storage engine, InnoDB, uses B+ trees to store data, so it is important to understand the basic principles of B+ tree indexes before analyzing optimization recommendations.

The diagram above shows a B+ tree index (the definition of a B+ tree can be seen here). Each node represents a disk block, which can also be understood as a page in a database. If I want to query data with primary key 35, what will happen to the index? First, it will judge that 35 is less than the root node 37, continue to query the left subtree, judge that 35 is greater than 22 and 33, then enter the right subtree, find the leaf node 33, continue to traverse to find 35, and finally extract its data. In the case of indexes, query 35 takes only 3 IO operations, which is very efficient. In a real world scenario, a 3-tier B + tree can represent millions of data. If millions of data lookups take only three IO, the performance improvement will be huge. If there is no index and each data item takes one IO, the total will be millions of IO, which is obviously very, very expensive. The figure above also shows that as long as the height of the maintenance tree is low enough, the I/O operations will be small enough, and the NUMBER of I/OS will be small, and the query performance will be high.

Explain execution plan

This is an explian execution plan. What do the fields above mean?

Id: Number of the Query in the execution plan selected by the Query Optimizer.

Select_type: specifies the query type used. There are several query types:

Type the name	instructions
SIMPLE	Queries other than subqueries or UNION queries.
PRIMARY	The outermost query in the subquery,Note that this is not a primary key query.
UNION	SELECT PRIMARY from the first SELECT in the UNION statement.
DEPENDENT UNION	The UNION in the subquery, and all subsequent SELECT from the second SELECT in the UNION, also depend on the result set of the external query.
UNION RESULT	The result of the merge in UNION.
SUBQUERY	The first SELECT of a subquery’s inner query, the result of which is independent of the external query result set.
DEPENDENT SUBQUERY	The first SELECT of a subquery inner query whose results depend on the external query result set.
DERIVED	The driver table is the main table
MATERIALIZED	The results of subqueries are saved as virtual temporary tables
UNCACHEABLE SUBQUERY	A subquery whose result set cannot be cached
UNCACHEABLE UNION	A UNION query whose result set cannot be cached

Table: Displays the names of tables in the database accessed by performing this step.
Partitions: query for matching partitions in partitioned tables and display NULL for non-partitioned tables.

Type: indicates the type used to query the table. The types are as follows:

Type the name	instructions
all	Full table scan.
const	Read constants, at most one record matches, because it is constant, so it really only needs to be read once.
system	System table, which contains only one piece of data, which is of special const type.
eq_ref	At most, only one result will be matched, typically accessed by primary key or unique index.
ref	The index query of the driven table in the Join statement
full_text	Use the full_TEXT index
ref_or_null	The only difference from ref is that a null-value query is added to the indexed query.
index_merge	Use two (or more) indexes in a query, merge the index results, and then read the table data.
unique_subquery	The result field combination in a subquery is either a primary key or a unique index
index_subquery	The result field combination in a subquery is an index, but not a primary key or unique index
range	Index range scan, which often appears in comparison conditions such as <, >,BETWEEN, etc
Index	Full index scan

Their performance from good to bad is: system > const > eq_ref > ref > full_text > ref_or_null > unique_subquery > index_subquery > range > index_merge > index > all.

Possible_keys: Possible index in the query.
Key_len: indicates the length of the index used.
Ref: Shows which columns or constants are compared to hit indexes.
Rows: Number of rows to perform this query scan.
When 100 is displayed, it indicates that no rows are filtered. Rows shows the estimated number of rows checked. Multiplied by the filtered percentage, it shows the number of rows connected to the table below. For example, if the number of rows is 1000 and the filter condition is 50.00 (50%), the number of rows joined with the table below is 1000× 50% = 500.
Extra: Perform query for additional criteria, detailed criteria can be viewed here.

With this knowledge in hand, let’s use this knowledge to analyze some of the MySQL optimizations that we often recommend.

Why is autoincrement primary key recommended

What if every time we create a table we’re thinking about using the table’s auto-increment primary key? Or UUID? But for performance reasons, we still recommend using an incremented Id. Why? MySQL > update B+ tree index; MySQL > update B+ tree index; In other words, whichever is faster and cheaper to update the B+ tree with an autoincrement Id or a non-autoincrement Id is the better choice. Let’s simulate auto-increment Id insertion and non-auto-increment Id insertion.

Auto-increment Id insertion case: We insert two data points, 10 and 11, into a B + number that already has 10 data points, and see how the tree changes.

We can find two characteristics here:

1. The scope of influence of self-increasing data insertion is always only the right-most subtree. Either the node is directly inserted into the subtree, or the subtree is split and its parent node is affected.

2. All subtrees except the rightmost are full.

What are the effects of these two features? According to the previous B+ tree index diagram, each point is a disk block, and the operation of each node is equivalent to an IO. Since the nodes affected by each insertion are only the right-most subtree, the range of disk IO can be controlled. The most important point is that the nodes of all subtrees except the rightmost subtree are full. In this case, the physical continuity of data of leaf nodes is better, and the query performance is also higher according to the principle of locality.

Insert non-increment Id:

Non-auto-increment Id insert features comparison with auto-increment Id insert we can easily know:

1. Insert the influence node is uncontrollable and unpredictable.

2. Every subtree has unsatisfactory leaf nodes.

Following the previous analysis, we can see the performance disadvantages of non-auto-increment Id inserts. Nodes cannot be controlled due to data insertion, leading to more frequent node splitting. Node splitting is also an I/O operation, and performance is affected. If the leaf nodes in the subtree are not satisfied, the physical continuity of the leaf nodes is not good. Finally, if we are UUID, if the Id is too long, the node space will be occupied, the number of nodes per page will be reduced, the page will be split more, and the performance will be affected. This is why it is recommended to use auto-increment primary keys.

Why not use a SELECT * query

We often hear query table, as long as you want to query the field, do not need to query the field, strictly prohibit the use of SELECT *, we can think of a very intuitive reason is that the database to help you translate into each field name to query, then query the excess field will occupy memory, bandwidth and other resources. That’s one reason, and it’s an important one, but I want to talk about another reason, overwriting indexes. My previous index article also covered covered indexes, so those interested can click here. Overwriting an index means that a query overwrites the field to be queried using a federated index so that the database does not have to run back to the table, thereby reducing IO and improving performance.

Here I use MySQL official list data to conduct an experiment, the data address can be downloaded here.

I chose the employees table data for the demonstration. By default, there is no federated index, so let’s add a federated index:

- the employee table structure -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- + | Field | Type | Null | Key | Default | Extra | +------------+---------------+------+-----+---------+-------+ | emp_no | int(11) | NO | PRI | NULL | |  | birth_date | date | NO | | NULL | | | first_name | varchar(14) | NO | MUL | NULL | | | last_name | varchar(16) | NO |  | NULL | | | gender | enum('M','F') | NO | | NULL | | | hire_date | date | NO | | NULL | | +------------+---------------+------+-----+---------+-------+Copy the code

ALTER TABLE employees.employees add Index `idx_first_name_last_name` (first_name,last_name);
Copy the code

To check the index of this table, run the show index from employees.employees command.

The table has successfully created matching indexes for first_name and last_name, and profiles is enabled to monitor SQL execution.

SET SESSION profiling = 1;
Copy the code

Then execute the following SQL separately

SELECT first_name,last_name  FROM employees.employees WHERE first_name='Eric';
SELECT *  FROM employees.employees WHERE first_name='Eric';
Copy the code

View Profiles;

Here we see that using the SELECT * field is about 4 times slower than using the SELECT field. Why is this? Let’s look at the execution plan.

Select * from first_name where last_name = ‘last_name’; select * from last_name where last_name = ‘first_name’; select * from last_name where last_name = ‘last_name’; Select * from gender; hire_date; gender; hire_date; gender; hire_date

conclusion

In this article, we analyze two common suggestions in principle. Why do we recommend using autoincrement primary keys? Why not use a select * query? In fact, the main reason is related to index, since we use index to improve our efficiency, we should make full use of it.

1. The efficiency of B+ tree query is affected by the height of the tree. The lower the height of the tree, the fewer I/O query times and the higher the performance.

2. The types of execution plans from best to worst are: system > const > eq_ref > ref > full_text > ref_or_null > unique_subquery > index_subquery > range > index_merge > index > all.

3. The advantage of self-increasing primary keys is continuity, relatively low cost of insertion and maintenance, and most of the leaf nodes of the subtree are full nodes, which has good physical continuity and better query performance.

4. The length of the UUID primary key is too long, resulting in fewer primary keys stored in a single child node and more common start page splitting, which affects performance. This is why it is recommended not to have too long indexes.

5, overwrite index is a good optimization technique, can let the query directly through the index to return data, rather than back to the table, reduce IO, improve performance.

reference

1, https://dev.mysql.com/doc/refman/5.7/en/explain-output.html

2, https://dev.mysql.com/doc/refman/5.7/en/explain-output.html#explain-join-types

3, http://blog.codinglabs.org/articles/theory-of-mysql-index.html

Pay attention to the public number, enjoy the first time update.

Understand MySQL optimization recommendations in principle

An overview of the

Preliminary knowledge

B + tree index

Explain execution plan

Why is autoincrement primary key recommended

Why not use a SELECT * query

conclusion

reference

Related Posts

Program level flow limiting – necessity

Telegraf+Influxdb+Grafana build a monitoring platform

The Request header is too large.