The preparatory work

MySQL > create table MySQL > create table MySQL > create table MySQL > create table

CREATE TABLE `emp` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `empno` int(11) DEFAULT NULL COMMENT 'Employee Id',
  `ename` varchar(255) DEFAULT NULL COMMENT 'Name of Employee',
  `job` varchar(255) DEFAULT NULL COMMENT 'work',
  `mgr` varchar(255) DEFAULT NULL COMMENT 'Manager's Number',
  `hiredate` date DEFAULT NULL COMMENT 'Date of Employment',
  `sal` double DEFAULT NULL COMMENT 'wages',
  `comm` double DEFAULT NULL COMMENT 'allowance',
  `deptno` int(11) DEFAULT NULL COMMENT 'Department No.'.PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Table of Employees';

CREATE TABLE `dept` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `deptno` int(11) DEFAULT NULL COMMENT 'Department Number',
  `dname` varchar(255) DEFAULT NULL COMMENT 'Department Name',
  `loc` varchar(255) DEFAULT NULL COMMENT 'address'.PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Department Table';

CREATE TABLE `salgrade` (
  `id` int(11) NOT NULL COMMENT 'primary key',
  `grade` varchar(255) DEFAULT NULL COMMENT 'class',
  `lowsal` varchar(255) DEFAULT NULL COMMENT 'Minimum wage',
  `hisal` varchar(255) DEFAULT NULL COMMENT 'Maximum salary'.PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Pay scale';

CREATE TABLE `bonus` (
  `id` int(11) NOT NULL COMMENT 'primary key',
  `ename` varchar(255) DEFAULT NULL COMMENT 'Name of Employee',
  `job` varchar(255) DEFAULT NULL COMMENT 'work',
  `sal` double DEFAULT NULL COMMENT 'wages',
  `comm` double DEFAULT NULL COMMENT 'allowance'.PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Bonus schedule';
Copy the code

Follow-up execution plan, query optimization, index optimization and other knowledge of the drill, based on the above several tables to operate.

MySQL Execution Plan

To perform SQL tuning, you need to know how the SQL statement to be tuned is executed. You need to view the execution process of the SQL statement to speed up the execution efficiency of the SQL statement.

You can use explain + SQL statements to simulate the optimizer executing SQL queries to see how MySQL handles SQL statements.

Please check out the official website for explain.

Explain output format

mysql> explain select * from emp;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
Copy the code

Mysql > select_type; mysql > select_type;

Column Meaning
id The SELECTIdentifier (The SELECT identifier)
select_type The SELECTType (this SELECT type)
table The table for The output row
partitions The matching partitions
type The join type
possible_keys The possible indexes to Choose
key The index actually chosen
key_len The length of The chosen key
ref The columns are compared to The index.
rows Estimated of Rows to be examined
filtered Percentage of Rows filtered by table condition
extra Additional information

id

The serial number of a SELECT query, which contains a set of numbers indicating the order in which the select clause or operation table is executed in the query.

Id numbers fall into three categories:

  • If the IDS are the same, the order of execution goes from top to bottom
mysql> explain select * from emp e join dept d on e.deptno = d.deptno join salgrade sg on e.sal between sg.lowsal and sg.hisal; +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------- --------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len |  ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | e | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | NULL | 100.00 | | 1 SIMPLE | d | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | 100.00 Using the where; Using the join buffer (Block Nested Loop) | | | 1 SIMPLE | sg | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | 100.00 Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------- --------------------------------------+Copy the code

MySQL > select * from explain where id = 1; MySQL > select * from explain where id = 1;

  • If the IDS are different, the id sequence increases. A larger ID has a higher priority and is executed earlier
mysql> explain select * from emp e where e.deptno in (select d.deptno from dept d where d.dname = 'SALEDept');
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------- --------------------------------------------+
| id | select_type  | table       | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                              |
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------- --------------------------------------------+
|  1 | SIMPLE       | <subquery2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |   100.00 | NULL                                               |
|  1 | SIMPLE       | e           | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where; Using join buffer (Block Nested Loop) |
|  2 | MATERIALIZED | d           | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where                                        |
+----+--------------+-------------+------------+------+---------------+------+---------+------+------+----------+-------- --------------------------------------------+
Copy the code

The execution sequence for this example is to execute id 2 first, then ID 1.

  • The same and different ids exist at the same time: The same ids can be regarded as a group and are executed from top to bottom. In all groups, the higher the ID value, the higher the priority, the earlier the execution

Again, execute id 2 first, and then execute id 1 from the top down.

select_type

It is mainly used to distinguish the type of query, whether it is common query, joint query or sub-query.

select_type Value JSON Name Meaning
SIMPLE None Simple SELECT (not using UNION or subqueries)
PRIMARY None Outermost SELECT
UNION None Second or later SELECT statement in a UNION
DEPENDENT UNION dependent (true) Second or later SELECT statement in a UNION, dependent on outer query
UNION RESULT union_result Result of a UNION.
SUBQUERY None First SELECT in subquery
DEPENDENT SUBQUERY dependent (true) First SELECT in subquery, dependent on outer query
DERIVED None Derived table
MATERIALIZED materialized_from_subquery Materialized subquery
UNCACHEABLE SUBQUERY cacheable (false) A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query
UNCACHEABLE UNION cacheable (false) The second or later select in a UNION that belongs to an uncacheable subquery (see UNCACHEABLE SUBQUERY)
  • SIMPLEA simple query, without subqueries and unions
mysql> explain select * from emp;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
Copy the code
  • primaryIf the query contains any complex subqueries, the outermost query is marked as Primary
  • unionIf the second select appears after the union, it is marked as union
mysql> explain select * from emp where deptno = 1001 union select * from emp where sal < 5000;
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+--------- --------+
| id | select_type  | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra           |
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+--------- --------+
|  1 | PRIMARY      | emp        | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    25.00 | Using where     |
|  2 | UNION        | emp        | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    33.33 | Using where     |
| NULL | UNION RESULT | <union1,2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |     NULL | Using temporary |
+----+--------------+------------+------------+------+---------------+------+---------+------+------+----------+--------- --------+
Copy the code

Select_type of this statement contains both primary and union

  • dependent unionSimilar to union, depentent here means that the result of a union or union all is affected by an external table
  • union resultSelect to get the result from the union table
  • dependent subquerySubquery subqueries are affected by external table queries
mysql> explain select * from emp e where e.empno  in ( select empno from emp where deptno = 1001 union select empno from emp where sal < 5000);
+----+--------------------+------------+------------+------+---------------+------+---------+------+------+----------+--- --------------+
| id | select_type        | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra           |
+----+--------------------+------------+------------+------+---------------+------+---------+------+------+----------+--- --------------+
|  1 | PRIMARY            | e          | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |   100.00 | Using where     |
|  2 | DEPENDENT SUBQUERY | emp        | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    25.00 | Using where     |
|  3 | DEPENDENT UNION    | emp        | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    25.00 | Using where     |
| NULL | UNION RESULT       | <union2,3> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | NULL |     NULL | Using temporary |
+----+--------------------+------------+------------+------+---------------+------+---------+------+------+----------+--- --------------+
Copy the code

This SQL execution contains the PRIMARY, DEPENDENT SUBQUERY, DEPENDENT UNION, and UNION RESULT

  • subqueryInclude subqueries in select or WHERE lists

For example:

mysql> explain select * from emp where sal > (select avg(sal) from emp) ;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | PRIMARY     | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    33.33 | Using where |
|  2 | SUBQUERY    | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |   100.00 | NULL        |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
Copy the code
  • DERIVEDSubqueries that occur in the FROM clause are also called derived tables
  • MATERIALIZEDMaterialized subquery?
  • UNCACHEABLE SUBQUERYIndicates that results using subqueries cannot be cached

Such as:

mysql> explain select * from emp where empno = (select empno from emp where deptno=@@sort_buffer_size);
+----+----------------------+-------+------------+------+---------------+------+---------+------+------+----------+------ -------+
| id | select_type          | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+----------------------+-------+------------+------+---------------+------+---------+------+------+----------+------ -------+
|  1 | PRIMARY              | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |   100.00 | Using where |
|  2 | UNCACHEABLE SUBQUERY | emp   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    4 |    25.00 | Using where |
+----+----------------------+-------+------------+------+---------------+------+---------+------+------+----------+------ -------+
Copy the code
  • uncacheable unionThe result of the union query cannot be cached

table

Which table is being accessed by the row, table name or alias, possibly temporary table or union merge result set.

  1. If it is a concrete table name, it indicates that the data is fetched from the actual physical table, and of course it can be an alias for the table
  2. The table name is the form derivedN and represents a derived table that uses a query with ID N
  3. When there is a union result, the table name is in the form of union N1,n2, etc. N1 and n2 represent the IDS participating in the union

type

Type shows the access type. The access type indicates how I access our data. The most obvious one is full table scan, which is very inefficient to directly and violently traverse a table to find the data I need.

There are many types of access, in order of efficiency from best to worst:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL

In general, make sure the query is at least range and preferably ref

  • allFull table scan, in general, such SQL statements and large data volume then need to be optimized

In general, you can avoid ALL by adding indexes

  • indexFull index scanning is more efficient than all in two main cases:
    • One is that the current query overwrites the index, that is, the data we need is available in the index
    • One is to use indexes for sorting, which avoids reordering of data
  • range=, <>, >, >=, <, <=, IS NULL, BETWEEN, LIKE, or IN();

Examples on the official website are as follows:

SELECT * FROM tbl_name WHERE key_column = 10;

SELECT * FROM tbl_name WHERE key_column BETWEEN 10 and 20;

SELECT * FROM tbl_name WHERE key_column IN (10,20,30);

SELECT * FROM tbl_name WHERE key_part1 = 10 AND key_part2 IN (10,20,30);

  • index_subqueryUse indexes to associate subqueries instead of scanning the full table

value IN (SELECT key_column FROM single_table WHERE some_expr)

  • unique_subqueryThis join type is similar to index_SubQuery in that it uses a unique index

value IN (SELECT primary_key FROM single_table WHERE some_expr)

  • index_mergeMultiple indexes are required to be used in combination during the query process
  • ref_or_nullIn cases where a field requires both an associated condition and a NULL value, the query optimizer will choose this access method

SELECT * FROM ref_table

WHERE key_column=expr OR key_column IS NULL;

  • fulltextPerform a join using the FULLTEXT index
  • refNon-unique indexes are used for data lookup

SELECT * FROM ref_table WHERE key_column=expr;

SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;

  • eq_refUse unique indexes for data lookup

SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;

SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;

  • constThis table has at most one matching row

SELECT * FROM tbl_name WHERE primary_key=1;

SELECT * FROM tbl_name WHERE primary_key_part1=1 AND primary_key_part2=2;

Such as:

mysql> explain select * from emp where id = 1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | emp   | NULL       | const | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
Copy the code
  • systemThe table has only one row of records (equal to the system table), which is a special case of const type and does not normally occur

possible_keys

Shows one or more indexes that may be applied to the table. If an index exists on a field involved in the query, the index will be listed but not necessarily used by the query

key

If the index in use is null, no index is used. If an overwrite index is used, the index overlaps with the select field of the query

key_len

Represents the number of bytes used in the index. Key_len can be used to calculate the length of the index used in the query. The shorter the length is, the better, without losing precision

ref

Shows which column of the index is used, if possible, as a constant

rows

Based on table statistics and index usage, it is important to estimate the number of rows that need to be read to find the required record. This parameter is a direct reflection of how much data is found in the SQL, and as little as possible to accomplish the purpose

extra

Contains additional information

  • using filesortNote mysql cannot use indexes to sort data, but can only use the sorting algorithm to sort data, which consumes extra space
  • using temporaryCreate a temporary table to hold intermediate results and delete the temporary table after the query is complete
  • using indexThis indicates that the current query overrides the index, reading data directly from the index without accessing the table. Using WHERE indicates that the index is being used to perform a lookup of the index’s key value. If not, it indicates that the index is being used to read data rather than actually lookup it
  • using whereUse WHERE for conditional filtering
  • using join bufferUsing connection caching
  • impossible whereThe result of the WHERE statement is always false

MySQL index basics

To understand how indexes are optimized, you must understand the underlying principles of indexes.

Advantages of indexes

  1. This greatly reduces the amount of data that the server needs to scan
  2. Helps the server avoid sorting and temporary tables
  3. Changed random IO to sequential IO (improved efficiency)

Usefulness of indexes

  1. Quickly find rows that match the WHERE clause
  2. Remove rows from consideration. If you can choose between multiple indexes, mysql typically uses indexes that find the least rows
  3. If the table has multiple column indexes, the optimizer can use any left-most prefix of the index to find rows
  4. Retrieves row data from other tables when there is a table join
  5. Find min or Max values for a particular index column
  6. Sort and group tables if sorting or grouping is done on the leftmost prefix of the available index
  7. In some cases, queries can be optimized to retrieve values without querying rows of data

Classification of indexes

MySQL index data structure inference

Indexes are used to quickly find rows with specific column values.

If there is no index, MySQL must start with the first row and then read through the table to find relevant rows.

The larger the table, the more time it takes. If the table has indexes for related columns, MySQL can quickly determine where to look in the middle of the data file without having to look at all the data. This is much faster than reading each line sequentially.

If MySQL indexes can help us query data quickly, how does it store data underneath?

Several possible storage structures

hash

Index format of the hash table

Disadvantages of storing data in hash tables:

  1. To use hash storage, all data files need to be added to the memory, which consumes memory space
  2. If all queries are equivalent queries, then hash is really fast, but in a real work environment where there are more range look-up data than equivalent queries, hash is not a good fit

In fact, when the MySQL storage engine is memory, the index data structure uses hash tables.

Binary tree

The structure of a binary tree looks like this:

Binary trees cause data skew due to the tree depth. If the tree depth is too deep, THE I/O count increases, affecting the data read efficiency.

AVL trees need to be rotated, see the legend:

In addition to the rotation operation, red-black trees also have a color change function (to reduce rotation), so that although the insertion speed is fast, but the loss of query efficiency.

In binary trees, AVL trees, and red-black trees, the I/O count increases because the tree depth is too deep, affecting the data read efficiency.

Let’s look at the B tree

B tree features:

  • All the keys are distributed throughout the tree
  • It is possible to end the search at non-leaf nodes and make a search in the whole set of keywords. The performance is close to binary search
  • Each node has at most m subtrees
  • The root node has at least two subtrees
  • Branch nodes have at least M /2 subtrees (all branch nodes except root and leaf nodes)
  • All leaf nodes are in the same layer, each node can have up to M-1 keys, and are arranged in ascending order

Legend description:

Each node occupies a disk block, and a node has two ascending keywords and three Pointers to the child root node. The Pointers store the address of the disk block where the child node resides.

The three scope fields divided into two keywords correspond to the scope fields of the data of the subtree pointed to by the three Pointers.

For the root node, the keywords are 16 and 34. The data range of the subtree to which the P1 pointer points is less than 16, the data range of the subtree to which the P2 pointer points is 16 to 34, and the data range of the subtree to which the P3 pointer points is greater than 34.

Procedure for finding keywords:

1. Locate disk block 1 based on the root node and read it into the memory. [Disk I/O operation 1]

2. Find pointer P2 to disk block 1 when keyword 28 is in the interval (16,34).

3. Locate disk block 3 according to the P2 pointer and read it into the memory. [Disk I/O operation 2nd]

4. Find pointer P2 to disk block 3 by comparing keyword 28 in interval (25,31).

5. Locate disk block 8 according to P2 pointer and read it into the memory. [Disk I/O operation 3]

6. In the keyword list of disk block 8, locate keyword 28.

Therefore, we can know the disadvantages of B-tree storage:

  • Each node has a key and also contains data. The storage space of each page is limited. If data is large, the number of keys stored in each node will be smaller
  • When a large amount of data is stored, the depth is large, and the DISK I/O count increases, affecting the query performance

So what is the MySQL index data structure

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees

Don’t get me wrong, in fact, MySQL index storage structure is B+ tree, above we know that B tree is not appropriate.

Mysql index data structure –B+Tree

B+Tree is an optimization based on BTree. The changes are as follows:

1. Each node of a B+Tree can contain more nodes for two reasons. The first reason is to reduce the height of the Tree, and the second reason is to change the data range into multiple ranges.

2. Non-leaf nodes store keys, while leaf nodes store keys and data.

3. Two Pointers on leaf nodes are connected to each other (in line with the disk prefetch feature), providing higher sequential query performance.

B+ tree storage search schematic diagram:

Note:

There are two head Pointers on the B+Tree, one to the root node and the other to the leaf node with the smallest keyword, and there is a chain-ring structure between all the leaf nodes (that is, the data nodes).

Therefore, there are two kinds of lookup operations on B+Tree: a range and paging lookup on the primary key, and a random lookup starting from the root node.

Since the leaf nodes of B+ tree only store data and the root node only store keys, let’s calculate that even if there is only 3 layers of B+ tree, it can also produce tens of millions of levels of data.

Zhuang you need to know

Consider the following table where id is the primary key:

mysql> select * from stu;
+------+---------+------+
| id   | name    | age  |
+------+---------+------+
|    1 | Jack Ma |   18 |
|    2 | Pony    |   19 |
+------+---------+------+
Copy the code

Back to the table

We create a normal index for a normal column.

select * from stu where name='Pony';
Copy the code

Select * from B+ tree where primary key ID is found; select * from B+ tree where primary key ID is found;

This will eventually go back to the primary key and look up the B+ tree, which is the back table.

Cover index

For this query:

mysql> select id from stu where name='Pony';
Copy the code

There is no need to find the primary key id.

No return table is called an overwrite index.

The left matching

Create a composite index (name, age) with the name and age fields, and then query:

select * from stu where name=? and age=?
Copy the code

Select * from index (name, age); select * from index (age);

select * from stu where age=?
Copy the code

If the index is not queried by name, the index will not be queried by name – this is the left-most matching rule.

Join I have to search by age, but also have an index to optimize? It can be done like this:

  • (Recommended) Change the order of the combined index (name, age) and create the index (age, name)
  • Or you can just putageCreate a separate index for the field

An index pushdown

Maybe also called predicate push down…

select t1.name,t2.name from t1 join t2 on t1.id=t2.id
Copy the code

T1 has 10 records, t2 has 20 records.

Let’s guess, this is either executed this way:

Select id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id, id

Or:

First find out t1.name,t2.name, and then associate according to id

If you do not use the index condition push optimization, MySQL can only query the index t1, T2 merged all rows, and then compare all conditions.

When the index condition push-down optimization technology is used, the data stored in the index can be used to determine whether the data corresponding to the current index meets the conditions. Only the data matching the conditions can be queried.

summary

  1. ExplainTo optimize the execution of SQL statements, you need to view the execution process of SQL statements to speed up the execution efficiency.
  2. Index advantages and usefulness.
  3. The data structure used for the index is a B+ tree.
  4. Back table, overwrite index, left-most match and index push down.

The first public line 100 li ER, welcome the old iron attention reading correction. GitHub github.com/xblzer/Java…