Understand the explain keyword of MySQL thoroughly

This is the third day of my participation in the Gwen Challenge

Database is a programmer must have a basic skill, basic every interview will ask. For programmers just out of school, you just need to learn how to use it, but the more you work, the more you realize that just writing SQL statements is not enough. Write SQL, if the performance is not good, not up to the requirements, may block the whole system, that is fatal to the whole system.

So how can you tell if your SQL is written well? After all, only if you know how to write SQL well, can you consider how to optimize the problem.

MySQL official to provide us with a lot of SQL analysis tools, here we mainly EXPLAIN.

The following analysis is based on MySQL5.7.28, with slight variations from version to version.

1.1 concept

Use the EXPLAIN keyword to simulate the optimizer’s execution of SQL statements to see how MySQL is processing your statements and analyze performance bottlenecks in your query statements or table structures.

Usage: EXPLAIN+ SQL statement

EXPLAIN The following information is returned:

The general meanings of each field are as follows:

Id: indicates the id of the SELECT query. Each SELECT is automatically assigned a unique identifier.
Select_type: SELECT query type.
Table: Indicates the table to be queried
Partitions: matches partitions
Type: the join type
Possible_keys: possible index in this query
Key: indicates the exact index used in the query.
Key_len: The number of bytes used by the query optimizer for the index.
Ref: Which field or constant is used with the key
Rows: Shows how many rows were scanned for this query. This is an estimate.
Filtered: indicates the percentage of data filtered by this query condition
-Blair: Extra information

1.2 Preparations

Create a new database, test, and execute the following SQL statement

CREATE TABLE t1(id INT(10) AUTO_INCREMENT,content VARCHAR(100) NULL , PRIMARY KEY (id));
CREATE TABLE t2(id INT(10) AUTO_INCREMENT,content VARCHAR(100) NULL , PRIMARY KEY (id));
CREATE TABLE t3(id INT(10) AUTO_INCREMENT,content VARCHAR(100) NULL , PRIMARY KEY (id));
CREATE TABLE t4(id INT(10) AUTO_INCREMENT,content VARCHAR(100) NULL , PRIMARY KEY (id));
INSERT INTO t1(content) VALUES(CONCAT('t1_'.FLOOR(1+RAND()*1000)));
INSERT INTO t2(content) VALUES(CONCAT('t2_'.FLOOR(1+RAND()*1000)));
INSERT INTO t3(content) VALUES(CONCAT('t3_'.FLOOR(1+RAND()*1000)));
INSERT INTO t4(content) VALUES(CONCAT('t4_'.FLOOR(1+RAND()*1000)));
Copy the code

The meaning of each column is explained below.

1.3 id

The serial number of a SELECT query that contains a set of numbers indicating the order in which the SELECT clause is executed or the order in which the tables are operated in the query. It can be roughly divided into the following situations

(1) The ids are the same, and the execution sequence is from top to bottom

The above query statements, all with id 1, have the same priority and are executed from top to bottom. The specific execution order is determined by the optimizer. In this case, the execution order is T1, T2, t3.

(2) Different ids, the higher the number, the higher the priority

If there is a subquery in SQL, then the id is incremented, and the larger the ID is, the sooner the query is executed. As shown in the figure above, the order of execution is T3, T1, and T2, meaning that the innermost subquery is executed first and from the inside out.

When I was testing, I came across completely different statements using the IN keyword and the = operator when executed using EXPLAIN.

This means that each time the outermost subquery is executed, the inner subquery will be repeated. This is a bit different from my understanding.

Specific can look at this article, I think about what is understood. Segmentfault.com/a/119000000… . I won’t go into further details here.

Never use IN, JOIN, or EXISTS instead

(3) There are the same and different ids

Add an IN subquery to the above statement, and the result is as follows

The execution sequence is T3, T1, T2, and T4. The larger the value, the earlier the execution, the same value from the top down the execution.

1.4 select_type

Select_type specifies the query type. It is mainly used to distinguish the common query, sub-query, and joint query. It is divided into the following types:

(1) SIMPLE

A simple SELECT query that does not contain subqueries or unions.

(2) the PRIMARY

If the query contains any complex subqueries, the outermost query is marked as PRIMARY.

(3) is DERIVED

Subqueries contained in the FROM clause are labeled DERIVED, and MySQL executes these subqueries recursively, putting the results in temporary tables.

(4) the SUBQUERY

A SUBQUERY is included in the SELECT or WHERE clause and is marked as a SUBQUERY.

(5) the UNION

If the second SELECT query appears after the UNION, it is marked as UNION. If UNION is included in the subquery of the FROM clause, the outer SELECT will be marked: DERIVED

(6) UNION RESULT

SELECT to get the result from the UNION table.

The first three above have already appeared in the previous section, look at the next three

You can see that a NULL appears in the ID column, which is not mentioned above. In general, the value can be NULL in special cases where one line of statements references the union of other multi-line result sets.

Table 1.5

This is not a big deal, indicating which table the query is based on. It doesn’t have to be a real table. DERIVED and <union1,2>, as shown above, generally give the following values:

(1)

: the union of the result set of the line numbered A and the line numbered B in the output result.

(2) < derived A > : The result set of the row numbered A in the output result. Derived means that it is a derived result set, as in the query in the FROM clause.

(3) < subquery A > : the result set of the row numbered A in the output result. Subquery indicates that this is a materialized subquery.

1.6 partitions

For non-partitioned tables, the value is NULL. If partitioned tables are queried, partitions that are matched by partitioned tables are displayed.

According to the official document, when creating a table, specify different partitions to store different IDS.

Insert the test data so that the ID values are distributed across the four partitions.

Execute the query result.

1.7 type

Type is the access type of the query, which is an important indicator. System > const > eq_ref > ref > fulltext > ref_or_NULL > index_merge > Unique_subquery > index_subquery > Range > index > ALL.

In general, make sure the query is at least range, preferably ref.

(1) the system

When there is only one row in the table (system table), the amount of data is very small and the speed is very fast, which is a very special case and not common.

(2) the const

When your query criteria are a primary key or a unique UNION INDEX and the value is constant, the query is very fast because you only need to read the table once.

Add a unique index to the CONTENT column of t1 table

(3) the eq_ref

Next to system and const, the best performance is eq_ref. Unique index scan. For each index key, only one record in the table matches it. Common for primary key or unique index scans.

(4) the ref

A non-unique index scan that returns all rows matching a single value. Unlike eq_ref, which uses an index other than the PRIMARY KEY and UNIQUE index, ref may have multiple query results. You can use the = operator or the <=> operator.

Add the normal index to the CONTENT column of the T2 table

The query

(5) the fulltext

Use fulltext indexes when querying.

(6) ref_or_null

For a field that requires both an associated condition and a NULL value. The query optimizer will choose to query with ref_OR_NULL join.

(7) index_merge

Multiple indexes are required to be used in combination during a query, usually in SQL with the OR keyword.

Unique_subquery (8)

This join type is similar to index_SubQuery. A unique index in a subquery. In some in subqueries, this is used to replace eq_ref, as in the following query statement

value IN (SELECT primary_key FROM single_table WHERE some_expr)
Copy the code

(9) index_subquery

[pg-e55Cc-1621069077487-0] use index to associate subquery, not full table scan. For non-unique indexes, subqueries can return duplicate values. Similar to unique_subquery, but used for non-unique indexes

value IN (SELECT key_column FROM single_table WHERE some_expr)
Copy the code

(10) range

Only rows of a given range are retrieved, using an index to select rows. The key column displays which index is being used, typically when you have a BETWEEN, <, >, in, etc query in your WHERE statement. This type of range index scan is better than a full table scan because it only needs to start at one point in the index and end at another point without scanning all the indexes.

Select * from t3 where id is the PRIMARY key and content is not indexed. Select * from T3 where ID is the PRIMARY key

Using Content as the condition, the result is as follows

Therefore, range searches are performed only on fields that are indexedtypeIs therange.

(11) index

SQL statements use indexes, but are not filtered by the index. They are usually overridden or sorted by the index.

Both index and ALL are read from the entire table. The difference is that index is read from the index tree, while ALL is read from the hard disk. Index is usually faster than ALL because index files are usually smaller than data files.

Select primary key id from t3

ALL (12)

Full table scan has the worst performance.

1.8 possible_keys

One or more indexes that may be used when querying. If an index exists on a field involved in the query, the index will be listed. Note that it is possible, but may not be used in actual queries.

1.9 the key

The actual index used in the query is NULL if no index is used. If an overwrite index is used in the query, the index appears only in the key field.

For example, if trB1 has a combined index (age, name), then if trB1 has a combined index (age, name), then if trB1 has a combined index (age, name), then if trB1 has a combined index (age, name), then the result is as follows:

1.10 key_len

Represents the number of bytes used in the index, which is used to calculate the length of the index used in the query. With no loss of accuracy, the shorter the length, the better.

The value displayed for key_len is the maximum possible length of the index field, not the actual length used, i.e. key_len is calculated from the table definition, not from an in-table search.

The key_len field helps you check to see if you are making full use of the index. The longer ken_len is, the more fully used the index is.

Note that key_len only calculates the length of the index used in the WHERE condition, and sorting and grouping does not count to key_len even if indexes are used.

For example, there is table TRB1, which has the following fields, and a combined index idx_AGe_name

The following query results of the statement

The value of key_len is 153, 158, and null. How to calculate:

(1) Look at the type + length of the field on the index. For example, int = 4; varchar(50) = 50 ; Char (50) = 50.

(2) If a string field is vARCHar or char, the character set must be multiplied by a different value. For example, utF-8 must be multiplied by 3 and GBK must be multiplied by 2.

③ The dynamic string vARCHar is added by 2 bytes.

④ Add 1 byte to fields that are allowed to be empty.

The length of key_len = name is 50 * 3 + 2 + 1 = 153

Key_len = age + name = 4 +1 + (50*3 + 2 +1)= 5 + 153 = 158 (Use more full index, more accurate query results, but more consumption)

Number three: The index is broken.

1.11 ref

Display which column of the index is used. Common values are const, func, null, and field name.

Display when using constant equivalence queryconst.
When associated query, the corresponding associated table will be displayedAssociated fields
If the query condition is usedexpression,function, or the condition column has been implicitly converted internally and may be displayed asfunc
Other situationsnull

For example, if the content field of table T3 has a normal index, the following query statement results in the following

1.12 rows

The rows column represents the number of rows that MySQL thinks it might need to read to perform a query, and generally the smaller the value, the better!

1.13 filtered

Filtered is a percentage that represents the percentage of records that are eligible. In simple terms, this field represents the percentage of records returned by the storage engine that satisfy the criteria after the data is filtered.

Prior to mysql.5.7, to show filtered used the Explain extended command. With MySQL.5.7, default Explain directly displays filtered data for partitions and filtered data.

1.14 Extra

Additional information.

(1) Using filesort

Note mysql uses an external index sort for data, rather than reading data in the order of the indexes in the table. A sort operation in MySQL that cannot be done with an index is called “file sort”.

For example, the TRB1 table creates a composite index

Filesort appears in the following query:

According to the order of composite indexes, it is name, age, and purchased. In the above query statement, the middle age is not used, so the index is invalid during the order by. This is often the case for optimization.

Modify the above SQL statement so that the index is not invalidated.

(2)

Instead of holding intermediate results in temporary tables,MySQL uses temporary tables when sorting query results. Common in sort order by and group by queries.

This SQL statement uses temporary tables and file sorting, which is inefficient when the amount of data is very large and needs to be optimized.

So when using group BY and order BY, the number and order of columns should be the same as those of the index.

(3) Using index

Using index indicates that a corresponding select operation uses a Covering index, avoiding access to the data row of the table, which can improve efficiency.

Using WHERE indicates that the index is used to perform a lookup of the index key value. If using WHERE is not present at the same time, it indicates that the index is used to read data rather than perform a lookup using the index.

Again, use the trB1 table above for example

Only Using Index appears, indicating that the index is used to read data rather than perform a lookup.

The presence of Using WHERE indicates that the index is used to perform the lookup.

(4) Using where

Indicates that the index is used for WHERE filtering when querying.

(5) Using join buffer

The connection cache was used for the query.

(6) impossible where

The query statement’s WHERE condition is always false, for example

It doesn’t normally happen.

There are many values for “Extra”, which are not listed here. For details, please refer to the official documents.

Resources: dev.mysql.com/doc/refman/…