Mysql index interview no longer afraid

preface

Article content output source: Pull hook education Java high salary training camp;

Mysql index we must ask in the interview, just I in the pull pull training camp to learn the relevant knowledge of mysql index, here sorted down, I have a comprehensive understanding of mysql index, interview time is no longer afraid.

The index type

Indexes can speed up queries and affect where queries and order by sorting. MySQL index type:

Index storage structure: B Tree index, Hash index, FULLTEXT full-text index, R Tree index

Application hierarchy: common index, unique index, primary key index, composite index

By index key value type: primary key index, secondary index (secondary index)

Partitioning from data storage and index key-value logic: clustered index (clustered index), non-clustered index (non-clustered index)

The preparatory work

Let’s first create a table, and then create the index in this table.

DROP TABLE IF EXISTS `r_resume`; CREATE TABLE 'r_resume' (' id 'int(11) NOT NULL AUTO_INCREMENT,' sex 'varchar(10) DEFAULT NULL COMMENT' gender ', 'birthday' varchar(30) DEFAULT NULL COMMENT 'date of birth ',' work_year 'varchar(100) DEFAULT NULL COMMENT' date of birth ', 'phone' varchar(20) DEFAULT NULL COMMENT 'mobile phone ',' email 'varchar(100) DEFAULT NULL COMMENT' email ', 'status' varchar(80) DEFAULT NULL COMMENT' status ', 'resumeName' varchar(500) DEFAULT NULL COMMENT 'resumeName ', 'name' varchar(40) DEFAULT NULL, 'createTime' datetime DEFAULT NULL COMMENT 'createTime ', 'headPic' varchar(100) DEFAULT NULL COMMENT 'headPic' varchar(100) DEFAULT NULL COMMENT 'headPic' varchar(100) DEFAULT NULL COMMENT 'isDel' int(2) DEFAULT NULL COMMENT 'headPic' varchar(100) DEFAULT NULL COMMENT 'headPic' varchar(100) DEFAULT NULL COMMENT 'isDel' int(2) DEFAULT NULL COMMENT ' 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' DEFAULT NULL COMMENT 'updateTime' 'isDefault' int(2) DEFAULT NULL COMMENT 'not DEFAULT ',' highestEducation 'varchar(20) DEFAULT' COMMENT ', 'deliverNearByConfirm' int(2) DEFAULT '0' COMMENT 'deliverNearByConfirm 0- Required 1- Not required ', 'refuseCount' int(11) NOT NULL DEFAULT '0' COMMENT '表 示 ', 'markCanInterviewCount' int(11) NOT NULL DEFAULT '0' COMMENT '评 论 ', 'haveNoticeInterCount' int(11) NOT NULL DEFAULT '0' COMMENT ' 'oneWord' varchar(100) DEFAULT 'COMMENT ',' liveCity 'varchar(100) DEFAULT' COMMENT ', 'resumeScore' int(3) DEFAULT NULL COMMENT 'checksum ',' userIdentity 'int(1) DEFAULT '0' COMMENT' userIdentity ', 'isOpenResume' int(1) DEFAULT '3' COMMENT ' InnoDB (' id ') ENGINE=InnoDB AUTO_INCREMENT= utf8;Copy the code

Normal index

This is the most basic type of index, based on a normal field index, there are no restrictions.

Create a normal index as follows:

Create table name; create table name; create index index_work_year on r_resume(work_year); ALTER TABLE table_name add INDEX; ALTER TABLE r_resume add INDEX index_sex(sex); # drop index drop index on table name # drop index drop index on table nameCopy the code

The only index

Different from normal indexes, index field values must be unique, but empty values are allowed. When creating or modifying a table, the unique constraint is appended, and the corresponding unique index is automatically created.

Create a unique index as follows:

Select * from table where id = 1; select * from table where id = 1; CREATE UNIQUE INDEX index_userid on r_resume(userId) ## ALTER TABLE r_resume add UNIQUE INDEX index_userid(userId);Copy the code

The primary key index

It is a special unique index that does not allow empty values. When creating or modifying a table, you can append the primary key constraint. Each table can have only one primary key.

Create a primary key index as follows:

Alter table table_name add primary KEYCopy the code

The composite index

A single index is a column of an index. That is, the statement of a new index is executed on only one column. Users can create indexes on multiple columns called group composite indexes (composite indexes). Composite indexes can replace multiple single indexes and require less overhead than multiple single indexes. There are two concepts of index at the same time called narrow index and wide index. Narrow index refers to the index with 1-2 columns, and wide index refers to the index with more than 2 columns. An important principle of index design is to use narrow index instead of wide index, because narrow index is often more effective than combined index

Create a composite index as follows:

Select * from table_name where table_name = '1' and table_name = '2'; Create index index_work_year_sex on r_resume(work_year,sex) ## ALTER TABLE table_name add index; ALTER TABLE r_resume add INDEX index_work_year_sex(work_year,sex);Copy the code

Precautions for using composite indexes:

When to use compound indexes, build indexes according to WHERE conditions. Do not use too many indexes. Excessive use of indexes greatly affects the efficiency of update operations.

If the table is already created (col1, col2), there is no need to create it separately (col1); If there is a (COL1) index, if the query needs col1 and COL2 conditions, can create (col1,col2) composite index, for the query has a certain improvement.

The full text indexing

In the query operation, like fuzzy query can be used when the amount of data is relatively small, but for a large amount of text data retrieval, the efficiency is very low. If full-text indexes are used, the query speed is many times faster than that of like. Prior to MySQL 5.6, only MyISAM storage engine supported full-text indexing. Since MySQL 5.6, both MyISAM and InnoDB storage engine have supported full-text indexing.

Create a full-text index as follows:

Mysql > create FULLTEXT index on table name; Create fulltext index index_status on r_resume(' status') ## ALTER TABLE table_name add fulltext index; ALTER TABLE r_resume add fulltext INDEX index_status(`status`);Copy the code

Unlike the usual like fuzzy queries, full-text indexes have their own syntax format, using the match and Against keywords, for example

Select * from r_resume WHERE MATCH(' status ') AGAINST(' status ')Copy the code

Notes for using full-text indexes:

show variables like '%ft%';
Copy the code

Full-text indexes must be built on string, text fields.
The full-text index field value must be between the minimum and maximum characters to be valid. (InnoDB: 3-84; Myisam: 4-84)
Full-text index field values are to be sliced according to syntax characters, such as B + AAA, divided into B and AAA
Full-text index matching query, default use is equivalent matching, for example, A matches a, will not match AB, AC. If you want to match, you can search for a* in Boolean mode

The index principle

An index is a data structure used by the storage engine to quickly find records. Extra space and data maintenance work is required.

Indexes are physical data page storage, stored in data files (InnoDB, IBD files) using data pages (pages).

Indexes can speed up the search speed, but at the same time reduce the speed of add, delete and modify operations, index maintenance costs.

Theoretical knowledge of indexing: binary lookup, Hash, and B+Tree.

B tree structure

Index values and data are distributed throughout the tree structure
Each node can store multiple index values and corresponding data data
Multiple index values in a tree node are arranged in ascending order from left to right

B tree search: starting from the root node, the index value sequence in the node using dichotomy search, if hit the end of the search. No hit will enter the child node to repeat the search process, until the corresponding node pointer is empty, or is already a leaf node.

B + tree structure

Non-leaf nodes do not store data, but only index values, which is convenient for storing more index values
The leaf node contains all the indexes and data
Leaf nodes are connected with Pointers to improve the access performance of the interval

Compared with B tree, when B+ tree conducts range search, it only needs to search the index values of two nodes and then traverse by using the Pointers of leaf nodes. However, B Tree requires traversal of all nodes and data within the range. Obviously, B+Tree has high efficiency.

Clustered index

Clustered index and non-clustered index: leaf nodes of B+Tree that store primary key index values and row records belong to clustered index. If the index value is stored separately from the row record, it is a non-clustered index.

In The InnoDB engine, primary key indexes are stored in a clustered index structure.

A clustered index is a data storage method. InnoDB’s clustered index constructs a B+Tree structure according to the primary key order. The leaf nodes of a B+Tree are row records, which are tightly stored with primary key values. This also means that the primary key index of InnoDB is the table itself. It stores the data of the entire table in primary key order. The space occupied by InnoDB is the size of the data volume of the entire table. A primary key index is a clustered index.

Index analysis and optimization

Explain

MySQL provides an EXPLAIN command that analyzes SELECT statements and outputs details of SELECT execution for developers to optimize. Such as:

EXPLAIN select * from r_resume WHERE MATCH(' status ') AGAINST(' I am currently employed '); EXPLAIN select * from R_resume where MATCH(' status ') AGAINST(' I am currently employed '); explain select * from r_resume where id> 2195333;Copy the code

select_type

Indicates the type of the query. Common values are as follows:
- SIMPLE: indicates that the query statement does not contain subquery or union
- PRIMARY: indicates that this query is the outermost query
- UNION: Indicates that this query is the second or subsequent query to the UNION
- DEPENDENT UNION: The second or subsequent query statement in the UNION that uses external query results
- UNION RESULT: the RESULT of the UNION
- SUBQUERY: SELECT the SUBQUERY statement
- DEPENDENT SUBQUERY: The SELECT SUBQUERY depends on the results of the outer query.

explain select * from r_resume where id> 2195333 union select * from r_resume where id=1; Explain select * from r_resume where id in(select id from r_resume where sex=' male '); explain select * from r_resume where id in(select ID from r_resume where sex=' male ');Copy the code

type

Indicates the method used by the storage engine to query data. An important property to determine whether a query is a full table scan or a partial index-based scan. The common attribute values are as follows, and the efficiency increases from top to bottom.
- ALL: indicates full table scan, with the lowest performance.
- Index: indicates a full table scan based on an index. The index is scanned first and then all table data.
- Range: indicates the range of indexes to be queried. Use >, >=, <, <=, in, and so on.
- Ref: indicates that a non-unique index is used for single-value query.
- Eq_ref: usually occurs in multi-table JOIN query, indicating that each record of the previous table matches only one row of the subsequent table.
- Const: use primary key or unique index for equivalent query, constant query.
- NULL: indicates that the table is not accessed and the fastest.

explain select * from r_resume where id= 2195333 union select * from r_resume where id=1; Explain select * from r_resume where id in(select id from r_resume where sex=' male '); explain select * from r_resume where id in(select ID from r_resume where sex=' male '); Explain select * from r_resume where sex=' male '; explain select * from r_resume where id> 2195333; explain select * from r_resume where id= 2195333;Copy the code

Possible_keys: indicates the possible_keys index that can be used in query. Note that it is not always used; the index name is displayed.
Key: indicates the actual index used in the query. The index name is displayed.
Rows: The MySQL query optimizer estimates, based on the statistics, how many rows the SQL needs to scan to get the results. In principle, the less rows, the higher the efficiency, you can intuitively understand the efficiency of SQL.
Key_len: indicates the number of bytes used by the index in the query. You can determine whether all composite indexes are used.

Key_len is computed as follows:
- String type
  
  The length of a string depends on the character set: latin1=1, GBK =2, UTF8 =3, UTF8MB4 =4
  
  Char (n) : n* The length of the character set
  
  Varchar (n) : n x Character set length + 2 bytes
- Numeric types
  
  TINYINT: 1 byte
  
  SMALLINT: 2 bytes
  
  MEDIUMINT: 3 bytes
  
  INT, FLOAT: 4 bytes
  
  BIGINT, DOUBLE: 8 bytes
- Time to type
  
  DATE: 3 bytes
  
  TIMESTAMP: 4 bytes
  
  DATETIME: 8 bytes
- The field properties
  
  The NULL attribute takes 1 byte. If a field is NOT NULL, this item is NOT available.
Extra: indicates a lot of Extra information. Relevant information will be displayed in Extra for various operations. Common ones are as follows:
- Using WHERE: Indicates that data needs to be queried back to the table through the index
- Using index: indicates that the query needs to ventilate the index to satisfy the required data
- Using fifilesort: The results of the query need to be sorted. The amount of data is small in memory and large in disk. Therefore, Using fifilesort is recommended for optimization.
- Using temprorary: Queries that use temporary tables. This occurs when operations such as deduplication and grouping are performed.

Back to the table query

As mentioned earlier, InnoDB indexes have clustered indexes and secondary indexes. The leaf node of the clustered index stores row records. InnoDB must have only one. Leaf nodes of secondary indexes store primary key values and index field values. Row records cannot be directly located through secondary indexes. Generally, it is necessary to scan the index tree twice. Locating the primary key value through the secondary index and then the row record through the clustered index is called a table-back query, which has lower performance than scanning the index tree.

Summary: Query the primary key through the index, and then query the record information through the cluster index

Cover index

If the Extra field of the output result of EXPLAIN is Using index, index overwriting can be triggered. You can obtain all columns of SQL data in an index tree without returning to the table, which is called index coverage.

The most common way to achieve index coverage is to create the queried fields into a composite index.

Left-most prefix rule

When a composite index is used, the left-most prefix is the left-most prefix. That is, if the left-most column is used in the query, the index will be used. If the search starts from the second column of the index, the index will be invalid.

LIKE the query

MySQL > select * from ‘like’;

The index can be used only if the % character is appended to it.

Explain select * from r_resume WHERE 'status' like '% '; explain select * from r_resume where' status' like '% '; Explain select * from r_resume WHERE 'status' like' %'; explain select * from r_resume where 'status' like' %'; Explain select * from r_resume WHERE 'status' like' %'; explain select * from r_resume where 'status' like' %'; (Not used)Copy the code

NULL query

If a column of the MySQL table contains a NULL value, is the index containing the column valid?

Effective.

explain select * from r_resume where email is null;
Copy the code

For MySQL, NULL is a special value. Conceptually, NULL means “an unknown value” and is handled a little differently than other values. For example, the =, <, > operators cannot be used, the result of the arithmetic operation on NULL is NULL, the count does not include NULL lines, etc. NULL requires more storage space than an empty string, etc.

MySQL can use indexes on columns that contain NULL, but NULL is different from other data, and it is not recommended to allow NULL on columns. It is best to set NOT NULL and give a default value, such as 0 and an empty string. If datetime is used, you can also set the current system time or a fixed special value, such as ‘1970-01-01 00:00:00’.

Query optimization

Slow query location

1. Enable slow query

Run the following command to check whether the slow query of logs and the storage location of the slow query log file are enabled in the MySQL database:

show variables like '%slow_query_log%';
Copy the code

Run the following command to enable slow log query:

SET global slow_query_log = ON; // SET global slow_query_log_file = 'oak-slow.log '; SET global log_QUERies_NOT_usING_indexes = ON; // Change the location of slow query logs. SET long_query_time = 10; // The query time exceeds 10 secondsCopy the code

2. Slow log query

Find the corresponding file directly and view it in Notepad.

Time: indicates the time when logs are recorded

User@Host: indicates the user and host to be executed

Query_time: execution time

Lock_time: indicates the time when a table is locked

Rows_sent: Number of records sent to the requester

Rows_examined: Indicates the number of scanned records

SET timestamp: The time at which the statement was executed

select…. : Indicates the specific SQL statement to execute
Use mysqlDumpslow to view

MySQL provides a slow query log analysis tool, mysqlDumpslow, which can be used to analyze slow query log content.

Run the following command in the MySQL bin directory to view the usage format.
```
perl mysqldumpslow.pl --help
Copy the code
```

Slow query optimization

Summary of causes for slow query

Full table scan: Explain the type attribute all
Full index scan: Explain analyze the type attribute index
Index filter is not good: rely on index field selection, data volume and state, table design
Frequent back-table query overhead: minimize select * and use overwrite indexes

How do I determine whether the query is slow?

MySQL determines whether a statement is a slow query statement based on the execution time of the SQL statement. It compares the execution time of the current statement with the long_QUERy_time parameter. If the execution time of the statement is greater than long_query_time, the statement is recorded in the slow query log. The default value of long_query_time is 10s, which can be adjusted based on service requirements.

How do I determine if an index is applied?

You can run the explain command to check whether the KEY value in the SQL statement is NULL based on whether the table index is used during the SQL statement execution.

Is an index necessarily fast?

Query whether to use an index, only indicates the execution process of an SQL statement; A slow query is determined by the time it is executed, which means that there is no correlation between the use of an index and a slow query.

When we use an index, we should not only focus on whether it works, but also on whether it reduces the number of rows scanned by the query. If the number of rows scanned is reduced, efficiency will be improved. For a large table, you should not only create an index, but also consider index filtering, which will be fast.

How to improve the filtrability?

By index field selection, data volume and state, table design.

If you have a user table with 50 million records, then you need to locate 30 million users after sex=’ male ‘index filtering, SQL execution is not very fast. In fact, this problem is related to the filter of the index. For example, the filter of 10, 100 and 1000 records is different. Index filtering is related to index fields, the amount of data in the table, and the design structure of the table.

Paging query optimization

General paging

Paging queries like this can be implemented using a simple limit clause. Limit format:

SELECT * FROM table name LIMIT [offset,] rowsCopy the code

The first argument specifies the offset of the first return row, starting at 0;

The second parameter specifies the maximum number of rows to return;

If only one argument is given, it indicates the maximum number of rows to return.

If the offset is fixed, what effect does the number of records returned have on the execution time?

Select * from user limit 10000,1; Select * from user limit 10000,10; Select * from user limit 10000,100; Select * from user limit 10000,1000; Select * from user limit 10000,10000;Copy the code

Results: When querying records, the number of returned records was less than 100, and the query time was basically unchanged with a small gap. As the number of query records increases, it takes more and more time.

What is the effect of a fixed number of returned records on execution time if the query offset changes?

Select * from user limit 1,100; Select * from user limit 10,100; Select * from user limit 100,100; Select * from user limit 1000,100; Select * from user limit 10000,100;Copy the code

Results: When querying records, if the number of queried records is the same, the query time increases sharply with the increase of the offset when the offset exceeds 100. (This paged query mechanism starts with the first record in the database and slows down the subsequent queries, and the more data you query, the slower the overall query speed.)

Paging optimization solution:

Optimization with coverage index

Select * from user limit 10000,100; Select id from user limit 10000,100;Copy the code

Optimize with subqueries

Select * from user limit 10000,100; Select * from user where id>= (select id from user 10000,1) limit 100;Copy the code

Id primary key comparison is used (ID >=), and subqueries are optimized with override indexes.

conclusion

The teacher explained the notes are very detailed, according to the teacher said, their own hands-on practice again, feel very helpful to their own. If you think it’s useful, collect it.