Mysql index basic principles from shallow to deep analysis

preface

Index is one of the most important means of SQL optimization, this article from basic to principle, take you to master the depth of index.

First, index basis

What is an index

An Index is a data structure that helps MySQL obtain data efficiently. An Index is critical to good performance, especially when the amount of data in a table is increasing. Index optimization is probably the most effective way to optimize query performance. Indexes can easily improve query performance by several orders of magnitude.

In layman’s terms, an index is like a table of contents for articles, used to improve the efficiency of queries.

2. Index classification

Common index types are: primary key index, unique index, normal index, full text index, composite index

2.1 primary key index

When a table sets a column as a primary key, that column is the primary key index

create table a (  
	id int primary key auto_increment,  
	name varchar(20) not null default ''  
);  
Copy the code

Select * from table where id = primary key (s) and id = primary key (s);

alter table table_name add primary key (column_name);
Copy the code

1.2. Common indexes

An index built from a normal column in a table, with no restrictions

Create index Index name on table_name(column1); Alter table table_name add index index name (column1);Copy the code

1.3. Full-text index

Full-text indexes are mainly for text files, such as articles and titles. Before MySQL5.6, only MyISAM storage engine supported full-text indexing, and after MySQL5.6 InnoDB storage engine also supported full-text indexing.

create table c( id int primary key auto_increment , title varchar(20), content text, fulltext(title,content) ) engine=myisam charset utf8; insert into c(title,content) values ('MySQL Tutorial','DBMS stands for DataBase ... '), ('How To Use MySQL Well','After you went through a ... '), ('Optimizing MySQL','In this tutorial we will show ... '), ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ... '), ('MySQL vs. YourSQL','In the following database comparison ... '), ('MySQL Security','When configured properly, MySQL ... ');Copy the code

1.4. Unique index

By definition, values in index columns must be unique, but null values are allowed. Select * from table D where name is the unique index. The primary key field cannot be null and cannot be repeated

create table d(
	id int primary key auto_increment , 
	name varchar(32) unique
    ) 
Copy the code

1.5. Composite indexes

An index built from a combination of columns in which no null values are allowed.

ALTER TABLE 'table_name' ADD INDEX index_name('col1','col2','col3');Copy the code

Composite indexes follow the left-most prefix principle, and are best used with the columns most commonly used for retrieval or sorting in the left-most descending order. The combined index is equivalent to creating three indexes col1, COL1COL2, col1col2COL3, but COL2 or COL3 cannot use the index. When a combined index is used, the column name may be too long and the key of the index may be too large, which reduces the efficiency. If possible, only the first few characters of COL1 and col2 can be used as the index. ALTER TABLE ‘table_name’ ADD INDEX index_name(col1(4),col2 (3));

Indicates that the first four characters of COL1 and the first three characters of col2 are used as indexes

3. Brief analysis of index mechanism

Here we first briefly analyze the mechanism of the index, for the following in-depth do some paving.

3.1 The principle of index to speed up query

The traditional query method is traversal according to the order of the table. No matter how many pieces of data are queried, MySQL needs to traverse the data of the table from beginning to end.

After we add the index, MySQL usually generates an index file through the BTREE algorithm. When we query the database, we find the index file and traverse it. We use the efficient half-search method to find the corresponding key and obtain the data.

3.1 the cost of indexing

Indexes are created to generate index files and occupy disk space. The index file is a binary tree file, so you can imagine that our DML operations (data manipulation language, add, delete, modify table records) also modify the index file, so performance will be reduced accordingly.

Index storage data structure

As mentioned above, an index is actually a data structure in a database that satisfies a particular lookup algorithm, and that references (points to) data in a way that allows you to implement advanced lookup algorithms on top of those data structures.

As we probably all know, MySQL indexes are B+ tree data structures, of course, there are also hash tables, ordered arrays and other common data structures.

1, hash table

Hash table is a kind of key-value storage structure. As long as we input the value to be searched (key), we can find its corresponding value (value). The idea of hashing is very simple, you put the value in an array, you use a hash function to convert the key to a certain location, and then you put the value in that location in the array.

Inevitably, when multiple keys are converted to hash functions, the same value will occur. One way to handle this is to pull out a linked list.

Therefore, it is important to note that the linked list behind the hash table is not ordered. Interval queries require scanning the list, so the hash table structure is suitable for scenarios where only equivalent queries are available, such as Memcached and other NoSQL engines.

2. Ordered arrays

Another familiar array structure, ordered arrays perform well in both equivalent query and range query scenarios.

If you just look at query efficiency, ordered arrays are great data structures. However, when you need to update the data, the trouble is that you insert one record in the middle and have to move all the records behind it, which is too expensive.

So, ordered array indexes are only good for static storage engines, like if you want to store all the population information for a city in 2017, which is not going to change.

These two are not the most important indexes, the common index data structure is the tree structure, the tree is a relatively complex data structure in the data structure, we come to understand the index tree structure step by step.

3. Binary search

Binary Search is also called Binary Search, which is a highly efficient Search method. However, split lookup requires that the linear table must have a sequential storage structure, and the elements in the table are ordered by keywords.

Search method: First, assume that the elements in the table are arranged in ascending order, compare the keywords of the records in the middle of the table with the search keywords, if the two are equal, the search succeeds; Otherwise, the middle position record is used to divide the table into the front and back two sub-tables. If the key word of the middle position record is greater than the search key word, the former sub-table is further searched; otherwise, the latter sub-table is further searched. Repeat the above process until you find a record that meets the criteria, so that the search succeeds, or until the child table does not exist, at which point the search fails.

The equivalence and comparison queries of ordered arrays mentioned above are very efficient, but updating data is problematic.

To support frequent changes, such as inserting data, we need to use linked lists. If it’s a linked list, if it’s a single linked list, it’s not efficient enough.

So, is there a linked list that can use binary lookup?

In order to solve this problem, BST(Binary Search Tree) is also called Binary Search Tree.

4. Binary search tree

Binary trees have the following properties: the key value of the left subtree is smaller than that of the root, and the key value of the right subtree is larger than that of the root.

The following figure shows a binary search tree

The search time in this balanced state is order log(n).

But there is a problem with binary search trees: they degenerate into linked lists in some extreme cases.

The same six numbers, 2,3,4,6,7,8, would look like this if the data we inserted happened to be in order 👇

At this point, the binary search tree search time is order n, just like a linked list.

What causes it to split? Because the depth difference between the left and right subtrees is so great, the left subtree of the tree has no nodes at all — that is, it is not balanced.

So, do we have a more balanced tree that’s not so different in depth from the left and right subtrees? Balanced binary search trees, or AVL trees.

5, the AVL tree

AVL Trees (Balanced binary search Trees) : The absolute value of the depth difference between the left and right subtrees cannot exceed 1.

For example, if the depth of the left subtree is 2, the depth of the right subtree can only be 1 or 3. At this time we insert 2,3,4,6,7,8 in order again, will not “split” 👇

How is the AVL tree balanced? Two main operations are used: left rotation and right rotation.

(1) Insert 1, 2, 3

When we insert 1 and 2, according to the definition of binary search tree, 3 must be to the right of 2. At this time, the depth of the right node of root node 1 will become 2, but the depth of the left node will be 0, because it has no children, so it will violate the definition of balanced binary tree.

So what should we do? And since it’s a right node followed by a right node, right-right, we’re going to put the 2 up here, and that’s called left-handed rotation.

(2) Similarly, if we insert 3, 2, 1, this is going to be left left, this is going to be right-handed, and we’re going to bring the 2 up.

Since balanced binary trees maintain equilibrium and do not degenerate, can we use balanced binary trees to store indexes? B: Yes.

When we use a tree structure to store indexes, accessing a node requires an I/O to disk. The smallest unit InnoDB can operate on a disk is a page (or disk block). Unlike main memory, disk I/O has mechanical motion costs, so disk I/O time consumption is significant.

So if each node stores too little data, to find the data we need from the index, we have to visit more nodes, which means too many interactions with disk.

So what’s the solution?

(1) Let each node store more data.

(2) Let the node have more keywords.

The more keywords there are on a node, the more Pointers we have, which means more forks (we call them “paths”).

Because the more branches there are, the deeper the tree becomes (the root node is 0). In this way, the tree changed from tall and thin to short and fat.

At this point, our tree is no longer binary, but multi-fork, or multipath.

6, Multi-path balanced search Tree (B-tree)

Now let’s look at the multi-path balanced search tree, or B tree.

B tree is a multi-fork balanced search tree. The main features are as follows:

(1) There are multiple elements stored in nodes of B tree, and each inner node has multiple branches.

(2) The elements in the node contain key values and data, and the key values in the node are arranged from large to small. That is, data is stored at all nodes.

(3) Elements in the parent node do not appear in the child node.

(4) All the leaf nodes are located in the same layer, the leaf nodes have the same depth, and there is no pointer connection between the leaf nodes.

For example, let’s take a look at a few queries:

(1) If the search key<17, go to the left child node;

(2) If 17<key<35, go to the middle child node;

(3) If key>35, go to the right child node;

(4) If key=17, direct hit;

(5) If key=35, direct hit;

(6) B tree looks perfect, is that the end of it? Don’t.

B tree does not support fast lookup of range query. If you think about a situation like this, if we want to find the data between 10 and 35, after finding 15, we need to go back to the root node and iterate again. We need to iterate from the root node many times, so the query efficiency needs to be improved.

If data stores row records, the size of rows will increase as the number of columns increases. In this case, the amount of data that can be stored in a page is reduced, the tree is correspondingly higher, and disk I/OS are increased

So this brings us to our ultimate data structure, the B+ tree.

7, Enhanced version of multi-path balanced search Tree (B+Tree)

MySQL uses the B+ tree to build indexes. MySQL uses the B+ tree to build indexes. The main difference between B+ trees and B trees is whether non-leaf nodes store data

B tree: Both non-leaf and leaf nodes store data. B+ tree: Only leaf nodes store data, non-leaf nodes store key values. Leaf nodes are connected by bidirectional Pointers, and the lowest leaf node forms a bidirectional ordered linked list. Let’s look at the storage structure of InnoDB’s B+ tree:

To tell you the main points of this picture:

(1) the outermost block, block, which we call a disk block, you can see every disk block contains several data items (pink) and pointer (shown in yellow/gray), such as a root node disk contains data item 17 and 35, contains a pointer (P1, P2, P3, P1 said less than 17 disk blocks, P2 said between 17 and 35 disk blocks, P3 indicates disk blocks larger than 35. The real data is in the leaf nodes 3, 4, 5… , 65. Non-leaf nodes only store the real data and only the data items that guide the search direction, such as 17 and 35, which do not really exist in the data table.

(2) Bidirectional Pointers are used to connect leaf nodes, and the lowest leaf node forms a bidirectional ordered linked list.

7.1 Storage Capacity

For example, if a record is 1K, a leaf node (a page) can store 16 records. How many Pointers can a non-leaf node store?

Assume that the index field is of type BigINT and is 8 bytes long. The pointer size is set to 6 bytes in the InnoDB source code, making a total of 14 bytes. Non-leaf nodes (one page) can store 16384/14=1170 such cells (key + pointer), representing 1170 Pointers.

When the tree depth is 2, there are 1170^2 leaf nodes, and the data that can be stored is 1170117016=21902400.

When searching for data, a page search represents one IO, that is, for a table of about 20 million, the query data needs to access the disk up to three times.

So in InnoDB, the B+ tree depth is usually 1-3 layers, which can accommodate tens of millions of data storage.

7.2 Query efficiency

Let’s look at the data search process of B+Tree:

(1) for example, if we want to find 35, we find the key value in the root node, but it is not a page child node, so we continue to search. 25 is the critical value of the left-closed and right-open interval of [17,35), so we go to the middle child node. However, if we continue to search, it is also the critical value of the left-closed and right-open interval of [28,34). So I go to the child node on the left, and I find the data I need on the leaf node.

(2) If it is a range query, for example, to query the data from 22 to 60, when 22 is found, all data nodes can be accessed at one time only by traversing the sequence of nodes and Pointers, which greatly improves the efficiency of interval query (there is no need to return to the upper parent node for repeated traversal search).

(3) Pointers to adjacent leaf nodes are added to form a B+Tree with sequential access Pointers. This is done to improve the efficiency of interval search. As long as the first value is found, the following values can be searched sequentially.

7.3. Summary of B+ tree characteristics

To summarize, InnoDB’s B+Tree features:

(1) It is a variant of B Tree. It can solve all the problems that B Tree can solve. What are the two biggest problems solved by B Tree? (More keywords per node; More ways)

(2) Stronger ability to scan libraries and tables (if we want to scan the whole table, we only need to traverse leaf nodes, rather than the whole B+Tree to get all the data)

(3) The disk read and write capability of B+Tree is stronger than that of B Tree (the root node and branches do not store data area, so a node can store more keywords and load more keywords at a time).

(4) Stronger sorting ability (because there is a pointer to the next data area on the leaf node, the data forms a linked list)

(5) More stable efficiency (B+Tree always gets data from leaf nodes, so IO times are stable)

Clustered index and non-clustered index

The two most common storage engines in MySQL are MyISAM and InnoDB, which implement non-clustered index and clustered index respectively.

First of all, we will introduce some concepts. In the classification of indexes, we can divide them into “primary key index” and “secondary index” according to whether the key of the index is the primary key. The indexes established by the primary key value are called “primary key index”, and the others are called “secondary index”. Therefore, there can be only one primary key index and many secondary indexes.

1. MyISAM — non-clustered index

MyISAM storage engine adopts non-clustered index. The primary key index of non-clustered index is basically the same as the secondary index, but the primary key index is not allowed to repeat, and no null value is allowed. The key of their leaf node stores the physical address of the data corresponding to the key value.

Data tables with non-clustered indexes are stored separately from index tables.

Data in a non-clustered index is stored according to the insertion order of the data. Therefore, non-clustered index is more suitable for single data query. Insert order is not affected by key values.

Consider: why would a secondary index point to the same content as a primary key index for a non-clustered index? Index is used to query, isn’t it used in what? If the primary key is not the primary key, then secondary indexes are required.

2. InnoDB — Cluster index

The leaf node of the primary key index of the cluster index stores the data corresponding to the key value, while the leaf node of the secondary index stores the primary key value of the data corresponding to the key value. Therefore, the primary key value length should be as small as possible and the type should be as simple as possible.

The data for the clustered index is stored together with the primary key index.

Can be seen in the image above secondary index of the leaf node of the data storage is a primary key value, the primary key index of the leaf node of the data storage is the data itself, that is to say, the data and indexes are stored together, and indexing query to place is data (data) itself, then the order of the index and the order of the data itself is the same.

Because the clustered secondary index stores the key value of the primary key, it can reduce costs when rows move or pages split, since the secondary index is not maintained. But because primary key indexes store the data itself, clustered indexes take up more space.

Clustering index when inserting new data to a much slower than the cluster index, because when inserting new data need to detect whether the primary key repeat, the main index of the need to traverse all the leaf nodes, rather than the clustering index data are stored leaf node address, takes up less space, so the distribution and query the less I/O, but the cluster index of main index is stored in the data itself, Data occupies a large space, is distributed in a larger range, and may occupy many sectors. Therefore, more I/ OS are required to complete the traversal.

4. Principles of index use

1. Dispersion of columns

The first one is called column dispersion. Let’s first look at the following dispersion formula:

count(distinct(column_name)) : count(*)

All different values of the column and the ratio of all data rows. In the case of the same number of rows, the larger the numerator, the higher the dispersion of the column.

mysql> SELECT * FROM `test`.`user` ORDER BY `id` LIMIT 10 OFFSET 0; +----+-----------+--------+-------------+ | id | name | gender | phone | +----+-----------+--------+-------------+ | 1 | Qin putting | 0 | 13601722591 | | 2 | li yi moment | 0 | 15204160836 | | 3 | Chen gen | 0 | 13601994087 | | | 4 Shen Yi continuous | 0 | 15507785988 | | | 5 Zhu Tongtai | 1 | 13201268193 | | | 6 Zhou Taorui | 1 | 15705478612 | | | feng le plus 7 | 0 | 13705834063 | | | eight king enthalpy | 1 | 15006956358 | | | 9 Astragalus | 0 | 15108012536 | | | 10 wu Ji swim | | 15301860708 | 0 + - + -- -- -- -- -- -- -- -- -- -- - + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + 10 rows in the set (0.00) sec)Copy the code

Simply put, if the column has more repetitions, the dispersion is lower, and the fewer repetitions, the higher the dispersion.

Now that we know the concept of dispersion, let’s consider the difference between indexing name and gender.

When we use the index established on gender to retrieve data, there are too many duplicate values, so more rows need to be scanned. For example, let’s now create an index on the GENDER column and look at the execution plan.

ALTER TABLE user ADD INDEX idx_user_gender (gender); EXPLAIN SELECT * FROM 'user' WHERE gender = 0;Copy the code

+----+-------------+-------------+------------+------+-----------------+-----------------+---------+-------+---------+-- --------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered  | Extra | +----+-------------+-------------+------------+------+-----------------+-----------------+---------+-------+---------+-- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | user | NULL | ref | idx_user_gender | idx_user_gender | 2 | const | 2492574 | | 100.00 NULL | +----+-------------+-------------+------------+------+-----------------+-----------------+---------+-------+---------+-- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

Name has a higher degree of dispersion, such as the name “Chen Gen”, which only needs to scan one line.

ALTER TABLE user ADD INDEX idx_user_name (name); EXPLAIN SELECT * FROM 'user' WHERE name = 'Chen '; EXPLAIN SELECT * FROM' user 'WHERE name =' Chen ';Copy the code

+----+-------------+-------------+------------+------+----------------------------+----------+---------+-------+------+- ---------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------------+------------+------+----------------------------+----------+---------+-------+------+- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | user | NULL | ref | idx_name | idx_name | 1023 | const | 1 | | NULL | 100.00 +----+-------------+-------------+------------+------+----------------------------+----------+---------+-------+------+- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

Look at the index on the table. Cardinality stands for Cardinality, which represents the estimated number of values that are not duplicated. The closer the cardinality of the index is to the total number of rows in the table, the higher the dispersion of the columns.

mysql> show indexes from user; +-------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+------ --+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+------ --+------+------------+---------+---------------+ | user | 0 | PRIMARY | 1 | id | A | 4985145 | NULL | NULL | | BTREE | | | | user | 1 | idx_name | 1 | name | A | 2605146 | NULL | NULL | YES | BTREE | | | | user | 1 | idx_user_gender | 1 | gender | A | 1 | NULL | NULL | YES | BTREE | | | | user | 1 | comidx_name_phone | 1 | name | A | 2595718 | NULL | NULL |  YES | BTREE | | | | user | 1 | comidx_name_phone | 2 | phone | A | 4972647 | NULL | NULL | YES | BTREE | | | +-------------+------------+-------------------+--------------+-------------+-----------+-------------+----------+------ - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 5 rows in the set (0.00 SEC)Copy the code

If there are too many duplicate values in the index B+Tree, the MySQL optimizer will find that the index is similar to the full table scan.

2. The left-most match of the composite index

We have been talking about single-column indexes, but sometimes we can create composite indexes for multi-condition queries. Single-column indexes can be considered as special composite indexes.

For example, we create a combined index for name and phone on the user table.

ALTER TABLE user add INDEX comidx_name_phone (name,phone);
Copy the code

A composite index in B+Tree is a composite data structure that builds the search Tree from left to right (name on the left, phone on the right).

As can be seen from this figure, name is in order while phone is in disorder. Phones are in order when their names are equal.

In this case, when we use where name= ‘wangwu’ and phone = ‘139xx’ to query data, B+Tree will preferentially compare the name to determine the next search direction, left or right. If the names are the same, compare phones. However, if the query condition does not have name, we do not know which node should be searched in the first step, because name is the first comparison factor when the search tree is established, so no index is used.

2.1. When are composite indexes used

Therefore, when we build composite indexes, we must put the most frequently used columns on the far left. For example, can the following three statements be used to combine indexes?

1) Use two columns, you can use the combined index:

Mysql > EXPLAIN SELECT * FROM user WHERE name= 'Chen' AND phone = '13601994087'; +----+-------------+-------------+------------+------+----------------------------+-------------------+---------+------- ------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref |  rows | filtered | Extra | +----+-------------+-------------+------------+------+----------------------------+-------------------+---------+------- ------+------+----------+-------+ | 1 | SIMPLE | user_innodb | NULL | ref | comidx_name_phone | comidx_name_phone | 1070 | const, const | 1 | | NULL | 100.00 +----+-------------+-------------+------------+------+----------------------------+-------------------+---------+------- -- -- -- -- -- - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

2) use the name field on the left to use the composite index:

Mysql > EXPLAIN SELECT * FROM user WHERE name= 'Chen '; +----+-------------+-------------+------------+------+----------------------------+----------+---------+-------+------+- ---------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------------+------------+------+----------------------------+----------+---------+-------+------+- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | user_innodb | NULL | ref | comidx_name_phone | idx_name | 1023 | const 19 | | | 100.00  NULL | +----+-------------+-------------+------------+------+----------------------------+----------+---------+-------+------+- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

Mysql > alter table table_name select * from ‘phone’;

mysql> EXPLAIN SELECT * FROM user WHERE phone = '13601994087'; +----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----- --------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra  | +----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | user | NULL | | NULL ALL | NULL | NULL | NULL | 4985148 | | 10.00 Using the where | +----+-------------+-------------+------------+------+---------------+------+---------+------+---------+----------+----- --------+ 1 row in set, 1 Warning (0.00 SEC)Copy the code

2.2. How to create a composite index

When creating a (A, B, C) combined index, it is equivalent to creating a (a) single-column index, (A,b) combined index, and (A, B, C) combined index. To make the index effective, only a and A, B and a, B, C can be used. Of course, b and A also work, because SQL will optimize them.

Use the where b =? And the where b =? and c=? And where a =? and c=? The index cannot be used. You can’t use the first field, you can’t break it.

This is the leftmost matching rule for MySQL composite indexes.

3. Overwrite indexes

3.1, back to the table

In the clustering index, the secondary index is used to search for data. First, the primary key index is found by the index, and then the data that is not in the index is found by the primary key value. It scans one more index tree than the query based on the primary key index, and this process is called back to the table.

For example, select * from user where name = ‘lisi’;

3.2 overwrite indexes

In secondary indexes, whether single-column or associative, if the select column can be retrieved only from the index, rather than from the data area, the index used is called an overwrite index, thus avoiding the need to return to the table.

Let’s create a federated index:

ALTER TABLE user add INDEX 'comixd_name_phone' ('name','phone');Copy the code

All three queries use overwrite indexes:

EXPLAIN SELECT name,phone FROM user WHERE name= 'Chen' AND phone = '13601994087'; EXPLAIN SELECT name FROM user WHERE name= 'Chen' AND phone = '13601994087'; EXPLAIN SELECT phone FROM user WHERE name= 'Chen' AND phone = '13601994087';Copy the code

The value “Using index” in Extra indicates that an overwritten index is used.

Mysql > EXPLAIN SELECT name FROM user_innodb WHERE name= 'Chen' AND phone = '13601994087'; +----+-------------+-------------+------------+------+----------------------------+-------------------+---------+------- ------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len |  ref | rows | filtered | Extra | +----+-------------+-------------+------------+------+----------------------------+-------------------+---------+------- ------+------+----------+-------------+ | 1 | SIMPLE | user_innodb | NULL | ref | idx_name,comidx_name_phone | Comidx_name_phone | 1070 | const, const | 1 | | 100.00 Using index | +----+-------------+-------------+------------+------+----------------------------+-------------------+---------+------- -- -- -- -- -- - + - + -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

Select *, which is not used to override the index.

Obviously, because overwriting an index reduces IO times and the number of visits to the data, it can greatly improve query efficiency.

4. ICP under Index Conditions

Index Condition Pushdown (ICP) is used to obtain a tuple of a table from an Index on a particular table. This method is used for single table scans rather than multiple join tables. Specifically, it is a way for a single table to scan with an index to retrieve data. Here’s what it does

(1) One is to reduce the number of complete records (a complete tuple) read;

InnoDB clustered INDEX is not valid, only for non-clustered INDEX such as SECOND INDEX.

Close the ICP:

set optimizer_switch='index_condition_pushdown=off';
Copy the code

View parameters:

show variables like 'optimizer_switch';
Copy the code

Now we need to query all the people whose name is Chen Gen and the last four digits of their mobile phone number are 4087. SQL query:

SELECT * FROM user WHERE name= 'chengen' and phone LIKE '%4087';Copy the code

This SQL can be executed in two ways:

1. Search for all secondary index data whose name is’ Chen gen ‘according to the composite index, and then go back to the table and search for all eligible data (19 data) on the primary key index. It then goes back to the Server layer, where it filters out the person 4087 after the phone number.

2. Search out all secondary index data (19 indexes) whose name is’ Chen Gen ‘according to the composite index, then screen out the index with the last four digits of the mobile phone number 4087 (1 index) from the secondary index, then go back to the table, query all eligible data (1 data) on the primary key index, and return to the Server layer.

Obviously, the second method queries less data on the primary key index.

Note that indexes are compared in the storage engine and data records are compared in the Server layer. When a phone condition cannot be used for index filtering, the Server layer does not pass the phone condition to the storage engine, so two unnecessary records are read.

At this point, if there are 100,000 records whose name= ‘Chen gen’, there are 99,999 records that do not need to be read.

Execute the following SQL, Using WHERE:

Mysql > EXPLAIN SELECT * FROM user WHERE name= 'chengen' AND phone LIKE '%4087'; +----+-------------+-------------+------------+------+-------------------+-------------------+---------+-------+------+- ---------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------------+------------+------+-------------------+-------------------+---------+-------+------+- ---------+-------------+ | 1 | SIMPLE | user_innodb | NULL | ref | comidx_name_phone | comidx_name_phone | 1023 | const | the | | 11.11 Using the where | +----+-------------+-------------+------------+------+-------------------+-------------------+---------+-------+------+- + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + 1 row in the set, 1 warning (0.00 SEC)Copy the code

Using Where indicates that the data retrieved from the storage engine does not meet all conditions and needs to be filtered at the Server layer.

First use the name condition to scan the index range, read the data table records, and then compare to check whether phone LIKE ‘%4087’ condition is met. At this point, only 1 out of 19 matches the criteria.

5, index creation use summary

Because indexes are powerful for improving query performance, our goal is to use them wherever possible.

5.1. Index creation

Based on the analysis in the previous section, we summarize some considerations for index creation:

1, create index on (on) where order and join

2. Do not have too many indexes. — Wasted space and slow updates.

3. Do not create indexes for fields with low differentiation, such as gender. — The dispersion is too low, resulting in too many scanned lines.

4. Frequently updated values. Do not use as primary keys or indexes. Split page –

5. Composite indexes put hashed (highly differentiated) values first. — Left-most prefix matching rule

Create a composite index instead of changing a single column index. — A composite index replaces multiple single-column indexes (since MySQL can only use one index at a time, it is better to use a composite index when multiple conditional queries are frequently used)

7, long field, how to create index? — Use short indexes. When field values are long, indexing takes up a lot of space and searches are slow. We can create an index by taking the first part of a field, which is called a prefix index.

create table shop(address varchar(120) not null); 
alter table shop add key (address(12));
Copy the code

8. It is not recommended to use unordered values (such as ID, UUID) as indexes — when the primary key is uncertain, leaf nodes will be frequently split and disk storage will be fragmented

5.2. When will the index become useless

Replace \SUBSTR\CONCAT\sum count avg (+ – * /); replace\SUBSTR\CONCAT\sum count avg (+ – * /);

explain SELECT * FROM 't2' where id+1 = 4;
Copy the code

2. The string is not quoted, and implicit conversion occurs

explain SELECT * FROM 'user' where name = 136; 

explain SELECT * FROM 'user' where name = '136';
Copy the code

Where (ABC %, like %2673%, like %888) where (ABC %, like %2673%, like %888) Why is that?

explain select * from user where name like 'wang%'; 

explain select * from user where name like '%wang';
Copy the code

Filtering is too expensive to use indexes. Full-text indexes can be used at this point.

4, negative query NOT LIKE

explain select *from employees where last_name not like 'wang'
Copy the code

! = (<>) and NOT IN IN some cases:

explain select * from user where id not in (1) 
explain select * from user where id <> 1
Copy the code

Any column containing a NULL value will not be included in the index. Any column containing a NULL value in a compound index is invalid for the compound index. So we don’t want the default value of the field to be NULL when we design the database.

MySQL queries use only one index, so if the WHERE clause already uses an index, the order by column does not use an index. Therefore, the database default sort can meet the requirements of the situation do not use the sort operation; Try not to include more than one column sort, and create composite indexes for those columns if necessary.

Note that whether an SQL statement uses an index depends on the database version, data volume, and data selectivity.

The last

Recently many people are in the interview, I also sorted out quite a lot of interview materials, I hope to help you. I have compiled a: Spring series family barrel, Java systematic information (including 2021 latest Java core knowledge, interview topics and 20 years of summary of the Internet real questions, e-books, etc.), friends who need to pay attention to the public number [procedures yuan Xiao wan] can be obtained.