How MySQL indexes work

Scan the qr code below or search wechat public account Rookie Fei Yafei, you can follow wechat public account, read more Spring source code analysis, Java concurrent programming, Netty source code series and MySQL working principle articles.

MySQL > select * from B+Tree; select * from B+Tree; select * from B+Tree;

Index Data Structure of B-tree and B+Tree
Index Data Structure of B-Tree and B+Tree
MySQL does not use arrays, hash tables, binary trees, etc. as indexes

Today we are going to talk about how indexes work in MySQL. This part of the knowledge is often used in the workplace, and it is almost mandatory to ask questions in interviews. So, whether it’s building a rocket for an interview, or turning a screw for a job, it’s very necessary to master how the index works.

First of all, it should be noted that all discussions in this article are based on the InnoDB storage engine.

The sample table

To illustrate, let’s create a sample table. The construction sentences are as follows

CREATE TABLE user (

`id` BIGINT ( 11 ) NOT NULL AUTO_INCREMENT,

`name` VARCHAR ( 64 ) COMMENT 'name'.

`age` INT ( 4 ) COMMENT 'age'.

PRIMARY KEY ( `id` ),

INDEX ( NAME )

) ENGINE = INNODB COMMENT 'User table';



INSERT INTO `user` ( `name`.`age` ) VALUES ( 'AA'.30), ('BB'.33), ('CC'.31), ('DD'.30), ('EE'.29 )

Copy the code

SQL > create table user (id is the primary key, name is the user name, age is the user name); For the convenience of the following description, 5 rows of data are inserted into the table. Since the primary key ID is self-incremented, the ids of these five rows are divided into 1~5.

The primary key index

A primary key index is also called a cluster index, which stores the data corresponding to the current primary key in a leaf node. What does that mean? For example, in the table user, id is the primary key index, so there is an index tree of ID. The leaf node of the index tree contains not only the primary key id value, but also the name and value values. For example, if the values of name and age are AA and 30 in the row where ID =1, the values of (1,”AA”,30) are stored in the index tree at the node where ID =1. The id index tree is shown as follows.

Let’s look at the execution flow of this SQL statement:

select * from user where id = 1;

Copy the code

This statement filters the WHERE condition with id=1, so the index tree with the primary key ID is used.

Select use id primary key index tree;
Find the first layer node of the index tree of id (the node where keywords 3 and 7 are located), because the condition where id= 1,1 is less than 3, so enter the left subtree of keyword 3.
Enter the second layer node of the ID index tree, and the second layer node is the leaf node. The leaf node stores the data of the table, and the keyword ID =1 exists, so R1 is returned. R1 represents the row where ID =1.

Back to the table

A common index, also called a non-clustered index or a secondary index, stores data in leaf nodes. Unlike a primary key index, the data stored in a common index is only the value of the primary key, rather than the data of the entire row. For example, in the above example table, name is a common index, and in its index tree, the data stored in the leaf node is the primary key ID value, as shown in the diagram below:

Let’s look at the execution flow of this SQL statement:

select * from user where name = 'BB';

Copy the code

This statement adds the filter name=’BB’ to the WHERE condition. Since we created an index for the name field when building the table, the name index tree will be used. In addition, the name index tree contains only the primary key ID value, which cannot meet the requirement of querying all the fields, and all the fields are stored in the primary key ID index tree. Therefore, after the value of primary key ID is found in the name index tree, the value of other fields in this row should be searched in the primary key index tree according to the found ID value. This process is called table-back. The process of going from a normal index tree back to a primary key index tree is called a table back. Therefore, the above SQL statement execution flow is as follows:

Select the name index tree;
If the value of ‘BB’ in the where condition is less than the value of ‘CC’ in the first level of the index tree, the index is entered into the left subtree of ‘CC’.
Select * from ‘BB’; select * from ‘BB’ where primary key id = 2;
Select primary key (id=2) from primary key (R2);
Continue searching back in the name index tree to find the next keyword ‘CC’ of ‘BB’ and find that ‘CC’ is not equal to ‘BB’ in the WHERE condition, so end the search.

Cover index

In the second example above, since there is only one record with name=’BB’, the table is returned only once. If there are multiple records that satisfy the condition name=’BB’, the table will be returned multiple times. Obviously, the more times the table is returned, the slower the SQL execution will be, so what can be done to avoid the table back? The answer is to override the index.

What exactly is an overwrite index? In the second example above, we used select * to query all of the fields. What if we didn’t need all of the fields, just the ID field? For example, select id from user where name = ‘BB’; As the primary key ID value has been stored in the leaf node of the NAME index tree, so the name index tree can directly meet our query requirements, so do not return to the table operation at this time, which is called overwriting index.

Overwriting an index can significantly improve query performance because it significantly reduces the number of back-table operations. Overwriting indexes is a very common SQL optimization tool and is very simple to use. In the development process, we usually recommend not to use SELECT * to query data. On the one hand, select * may return a lot of useless fields when the data is large, which wastes network resources. Another reason is to use overridden indexes as much as possible.

Associative index and leftmost matching principle

Select * from user where name=’BB’ and age =’BB’;

select name,age from user where name = 'BB';

Copy the code

The name index tree does not contain the age field, so we need to go back to the primary key id index tree to obtain the age field value. So what’s the best way to optimize it? Let this query do not need to do back table. There must be! Use an overwrite index. How do you use it?

MySQL > create index (name, name); MySQL > create index (name, name); MySQL > create index (name, name, name); For example, we now create a joint index for the name and age fields, executing the following statement:

Mysql > delete index from name

alter table user drop index `name`;

Create index name, age

alter table user add index(`name`.`age`);

Copy the code

At this point, each node in the index tree of the joint index stores not only the name field value, but also the age field value, as shown in the diagram below:

Select name,age from user where name = ‘BB’; select name,age from user where name = ‘BB’;

When using a federated index, each column of the index can only be evaluated by equivalent value, because MySQL uses the leftmost matching principle to perform matching. That is, the matching starts from the leftmost column of the index and stops when it encounters a range lookup, such as like, >, <, or between. You can use the following three examples to understand this.

select name,age from user where name = 'BB' and age = 33; # When using a federated index, the name column and the age column are matched in sequence.

select name,age from user where name like 'B%' and age = 33; Select * from age; select * from age; select * from age; select * from age;

select name,age from user where age = 33; SQL > select name, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age, age

Copy the code

Why does MySQL follow the leftmost matching principle? This is because the data on all nodes of a B+Tree is ordered. When we create a joint index, we first ensure that the first column of the data is ordered, and then ensure that the second, third, and subsequent columns are ordered. For example, in this index tree, the name column is ordered on all data, but the age column is not ordered. Age is ordered only if the names are the same. When we are looking for data, if we encounter range search, since the following columns cannot be guaranteed to be in order, we cannot continue to carry out equivalent matching, and can only carry out full table scan for the following columns.

conclusion

This article mainly describes a query SQL statement is how to query data through the index, and what is a back table. In order to improve query performance when using indexes, you can create reasonable indexes and use overwrite indexes to reduce back operations to improve query performance. Finally, in the use of a joint index, the order of index columns should be paid attention to due to the left-most matching principle. When creating a joint index, it is necessary to consider how to arrange the order of fields in the index to meet more query scenarios and avoid creating multiple indexes.

The resources

High Performance MySQL
MySQL Practice 45 Lecture by Lin Xiaobin

The sample table

The primary key index

Back to the table

Cover index

Associative index and leftmost matching principle

conclusion

The resources

Related Posts

Message middleware – RabbitMQ

[Java Concurrency] ReentrantLock

Declarative call –Feign