Warm-up

A word before we start: without the bone-chilling cold, how could the plum blossom smell so sweet? Learning can be dry, so please stick with it! I learned the material in this article from Mr. Dinky; anything you don't understand, go search for it yourself. This article takes about 25 minutes to read!

Hello, everyone. In the previous section we went over transaction isolation levels: their differences, implementations, usage techniques, and so on. Today we introduce database indexes!

Getting started

Why did MySQL choose InnoDB?

Let's start by explaining why InnoDB is MySQL's default engine.

  1. Before 5.5, MySQL's default engine was MyISAM. MyISAM uses non-clustered storage: indexes are kept separately from the row records. From 5.5 onward the default is InnoDB, which uses a clustered index: the primary key index and the rows are stored together. From this alone you can see that InnoDB has an edge over MyISAM in query efficiency, since to some degree it saves a lot of disk IO.
  1. As databases evolved, consistency requirements kept rising and transactions appeared. A major reason MySQL chose InnoDB is its transaction support; MyISAM has no transaction log, so data safety cannot be guaranteed.
  1. InnoDB supports row-level locking, while MyISAM only supports table-level locking, which cannot meet concurrency requirements once the data volume gets even slightly large.

Most important of all: as systems have evolved, transactions have become an ever more decisive factor.

How InnoDB's underlying data structure was chosen

Hash index

Let's look at hash indexes first. A hash index, for both writes and reads, runs the key through a hash function, converting it into an address where the value is stored; a lookup hashes the key the same way. Point queries are therefore very fast, but a hash index does not support range scans, so it cannot meet general query needs. For looking up a single value, though, a hash index is great.
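As a minimal sketch (plain Python standing in for the structure, not InnoDB internals): a hash index behaves like a dict, so a point lookup is O(1), but a range query has to scan every entry because hashing destroys key order.

```python
# A toy hash index: key -> row location. Point lookup is O(1).
hash_index = {500: "R4", 300: "R2", 600: "R5"}

# Point query: one hash computation, no tree traversal.
print(hash_index[500])  # R4

# Range query: hashing scatters the keys, so every entry must be scanned.
in_range = sorted(k for k in hash_index if 300 <= k <= 550)
print(in_range)  # [300, 500]
```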

Ordered array

An ordered array actually queries well: binary search finds a single value in O(log n) steps, and a range query is just the slice between two positions. The problem is writes: inserting a record in the middle means shifting every record after it, which is very slow once the data volume is large. So ordered arrays only suit static data, and this approach doesn't make the cut either.
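A quick illustrative sketch of that trade-off using Python's `bisect` module (a toy, not a storage engine): lookups and range scans are cheap, but a mid-array insert shifts every later element.

```python
import bisect

keys = [100, 200, 300, 500, 600]

# Point lookup by binary search: O(log n) comparisons.
i = bisect.bisect_left(keys, 300)
found = i < len(keys) and keys[i] == 300

# Range scan: locate both ends, then slice.
lo = bisect.bisect_left(keys, 200)
hi = bisect.bisect_right(keys, 500)
print(keys[lo:hi])  # [200, 300, 500]

# Insert in the middle: every element after the slot moves, O(n).
bisect.insort(keys, 400)
print(keys)  # [100, 200, 300, 400, 500, 600]
```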

Binary tree index

Normally, iterating over a linked list or an array from the start is slow. To improve query efficiency, a binary search tree can be used: by its nature, each newly inserted value is compared against existing nodes to decide which subtree it belongs to. But there is a degenerate case when inserting data. Here is an example: if every new value is larger than the previous one, the tree skews completely to one side.

A skewed binary tree has the same query performance as a linked list, which is obviously not what we want. So the plain binary tree is out.

For example, inserting keys in this already-sorted order produces a fully right-skewed tree:

```
1, 2, 3, 4, 5, 6, 8, 9
```
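A tiny demonstration of the degenerate case (hypothetical Python, just to show the shape): feeding already-sorted keys into a naive binary search tree yields a tree whose height equals the number of keys, i.e. a linked list.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Standard unbalanced BST insert: smaller keys go left, larger go right.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

root = None
for key in [1, 2, 3, 4, 5, 6, 8, 9]:  # monotonically increasing inserts
    root = insert(root, key)

print(height(root))  # 8 -- one level per key: the tree degenerated into a list
```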

Red-black tree

As binary trees evolved, the red-black tree was adopted as the underlying data structure.

A red-black tree is a balanced binary tree, which neatly fixes the skew problem left by plain binary trees. However, each node still has only two children, so with large data volumes the tree becomes very tall, and that is bad for storage: querying a record can cost a disk IO per level. And since a data page is small (4KB at the operating-system level; an InnoDB page is 16KB), it cannot possibly hold all the data at once.

As we know, a disk IO seek takes roughly 10ms. Once the data volume reaches the millions, that kind of query speed becomes a real headache, so the underlying data structure had to be optimized further. Eventually this leads to the B tree.
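Some back-of-the-envelope arithmetic shows why tree height matters so much when each level can cost a disk IO (the fan-out of 1200 is an assumed ballpark for a 16KB page of child pointers, not a measured figure):

```python
import math

rows = 1_000_000

# Balanced binary tree: fan-out 2, so about log2(n) levels.
binary_height = math.ceil(math.log2(rows))
print(binary_height)  # 20 levels -> up to ~20 disk IOs per lookup

# Wide multiway tree: if one page holds ~1200 child pointers,
# a million rows need only a couple of levels.
fanout = 1200
wide_height = math.ceil(math.log(rows, fanout))
print(wide_height)  # 2 levels -> a handful of disk IOs at most
```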

B tree

The B tree solves the height problem of the red-black (balanced binary) tree: each node holds many keys, and all leaf nodes sit at the same depth. But it still has a query problem: when querying a range of data, you usually have to climb back up through parent nodes repeatedly, which greatly reduces query efficiency. So the B tree is out as well.

B+ tree

After going around in circles, developers ended up at the B+ tree, an extension of the B tree. It stays compatible with the B tree's shape while keeping all records in the leaf nodes and chaining the leaves together like a linked list. This finally solves the problems of all the structures above.
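A minimal sketch of that last idea (toy Python pages, not the real on-disk format): records live only in leaf pages, and the leaves are chained, so a range query finds the first relevant leaf and then just follows the chain, never climbing back up.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys stored in this leaf page
        self.next = None   # link to the next leaf: the B+ tree's key twist

# Three leaf pages chained left to right.
a, b, c = Leaf([1, 2, 3]), Leaf([4, 5, 6]), Leaf([8, 9])
a.next, b.next = b, c

def range_scan(first_leaf, lo, hi):
    """Walk the leaf chain; no trips back up through parent nodes."""
    out, leaf = [], first_leaf
    while leaf is not None:
        out.extend(k for k in leaf.keys if lo <= k <= hi)
        if leaf.keys and leaf.keys[-1] > hi:
            break  # past the upper bound: stop walking
        leaf = leaf.next
    return out

print(range_scan(a, 2, 8))  # [2, 3, 4, 5, 6, 8]
```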

Index types

Primary key index

A primary key index is a special kind of unique index that does not allow NULL values. You cannot use CREATE INDEX to create one; use ALTER TABLE (or declare it in CREATE TABLE) instead.

In InnoDB, the clustered index is in fact the primary key index.

Let's look at an example:

```sql
create table T(
    id int primary key,
    k int not null,
    name varchar(16),
    index (k)
) engine=InnoDB;
```

Here id is the primary key, so there is a primary key tree; k is an indexed field; name is an ordinary field.

As the figure shows, indexes split by leaf-node content into primary key indexes and non-primary-key indexes. The leaf nodes of the primary key index hold entire rows of data; in InnoDB the primary key index is also called the clustered index. The leaf nodes of a non-primary-key index hold primary key values; in InnoDB such an index is also called a secondary index.

Take the query below: since ID is the primary key, only the ID B+ tree needs to be searched, and the leaf node found there already contains the entire row. This effect cannot be achieved if some of the row's data is not in the tree; in that case the remaining columns must first be fetched by going back to the table before the data can be returned to the client.

```sql
select * from T where ID=500;
```

Now consider this second SQL. The condition here is on k, which is an ordinary (secondary) index, so the relevant structure is the one shown on the right of the figure. To get all columns of the row, we first search the k index tree for k=5 and obtain ID=500, then take that 500 back to the ID tree and fetch the row R4. This process is called going back to the table, and the more back-to-table lookups a query needs, the slower it gets.

```sql
select * from T where k=5;
```

To sum up, this is why many people add an index when data queries are slow: it improves query efficiency. And during development, use primary key queries whenever possible.
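The two lookups above can be sketched like this (toy Python dicts standing in for the two B+ trees of table T; the row contents are made up for illustration):

```python
# Primary key tree: leaf nodes hold the entire row.
pk_tree = {100: ("R1",), 300: ("R3",), 500: ("R4", "k=5", "name=Bob")}

# Secondary index tree on k: leaf nodes hold only the primary key value.
k_tree = {3: 300, 5: 500}

# select * from T where ID=500 -- one tree, done.
row = pk_tree[500]

# select * from T where k=5 -- search the k tree, then go back to the table.
the_id = k_tree[5]           # first lookup: the k tree yields ID=500
row_via_k = pk_tree[the_id]  # second lookup: back to the primary key tree
print(row_via_k)  # ('R4', 'k=5', 'name=Bob')
```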

Unique index

Similar to a normal index, except that the values in the indexed column must be unique (NULL values are allowed). For a composite unique index, the combination of column values must be unique.

Normal index

This is the most basic kind of index; it has no restrictions.

Full-text index

A FULLTEXT index is used for full-text search. Only the InnoDB and MyISAM storage engines support FULLTEXT indexes, and only on CHAR, VARCHAR, and TEXT columns.

Index maintenance

In practice, indexes are often added carelessly in the name of performance, leaving the table with too many of them; the set of indexes should be kept appropriate. So indexes usually need an ongoing maintenance process.

A common index-maintenance scenario: if you insert a new row with ID 700, it is simply appended after record R5. But if the newly inserted ID is 400, things are messier: the records after it must be shifted to make room. Even worse, if the data page R5 sits on is already full, the B+ tree algorithm has to allocate a new data page and move part of the data over. This process is called a page split, and when it happens, performance naturally suffers.
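A toy sketch of a page split (the capacity of 4 records per page is hypothetical; a real InnoDB page is 16KB, and real splits also update parent nodes, which this ignores):

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical tiny page, for illustration only

def find_page(pages, key):
    """Pick the page whose key range should hold this key."""
    for i in range(len(pages) - 1):
        if key < pages[i + 1][0]:
            return i
    return len(pages) - 1

def insert_with_split(pages, key):
    i = find_page(pages, key)
    page = pages[i]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)         # room left: shift later records
    else:
        mid = PAGE_CAPACITY // 2         # page full: allocate a new page,
        pages.insert(i + 1, page[mid:])  # move the upper half into it,
        del page[mid:]
        insert_with_split(pages, key)    # then retry the insert

pages = [[100, 300, 500, 700]]  # this page is already full
insert_with_split(pages, 400)   # ID=400 lands mid-page, forcing a split
print(pages)  # [[100, 300, 400], [500, 700]]
```

Note how both resulting pages are only about half full right after the split, which is exactly the space-utilization cost discussed below.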

Conversely, when data is deleted and a page's utilization drops low enough, neighboring pages are merged; the merge is the reverse of the split process.

Whether pages are splitting or merging, besides the performance cost, overall space utilization drops by roughly 50%.

Covering index

This is hard to explain in the abstract, so here is an example to get a feel for it.

Consider the query below. Here k is an index, so there is a k index tree, and its leaf nodes happen to already carry the ID values. Since the query asks only for ID, it can be answered directly from the k index tree, with no need to visit the primary key tree, that is, no back-to-table operation.

That, roughly, is a covering index!

```sql
select ID from user where k between 3 and 7;
```

The core point: a covering index can avoid back-to-table operations entirely and thus significantly improve query performance. That is why covering indexes are one of the more practical means of performance optimization.
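Continuing the toy picture (illustrative Python; the entries are assumed): because each secondary-index leaf entry already carries the primary key, a query that selects only ID is fully "covered" by the k tree, and the primary key tree is never touched.

```python
# Secondary index on k: each leaf entry is (k value, primary key ID).
k_tree = [(3, 300), (5, 500), (6, 600), (7, 700), (9, 900)]

# select ID from user where k between 3 and 7
# The query only wants ID, and ID is right there in the index entry,
# so the k tree alone answers it: zero back-to-table lookups.
ids = [pk for k, pk in k_tree if 3 <= k <= 7]
print(ids)  # [300, 500, 600, 700]
```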

Leftmost prefix index

This is the so-called leftmost-prefix rule. So what is it?

It means a B+ tree index can locate records using just the leftmost prefix of the indexed columns.

Take a composite index on (name, age) as an example. If we want to find everyone with a given exact name, the index quickly locates ID4 and we scan the matching entries that follow. If we only know the surname, a prefix of the name, the index can still quickly locate the first match, ID3, and scan forward from there. That is the benefit of the leftmost-prefix rule: one composite index serves several query patterns, sparing us a lot of unnecessary indexes.
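A sketch of why one (name, age) index serves both queries (illustrative Python; the names and record IDs are assumed, loosely echoing the figure's ID3 and ID4): entries sorted by name then age put all rows sharing a name, or a name prefix, next to each other.

```python
# A composite index on (name, age): entries sorted by name, then age.
index = [
    ("li", 25, "ID2"),
    ("zhang", 10, "ID3"),
    ("zhang san", 10, "ID4"),
    ("zhang san", 20, "ID5"),
    ("zhao", 30, "ID6"),
]

# Exact-name query: all matches are contiguous in the sorted order.
exact = [rid for name, age, rid in index if name == "zhang san"]
print(exact)  # ['ID4', 'ID5']

# Leftmost-prefix query: every name starting with "zhang" is also
# contiguous, so the index can locate the first match and scan forward.
prefix = [rid for name, age, rid in index if name.startswith("zhang")]
print(prefix)  # ['ID3', 'ID4', 'ID5']
```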

Index condition pushdown

Above we introduced the leftmost-prefix index; index condition pushdown is closely tied to it. The concept was introduced in MySQL 5.6 as an optimization performed inside the index during queries.

Let me give an example. We want to query the tuser table for people whose surname is Zhang, who are 20 years old, and who have not been deleted.

```sql
select * from tuser where name like '张%' and age=20 and isdelete=1;
```

Before 5.6, the name index would be used to find everyone surnamed Zhang, and then, for each match, the server would go back to the table, fetch the full row by primary key, and only then check the remaining conditions: whether age equals 20 and whether the row is in the undeleted state. Only then is the SQL complete. So every index match costs a back-to-table lookup, even for rows that will fail the age check.

From 5.6 onward, with index condition pushdown, the engine first checks, within the index itself, whether the current record satisfies age=20 and the not-deleted condition, and only goes back to the table for records that pass. This reduces the number of back-to-table lookups and optimizes query efficiency.
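A before/after sketch in toy Python (the rows are assumed; here only the age condition is pushed into a hypothetical (name, age) index, and the delete flag is still checked after going back to the table, to keep the comparison of back-to-table counts simple):

```python
# Composite index on (name, age): each entry is (name, age, primary key).
idx = [("zhang", 18, 1), ("zhang", 20, 2), ("zhang", 20, 3), ("zhang", 30, 4)]
table = {1: {"isdelete": 1}, 2: {"isdelete": 1},
         3: {"isdelete": 0}, 4: {"isdelete": 1}}

# Before 5.6: every index match goes back to the table first.
back_to_table_old = [pk for name, age, pk in idx if name == "zhang"]

# With index condition pushdown: age=20 is checked inside the index,
# so only surviving entries pay for a back-to-table lookup.
back_to_table_new = [pk for name, age, pk in idx
                     if name == "zhang" and age == 20]

# Remaining condition is checked on the fetched rows.
result = [pk for pk in back_to_table_new if table[pk]["isdelete"] == 1]
print(len(back_to_table_old), len(back_to_table_new), result)  # 4 2 [2]
```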