MySQL index

1. Basic concepts of indexes

Index is a very important concept in database, then what is the index, popular, the index is the storage engine used to quickly find records of a data structure that is like a book catalog, when you’re looking for a particular row, can quickly locate the location of the information in the index, and can be directly obtained target record.

Since indexes appear to improve search efficiency, there must be different index structures (models), and different index models must have their own scenarios to adapt to. In the following articles, we will focus on the common index models and their characteristics.

2. Common index models

There are many data structures used to improve read and write efficiency. The most common ones are hash tables, arrays and search trees.

2.1 Hash index

Hash table is a data structure stored in key-value pairs. It searches for values based on keys. Only Memory engine supports hash index, which converts column values into physical locations based on hash functions and stores values in this array.

However, multiple keys may have the same position after calculation, which is called hash conflict. We can solve this problem by using chain address method. The idea is to pull out a linked list from the conflicting array index position

Hash indexes use hash algorithm, so the storage location is scattered, so the index is suitable for equivalent query scenarios, not for range query.

2.2 Ordered array indexes

The idea of an array index is very simple, it can quickly find the stored elements according to the location of the index, and because the array is ordered, so the index is good in range queries.

If it is only a query, using an index of an ordered array is certainly appropriate, but when inserting data into an array, it is inefficient to move the elements after the inserted position, so an ordered array index is only suitable for query operations.

2.3 the index tree

Tree is also a kind of data structure, which combines the characteristics of array and linked list, and ensures the speed of data insertion, deletion and modification as well as the speed of retrieval.

Let’s take a binary search tree, let’s look at the binary search tree structure

The binary search tree defines that in non-leaf nodes, the value of the left child is less than the value of the node, and the value of the right child is greater than or equal to the value of the node. Such a binary tree is called a binary sorting tree. Let’s say we want to find 2, which takes the path 7–>3–>1–>2. The time complexity of the query is O(log(N)), but the time complexity depends on the tree being a balanced binary tree.

Trees can have multiple forks, that is, each node in a multi-fork tree has multiple children, and the data size of the children must be increasing from left to right. Binary trees are the most efficient search, but they are not suitable for a database, because indexes are not only stored in memory, but also written to disk. In order to minimize the number of disk reads for a query, it is necessary to allow the query to access as few blocks of data as possible. Therefore, multi-tree is the most used.

3. Analysis of InnoDB index model

Through the above content, we know that multi-fork tree has been widely used in database engines due to the advantages of read and write performance and disk access mode. In fact, InnoDB uses the tree model is B+ tree. Database tables are stored in the form of indexes according to the order of primary keys. Each index corresponds to a B+ tree.

3.1 B index tree

B tree is a multi-fork self-balanced search tree, which differs from ordinary binary tree in that each node is allowed to have more children. Its design idea is to concentrate more relevant data together as far as possible, so as to read multiple data at a time, reduce the number of disk operations, it can maximize the optimization of large data read and write operations, accelerate the access speed.

B tree has the following characteristics:

  • Keywords are distributed throughout the entire tree
  • Any keyword appears on only one node in the tree
  • Search efficiency is equivalent to binary search

Its advantage is that it can store keys and values in internal nodes at the same time. Therefore, storing frequently accessed data close to the root node will improve the query efficiency of hotspot data. This feature makes B-tree very efficient in the scenario where specific data is repeatedly queried.

3.2 B + tree index

B+ tree is a further optimization of B tree, B+ tree diagram is shown below

B+ tree characteristics

  • The internal nodes of the B+ tree store only keys, not values, so you can get more keys in the memory page, which helps to narrow down the query more quickly
  • The leaf nodes of B+ tree are connected by a chain. Therefore, when all data traversal is needed, it only takes O(logN) time for B+ tree to find the smallest node, and then perform O(N) sequential traversal through the chain. B trees, on the other hand, need to traverse each level of the tree, which requires more memory replacement times and therefore more time

3.3 contrast

The index used in InnoDB is B+ tree, so why is B+ tree better than B tree for database index

  • B+ trees have lower disk read and write costs

No internal nodes in the B + tree to store data, so the internal node is small compared with the B tree, if you put all the keywords of the same internal nodes are kept in the same disk block, the disk blocks accommodate will, the more the number of keyword query data read into memory when the more keyword opportunities, to read and write IO will reduce the number of times

  • B+ tree query efficiency is more stable

The non-leaf nodes of the B+ tree are not nodes that point to the contents of the file, but are only key indexes in the leaf nodes. So any keyword lookup must take a path from root to leaf. The length of all keyword query paths is the same, resulting in the same query efficiency of each data.

  • B+ trees implement range queries

B tree improves IO performance and does not solve the problem of low efficiency of element traversal. It is in order to solve this problem that B+ tree application was born. B+ trees can traverse the entire tree simply by traversing the leaves. Moreover, range-based queries are very frequent in the database, and b-trees do not support such operations or are inefficient.

4. Classification of indexes

In a B+ tree, index types can be divided into primary and non-primary key indexes based on the content of leaf nodes

  • The primary key index

In InnoDB, the primary key index is also called a clustered index. Its index and data are stored in the same. Idb file, so its index structure is to store both the index and data in the same tree node

  • Non-primary key index (plain index)

Leaf node contents that are not primary key indexes are primary key values. In InnoDB, non-primary key indexes are also called secondary indexes.

Now that we know about primary and normal indexes, what is the difference between them?

  • Select * from T where ID=XXX; select * from T where ID=XXX; select * from T where ID=XXX;
  • If the statement is select * from T where k=XXX, that is, the common index query mode, search the K index tree to obtain the ID value, and then search the ID index tree. This process is called table back.

That is, a query based on a non-primary key index requires one more index tree to be scanned. Therefore, we should try to use primary key queries in our applications.

Is there a case where you can just use a plain index and not need to go back to the table to get the data you need? The answer is yes, overwriting an index will do the trick.

  • Cover index

An overwrite index is a select column that can be retrieved only from the index, not from the table. In other words, the query column is overwritten by the index used. That is, a normal index can hold data in addition to the ID to which it points.

Because overwriting indexes can reduce the number of tree searches and significantly improve query performance, all overwriting indexes are a common performance optimization tool.

  • Joint index

A combined index is also known as a composite index. For this type of index, MySQL uses the fields in the index from left to right.

For example, key index(a,b, C) supports the search for the combination of indexes (a), (a,b), and (a,b, C), but does not support the search for indexes B and C.

You can narrow the search with additional columns in the matching index, but using one index with two columns is different from using two separate indexes. The structure of a composite index is similar to that of a phone book, where people’s names are made up of first and last names, and the phone book sorts people with the same last name by first name. If you know a last name, the phone book is very useful; A phone book is more useful if you know first and last names, but useless if you only know first and last names.

  • The only index

A unique index is an element in an index column that is not repeatable and can only appear once, which is different from a normal index

Query operation

Normal index: After the first record that meets the condition is found, the next record is searched until the first record that does not meet the condition is found

Unique index: Since an index defines uniqueness, the search is stopped after the first record that satisfies the condition is found

Therefore, a normal index has more operations than a unique index. However, the difference in performance between them is very small

The update operation

Normal indexes: When a data page needs to be updated, InnoDB will update the data directly if the data page is in memory. If the data page is not already in memory, InnoDB will cache these updates in the Change Buffer without affecting the consistency of the data, so that the data page does not need to be read from disk. The next time a query needs to access the data page, the data page is read into memory and the change Buffer operations related to the page are performed. In this way, the correctness of the data logic can be guaranteed.

Unique index: All update operations must determine whether the operation violates the uniqueness constraint, which must be determined by reading the data page into memory. If it is already read into memory, it is faster to update the memory directly, and there is no need to use change buffer.

conclusion

In the query operation, since the two are not so different, you can choose either one. During update operations, unique indexes consume more I/O resources than common indexes, so common indexes are selected. For comprehensive analysis, choose normal indexes.

5. Index push down

Simply put, index push-down is an optimization to reduce the number of times a database retrieves data from a table.

case

There is a joint index (name,age), now the requirement is to retrieve the first word of the name in the table is lang, and the age is 23 years old boy, SQL can write like this

mysql> select * from tuser where name like Lang '%' and age=10 and ismale=1;
Copy the code

MySQL5.6 before,

No index push down

Under the premise of the left-most prefix, only the left-most index “Lang” can be used to find the first ID that meets the condition, and then go back to the table to find the data row on the primary key index, and then compare the field value.

There’s an index push down

With index push-down, you can judge the fields in the index and directly filter out the records that do not meet the conditions during index traversal, reducing the number of back to the table. InnoDB determines whether age equals 10 in the index (name,age) and skips records that do not. In our example, we only need to judge the two records of ID4 and ID5 back to the table to fetch data, and only need to back to the table 2 times.

Refer to the article

[1] Lin Xiaobin. MySQL Combat 45 lecture

[2]https://www.jianshu.com/p/0371c9569736

Concern public number: 10 minutes programming

The public responded to SUCCESS for learning resources