MySQL 【enjoy】 series: MySQL index data structures, index types, and clustered vs. non-clustered indexes

I’m Little Xia Lufei. A blog is not only a notebook, but also a way of thinking and sharing.

Preface

Indexes are a key concept in MySQL. They are essential for SQL performance, and every backend developer should be familiar with them.

A MySQL index is essentially a data structure: a way of organizing data so that queries run more efficiently.

1 Data structure of the MySQL index

MySQL has four kinds of index data structures: B+ tree index, hash index, inverted index, and R-tree index (this article discusses the InnoDB engine). The B+ tree index is the most commonly used structure and the focus of both development work and interviews. The other three only need a general understanding.

1.1 B+ tree index

The B+ tree index is the focus of this article. The B is read here as "balance": a B+ tree is a multi-way balanced search tree, not a binary tree!

The B+ tree is an optimization of the B-tree that makes it better suited to index structures kept on external storage. In a B+ tree, all data records are stored in the leaf nodes, which sit at the same level and are ordered by key value, while the non-leaf nodes store only key values. This greatly increases the number of keys each node can hold and reduces the height of the tree.
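To see why a larger fan-out keeps the tree shallow, here is a rough back-of-the-envelope sketch. The 16 KB page size is InnoDB's default; the 8-byte key, 6-byte child pointer, and 1 KB average row are assumptions chosen for illustration, and page headers and fill factor are ignored.

```python
PAGE_SIZE = 16 * 1024   # InnoDB default page size, in bytes
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a child-page pointer
ROW_BYTES = 1024        # assumption: average row size


def fanout():
    """Keys (and child pointers) that fit in one non-leaf page."""
    return PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)


def rows_per_leaf():
    """Full rows that fit in one leaf page."""
    return PAGE_SIZE // ROW_BYTES


def max_rows(height):
    """Rough row capacity of a B+ tree with `height` levels (height >= 1)."""
    return fanout() ** (height - 1) * rows_per_leaf()


for h in (1, 2, 3):
    print(h, max_rows(h))
```

Under these assumptions each non-leaf page holds 1170 keys, so a tree only three levels high can address roughly 21.9 million rows. Real tables will differ, but the order of magnitude explains why lookups need so few page reads.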

1.2 Hash index

The InnoDB storage engine's hash index is adaptive: the engine builds hash indexes automatically based on how a table is used (when it observes that a hash index would speed up lookups), and only for certain pages. Users cannot decide whether a hash index is built for a particular table.

1.3 Inverted index

InnoDB has supported full-text (inverted) indexes since MySQL 5.6, and the built-in ngram parser for Chinese, Japanese, and Korean text arrived in MySQL 5.7.6. The inverted index is the underlying data structure of search engines: fast lookup is achieved by splitting content into words (tokens). From MySQL's perspective there is no need to dig too deeply into this index, but you must understand the structure thoroughly if you specialize in search engines such as Lucene or Elasticsearch.
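The idea of an inverted index can be sketched in a few lines. This is a toy illustration, not how MySQL implements it: the whitespace tokenizer and the function names `build_inverted_index` and `search` are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive lowercase + whitespace tokenization.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of documents containing `word`."""
    return sorted(index.get(word.lower(), set()))


docs = {
    1: "MySQL index basics",
    2: "Full text search in MySQL",
    3: "Search engines use an inverted index",
}
index = build_inverted_index(docs)
print(search(index, "mysql"))
print(search(index, "index"))
```

A lookup goes from word to documents directly, instead of scanning every document for the word.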

1.4 R tree Index

Rarely used in MySQL. It supports only the geometry data types, and its main advantage is range queries. I have never used this type myself.

2 Types of MySQL indexes

2.1 Common Indexes

The most commonly used index type. It imposes no restrictions and exists only to speed up queries.

2.2 Unique Index

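Speeds up queries and requires the column values to be unique. NULL is allowed, and in MySQL a unique index permits multiple NULL values, because NULLs are not treated as equal to one another.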

2.3 Primary Key Index

Speeds up queries; column values must be unique and non-NULL. The primary key is usually an auto-increment number. A UUID also works but is strongly discouraged.

2.4 Joint Index

Also called a composite index: an index built on multiple columns, designed for queries that filter on several columns together. It is more efficient than an index merge. Note that a query can use a composite index only if its conditions match the leftmost prefix of the index columns.
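The leftmost-prefix rule can be modeled with a small sketch. It is a simplification that only considers equality conditions, and the function name `usable_prefix` is made up for this example.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns a query can use.

    Walks the index columns from left to right and stops at the first
    column the query has no equality condition on.
    """
    used = []
    for col in index_columns:
        if col not in equality_columns:
            break
        used.append(col)
    return used


idx = ["a", "b", "c"]  # hypothetical composite index (a, b, c)
print(usable_prefix(idx, {"a", "b"}))  # conditions on a and b
print(usable_prefix(idx, {"a", "c"}))  # gap at b: only a is usable
print(usable_prefix(idx, {"b", "c"}))  # a is missing: index not usable
```

A condition on the first column is what makes the index usable at all; columns after a gap cannot narrow the search range.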

2.5 Full-text Index

Built on the inverted index data structure: the text content is split into tokens, which are then searched.

3 Clustered index and non-clustered index

3.1 Clustered index

For the InnoDB storage engine, table data is stored in primary key order. The clustered index is a B+ tree built on each table's primary key, and its leaf nodes hold the full row records of the table. A table can have only one clustered index. MySQL's query optimizer tends to favor the clustered index, because row data can be found directly in its leaf nodes.

3.2 Non-clustered index

A non-clustered index is also called a secondary index. Its leaf nodes do not contain all of the row's data. Besides the indexed key values, each leaf entry carries a bookmark: the clustered index key (primary key) of the corresponding row, which tells InnoDB where to find the full row.

When a query goes through a non-clustered index, InnoDB traverses that index, takes the primary key from the leaf entry's bookmark, and then finds the complete row in the clustered index. This second step is called going back to the table. It makes the query slower than one served directly by the clustered index, because one more index search is needed.
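The two-step lookup can be sketched with plain dictionaries standing in for the two B+ trees. The table contents and the names `clustered`, `idx_name`, and `find_by_name` are hypothetical.

```python
# Clustered index: primary key -> full row (leaf nodes hold whole rows).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}

# Secondary index on name: key value -> primary key (the "bookmark").
idx_name = {"alice": 1, "bob": 2}


def find_by_name(name):
    """Look up a row by name via the secondary index."""
    pk = idx_name.get(name)   # step 1: search the secondary index
    if pk is None:
        return None
    return clustered[pk]      # step 2: go back to the clustered index


print(find_by_name("bob"))
```

A lookup by primary key would need only the second step, which is where the extra cost of the secondary index comes from.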

The relationship between clustered index and non-clustered index is shown in the figure below:

3.3 Differences between clustered index and non-clustered index queries

As analyzed above, the leaf nodes of the clustered index tree hold the entire row we want to query, while the leaf nodes of a non-clustered index hold only the primary key value. After finding the primary key value, we must query again by that value (this is the step of going back to the table). Because of this extra step, queries through a non-clustered index are often less efficient than queries through the clustered index.

But is a non-clustered index query necessarily slower than a clustered index?

In fact, MySQL also has a technique called the covering index. In an execution plan (EXPLAIN output), the Extra column often shows the value Using index, which means the query can be answered from the index alone, without reading the table rows. You can also call it index coverage. When a query meets the conditions for a covering index, MySQL can return the data the query needs from the index only. This avoids going back to the table after the index search, which reduces I/O and improves efficiency. For example, the table covering_index_SAMPLE has a common index IDx_KEY1_KEY2 (key1,key2). The statement SELECT key2 FROM covering_index_SAMPLE WHERE key1 = 'keytest'; can be answered from the covering index without going back to the table.
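Here is a minimal model of that behaviour, assuming a secondary index on (key1, key2) whose entries also carry the primary key. The sample data, the `payload` column, and the function `select_columns` are invented for illustration; the function counts how many times the query has to go back to the clustered index.

```python
import bisect

# Secondary index on (key1, key2): sorted (key1, key2, primary_key) entries.
index_entries = sorted([
    ("keytest", "v1", 1),
    ("keytest", "v2", 3),
    ("other", "v9", 2),
])

# Clustered index: primary key -> full row (hypothetical data).
clustered = {
    1: {"id": 1, "key1": "keytest", "key2": "v1", "payload": "p1"},
    2: {"id": 2, "key1": "other", "key2": "v9", "payload": "p2"},
    3: {"id": 3, "key1": "keytest", "key2": "v2", "payload": "p3"},
}


def select_columns(key1, columns):
    """Return (rows, table_lookups) for SELECT columns WHERE key1 = ?."""
    rows, table_lookups = [], 0
    start = bisect.bisect_left(index_entries, (key1,))
    for k1, k2, pk in index_entries[start:]:
        if k1 != key1:
            break
        entry = {"key1": k1, "key2": k2, "id": pk}
        if all(col in entry for col in columns):
            source = entry            # covered: the index entry is enough
        else:
            source = clustered[pk]    # not covered: go back to the table
            table_lookups += 1
        rows.append(tuple(source[col] for col in columns))
    return rows, table_lookups


print(select_columns("keytest", ["key2"]))
print(select_columns("keytest", ["key2", "payload"]))
```

Selecting only key2 needs no table lookups, while adding a column that is not in the index forces one lookup per matching entry.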

Link: juejin.im/post/684490… The copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please cite the source.

4 Use of B+ tree indexes

4.1 When is it appropriate to build an index

First, it is important to be clear that not every field in every scenario needs an index. In my experience, a B+ tree index makes sense when a query accesses only a small number of rows in the table.

As a rule of thumb, when the rows to be scanned exceed roughly 20% of the table, the optimizer tends to skip the index and fall back to a full table scan. This is a choice made by the MySQL optimizer, not a problem with how the SQL is written.

For example, suppose the user table has a gender field with basically only 'M' and 'F' as values, in a ratio of almost 1:1 (leaving aside the social problem of gender imbalance). This field has very low selectivity, so what is the point of an index that ends up scanning half the table every time? For the user name, by contrast, the probability of duplicates is small. It is highly selective and well suited to indexing.
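Selectivity can be estimated as the number of distinct values divided by the number of rows. A small sketch with made-up sample data (the function name `selectivity` is an assumption):

```python
def selectivity(values):
    """Distinct values divided by total values (between 0 and 1)."""
    return len(set(values)) / len(values)


genders = ["M", "F"] * 500                       # 1000 rows, 2 distinct values
usernames = [f"user{i}" for i in range(1000)]    # 1000 rows, all distinct

print(selectivity(genders))
print(selectivity(usernames))
```

The closer the result is to 1, the fewer rows each value matches and the more an index helps; a value close to 0, as with the gender column, suggests an index is of little use.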

4.2 Sequential read, Random read, and Prefetch

As mentioned above, an index pays off when the field is highly selective and each query retrieves only a small portion of the table. But what is the root cause? It starts with the concepts of sequential reads and random reads.

A sequential read reads blocks on disk in order. In a database, it means reading the required rows in order by following the leaf data of the index (a characteristic of the B+ tree is that its leaf nodes are linked in order as a linked list). These reads are sequential in the logical sense only; on the physical disk they may still be random.

A random read accesses blocks that are not contiguous, which forces the disk head to keep moving. In practice it usually arises when a non-clustered index cannot supply the complete result, so the actual rows must be fetched from the clustered index using the primary keys stored in the non-clustered index leaf nodes.

Random reads perform much worse than sequential reads. Therefore, when a SQL statement would go through a non-clustered index and the estimated number of rows reaches a certain percentage of the table, the MySQL optimizer chooses a full table scan instead of the non-clustered index scan.

To improve read performance, the InnoDB engine also introduced read-ahead (prefetch). Read-ahead loads multiple pages into the buffer pool with a single I/O request, in anticipation that those pages will be accessed soon.
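The trade-off between sequential and random reads can be illustrated with a toy cost comparison. This is not MySQL's actual cost model: the cost constants, the one-random-read-per-matching-row assumption, and the function `choose_plan` are all hypothetical.

```python
import math


def choose_plan(total_rows, rows_per_page, matching_rows,
                seq_page_cost=1.0, random_page_cost=4.0):
    """Pick the cheaper plan under a deliberately simple cost model.

    Full scan: read every page sequentially.
    Index scan: assume one random page read per matching row
    (the trip back to the clustered index).
    """
    full_scan_cost = math.ceil(total_rows / rows_per_page) * seq_page_cost
    index_cost = matching_rows * random_page_cost
    return "index" if index_cost < full_scan_cost else "full scan"


print(choose_plan(100_000, 100, 50))       # few matching rows
print(choose_plan(100_000, 100, 20_000))   # many matching rows
```

Where the crossover falls depends entirely on the assumed costs; the point is only that the index plan grows with the number of matching rows while the full scan cost stays fixed.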

4.3 Index push down

Index Condition Pushdown (ICP) is an index optimization introduced in MySQL 5.6. The example and explanation in the official documentation run as follows. The people table has an index on (zipcode, lastname, firstname), and we run:

SELECT * FROM people WHERE zipcode='95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';

Without index push-down, MySQL uses the index only for zipcode='95054': the storage engine reads every full row with that zipcode and returns it to the MySQL server, which then checks lastname LIKE '%etrunia%' and address LIKE '%Main Street%'.

With index push-down, the storage engine first finds the index entries matching zipcode='95054' and checks lastname LIKE '%etrunia%' against the index entry itself, because lastname is part of the index. Only if the entry passes is the full row read, after which the remaining address condition is checked; otherwise the entry is discarded without touching the table.

With index push-down, the number of trips back to the table can be reduced for LIKE conditions on indexed columns.
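The difference can be made concrete by counting full-row reads with and without push-down. This is a simplified model with invented sample data; the function `run_query` and the substring checks standing in for LIKE are assumptions for illustration.

```python
# Index on (zipcode, lastname, firstname); each entry also carries the primary key.
index_entries = sorted([
    ("95054", "Petrunia", "Sergey", 1),
    ("95054", "Smith", "John", 2),
    ("95054", "Petrunia", "Anna", 3),
    ("10001", "Petrunia", "Ivan", 4),
])

# Clustered index: primary key -> full row (hypothetical data).
rows = {
    1: {"lastname": "Petrunia", "address": "1 Main Street"},
    2: {"lastname": "Smith", "address": "2 Main Street"},
    3: {"lastname": "Petrunia", "address": "5 Oak Avenue"},
    4: {"lastname": "Petrunia", "address": "9 Main Street"},
}


def run_query(use_icp):
    """Return (matching primary keys, number of full-row reads)."""
    matches, row_reads = [], 0
    for zipcode, lastname, _firstname, pk in index_entries:
        if zipcode != "95054":
            continue
        # With ICP, the lastname condition is checked on the index entry.
        if use_icp and "etrunia" not in lastname:
            continue
        row = rows[pk]      # read the full row from the clustered index
        row_reads += 1
        if "etrunia" in row["lastname"] and "Main Street" in row["address"]:
            matches.append(pk)
    return matches, row_reads


print(run_query(use_icp=False))
print(run_query(use_icp=True))
```

Both runs return the same result, but with push-down the row for Smith is never read, since its index entry already fails the lastname condition. The address condition still requires the full row in both cases.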

Summary

MySQL indexes are a topic you cannot avoid in development or in interviews. Proficiency is a must.