Author: Zhi-chao Ma, Tencent Cloud database development engineer.

Classifying MySQL indexes is always a headache. Almost every reference gives you a long list: primary key index, single-valued index, covering index, adaptive hash index, full-text index, clustered index, non-clustered index, and so on. It gives the impression that MySQL implements indexes in many different ways, yet none of these references offers a clear taxonomy.

Indexes can be classified along several dimensions and go by many names, but one thing should be clear from the start: an index is essentially a data structure, and the other classifications exist for practical purposes.

I. Classification by underlying data structure

An index is a data structure that improves query efficiency. Many data structures can do this, such as binary search trees, red-black trees, skip lists, and hash tables. MySQL uses B+ trees and hash tables as the underlying structures of its indexes (full-text indexes are built differently, on an inverted index design in InnoDB, but that is not important for our purposes, so it can be ignored).

1. Hash index

InnoDB does not let you create hash indexes explicitly; it uses them only as an internal optimization. Specifically, the InnoDB storage engine monitors index lookups on each table. If an index is accessed frequently and its data becomes hot, InnoDB builds a hash index for it. In other words, hash indexes in InnoDB are generated automatically for hot data. Because it adapts to the observed access pattern, this kind of hash index is called the adaptive hash index.

2. B+ tree index

This is the basic implementation of MySQL indexes. Apart from full-text indexes and hash indexes, both InnoDB and MyISAM implement their indexes with B+ trees.
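One rough way to see why the two structures play different roles: a hash table answers equality lookups directly but keeps no key order, while an ordered structure such as a B+ tree also supports range scans. The sketch below is illustrative Python, not MySQL internals. A dict stands in for the hash index, a sorted list stands in for the leaf level of a B+ tree, and the names `rows`, `lookup_equal`, and `lookup_range` are made up for this example.

```python
import bisect

# Hypothetical indexed column values mapped to row labels.
rows = {17: "row-a", 42: "row-b", 8: "row-c", 23: "row-d"}

# Hash-style index: direct equality lookups, but no ordering of keys.
hash_index = dict(rows)

# Ordered index: sorted keys stand in for the linked leaf level of a B+ tree.
sorted_keys = sorted(rows)


def lookup_equal(key):
    """Equality lookup through the hash-style index."""
    return hash_index.get(key)


def lookup_range(low, high):
    """Range lookup (low <= key <= high) through the ordered index."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return [rows[k] for k in sorted_keys[start:end]]


print(lookup_equal(42))
print(lookup_range(10, 30))
```

The range lookup only works because the keys are kept in order, which is the property a plain hash table lacks.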

II. Classification by number of indexed fields

To meet different retrieval requirements, an index can contain a single field or several fields. An index on a single field is called a single-valued (single-column) index; otherwise it is called a composite index, also known as a combined index or multi-valued (multi-column) index.

This is easy to understand. Suppose we have a table with three columns: id, age, and name. An index created on id is a single-valued index; an index created on name and age is a composite index.

The order of entries in a composite index depends on the order of its fields. Entries are sorted by the first field, and when the values of the earlier fields are equal, they are sorted by the values of the following fields.
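The ordering rule can be sketched with sorted tuples, which Python compares field by field in the same way. This is a minimal illustration, assuming a composite index on (age, name); the sample rows and the helper `ids_with_age` are hypothetical.

```python
import bisect

# Hypothetical rows of a table with columns (id, age, name).
table = [
    (1, 30, "carol"),
    (2, 25, "bob"),
    (3, 30, "alice"),
    (4, 25, "dave"),
]

# Composite index on (age, name): each entry is (age, name, id),
# sorted by age first and by name only when the ages are equal.
composite_index = sorted((age, name, row_id) for row_id, age, name in table)


def ids_with_age(age):
    """Search on the leading field alone: entries with one age are contiguous."""
    start = bisect.bisect_left(composite_index, (age,))
    end = bisect.bisect_left(composite_index, (age + 1,))
    return [row_id for _, _, row_id in composite_index[start:end]]


for entry in composite_index:
    print(entry)
print(ids_with_age(30))
```

Because entries with the same age sit next to each other, a search on the leading field can use the index, whereas entries with the same name are scattered across it.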

A covering index is appropriate only when the indexed fields are short; it is not suitable for fields with long values. One reason is that indexes are generally kept in memory.

III. Classification by whether the index is built on the primary key

1. Primary key index

InnoDB organizes table data by primary key, so every table has a primary key index. Primary key values cannot be NULL and must be unique. If no primary key is specified when the table is created, InnoDB uses the first unique index whose columns are all NOT NULL; if there is none, it automatically generates a hidden row ID field to serve as the primary key index.


2. Secondary indexes

Any index that is not the primary key index is called a non-primary-key index, also known as a secondary index or auxiliary index. The leaf nodes of the primary key index store the complete data rows, while the leaf nodes of a non-primary-key index store primary key values. A query that goes through a non-primary-key index first finds the primary key value in that index, and then looks up the corresponding row through the primary key index.

Suppose we have a table user with four columns: id, age, name, and create_time, where id is the primary key and a secondary index is created on (age, create_time, name). Execute the following SQL statement:

SELECT name FROM user WHERE age > 2 ORDER BY create_time DESC;

Normally, the query is divided into two steps:

1. Find the primary key of the record according to the secondary index.

2. Search for records in the primary key index and return name.

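However, the secondary index (age, create_time, name) already contains age, create_time, and name, so every field the query needs can be read from the secondary index itself, and the second step, the lookup in the primary key index, is unnecessary.

With this in mind, InnoDB optimizes queries that can be answered from a secondary index alone. An index used this way is called a covering index.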
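The difference between the two-step lookup and the covering case can be modeled in a few lines. This is a simplified sketch, not InnoDB's actual implementation: a dict stands in for the primary key index, a sorted list of tuples for the secondary index on (age, create_time, name), and a linear scan replaces the tree search a real engine would perform. The sample rows and function names are hypothetical.

```python
# Hypothetical rows of the user table, keyed by primary key id.
# The dict stands in for the primary key index, whose leaves hold full rows.
primary_index = {
    1: {"id": 1, "age": 5, "name": "ann", "create_time": "2021-03-01"},
    2: {"id": 2, "age": 1, "name": "ben", "create_time": "2021-01-15"},
    3: {"id": 3, "age": 9, "name": "cat", "create_time": "2021-02-10"},
}

# Secondary index on (age, create_time, name); each entry ends with the primary key.
secondary_index = sorted(
    (r["age"], r["create_time"], r["name"], r["id"]) for r in primary_index.values()
)


def names_with_lookup(min_age):
    """Two steps: collect primary keys from the secondary index, then fetch rows."""
    hits = [(ct, pk) for age, ct, _, pk in secondary_index if age > min_age]
    hits.sort(reverse=True)  # ORDER BY create_time DESC
    return [primary_index[pk]["name"] for _, pk in hits]


def names_covered(min_age):
    """Covering case: name is already in the index entry, so no row fetch is needed."""
    hits = [(ct, name) for age, ct, name, _ in secondary_index if age > min_age]
    hits.sort(reverse=True)  # ORDER BY create_time DESC
    return [name for _, name in hits]


print(names_with_lookup(2))
print(names_covered(2))
```

Both functions return the same names, but `names_covered` never touches `primary_index`, which is the saving a covering index provides.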

IV. Classification by the storage relationship between data and index

According to the storage relationship between data and index, indexes can be divided into clustered indexes and non-clustered indexes. A clustered index reorganizes the actual data on disk so that it is sorted by the values of one or more specified columns. In short, the difference between the two is whether the order in which the index is stored is related to the order in which the data is stored. If it is related, the index is clustered; if not, it is non-clustered. The implementation varies with the data structure of the index. The following uses indexes implemented with B+ trees to illustrate clustered and non-clustered indexes.

1. Clustered index

In InnoDB's primary key index, non-leaf nodes store index keys and pointers, and leaf nodes store both the index keys and the row data. The storage order of the index and the storage order of the data are strongly correlated, so this is a typical clustered index, as shown in the figure:




2. Non-clustered index

In MyISAM, index files and data files are stored separately. The leaf nodes of the B+ tree store the address of the data rather than the data itself. In other words, the data can be stored anywhere on disk, and so can the index, as long as each leaf node records the correct location of its row. The storage order of the index therefore has nothing to do with the storage order of the data, which makes this a typical non-clustered index. InnoDB's secondary indexes are also non-clustered indexes.
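The two layouts can be contrasted with a small model. This is only a sketch under simplifying assumptions: Python lists stand in for index leaves and for the data file, and a list position stands in for a disk address. All names and sample values are hypothetical.

```python
# Clustered layout (InnoDB primary key style): the rows themselves are kept
# in key order, so the index leaves are the data.
clustered_leaves = [
    (1, "ann"),
    (2, "ben"),
    (3, "cat"),
]

# Non-clustered layout (MyISAM style): rows sit in a separate data file in
# arbitrary order, and index leaves store only each row's position.
data_file = ["cat", "ann", "ben"]               # hypothetical physical order
nonclustered_leaves = [(1, 1), (2, 2), (3, 0)]  # (key, position in data_file)


def read_clustered(key):
    """The leaf entry already contains the row."""
    for k, row in clustered_leaves:
        if k == key:
            return row
    return None


def read_nonclustered(key):
    """The leaf entry gives a position, which is then read from the data file."""
    for k, position in nonclustered_leaves:
        if k == key:
            return data_file[position]
    return None


print(read_clustered(3), read_nonclustered(3))
```

In the second layout the positions (1, 2, 0) are not in key order, which is exactly the sense in which the index order is unrelated to the data order.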

V. Other categories

1. Unique index

As the name implies, a unique index does not allow two rows to have the same index value, so duplicate key values are forbidden. The system checks for duplicates when the index is created and every time data is added or changed with an INSERT or UPDATE statement. If a duplicate value exists, the operation fails with an error.

Note that a primary key index is always a unique index, but a unique index is not necessarily a primary key index. A unique index can be understood as an ordinary index with a uniqueness constraint added.
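The duplicate check can be illustrated with a toy class. This is a hypothetical sketch of the rule only, not how MySQL implements it: the class `UniqueIndex` and its methods are invented for the example, and a dict stands in for the index.

```python
class UniqueIndex:
    """Toy unique index: rejects a key that is already present."""

    def __init__(self):
        self._entries = {}  # key -> row id

    def insert(self, key, row_id):
        # Checked on every insert, like an INSERT against a unique index.
        if key in self._entries:
            raise ValueError(f"duplicate key: {key!r}")
        self._entries[key] = row_id

    def update(self, old_key, new_key):
        # Checked on every key change, like an UPDATE of the indexed column.
        if new_key != old_key and new_key in self._entries:
            raise ValueError(f"duplicate key: {new_key!r}")
        self._entries[new_key] = self._entries.pop(old_key)


index = UniqueIndex()
index.insert("a@example.com", 1)
index.insert("b@example.com", 2)
try:
    index.insert("a@example.com", 3)
except ValueError as err:
    print(err)
```

A failed operation leaves the index unchanged, mirroring the statement that the operation fails when a duplicate exists.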

2. Full-text index

Prior to MySQL 5.6, only the MyISAM storage engine supported full-text indexes. In version 5.6, InnoDB added support for full-text indexing, but not for Chinese. In version 5.7.6, MySQL added a built-in ngram full-text parser to support word segmentation for Asian languages such as Chinese, Japanese, and Korean. Full-text indexes are mainly used to search text by keyword, but this is not one of the scenarios MySQL is mainly designed for, so they are rarely used and will not be discussed here.

VI. Summary

Finally, here is a mind map that summarizes the classification and makes it easier to remember:
