MySQL index related content has always been a headache, especially for beginners. For a long time, I was stuck in the middle of “overwrite, secondary, unique, Hash, B-tree…” What are some of the things that lead to an awkward situation during an interview?

Many people may complain that “interview makes rocket, work screws, a lot of knowledge is for the interview to learn, do not use in the job! “. Fortunately, indexing in MySQL is not only required for interviews, but also the most frequently used skill in the workplace. In my opinion, indexing is the most cost-effective part of MySQL.

Since MySQL supports a variety of storage engines, there is a slight variation in the implementation of different storage engines. Indexes in the following sections refer to InnoDB storage engines by default unless specified otherwise.

First, the underlying data structure

First, an index is a data structure that efficiently retrieves data. Just like a table of contents in a book, we can use it to quickly locate the location of data, thus improving the efficiency of data query.

There are many nouns and concepts about indexes in MySQL that can be confusing for beginners. To make it easier to understand, I’ve created a table to try to clarify what these concepts are from specific cases.

A Hash index

As mentioned above, index is a data structure that improves query efficiency, and there are many data structures that can improve query efficiency, such as binary search Tree, red-black Tree, jump table, Hash table, etc. MySQL uses B+Tree and Hash table as the underlying data structure of the index.

Note that MySQL does not explicitly support Hash indexes, but rather as an internal optimization, automatically generates Hash indexes for hot data, also known as adaptive Hash indexes.

Hash index In equivalent query, it can locate data in O (1) time complexity, which is highly efficient. However, it does not support range query. This data structure is used in many programming languages and databases, such as the Hash data structure supported by Redis. The specific structure is as follows:

B + Tree index

B-tree (multi-path search Tree, not binary) is a common data structure. The use of B-tree structure can significantly reduce the intermediate process of locating records, thus speeding up access speed.

The B+ Tree is a Tree data structure based on the UPGRADED B-tree. It is commonly used in databases and file systems of operating systems. The characteristic of B+ tree is that it can keep the data stable and orderly, and its insertion and modification have relatively stable logarithmic time complexity. B+ tree elements are inserted from the bottom up, as opposed to binary trees.

MySQL indexes are also implemented based on this efficient data structure. The specific data structure is as follows:

We should not confuse b-tree, B-tree and B+Tree. The “-” in the middle is a hyphen, not a minus sign. There is no “B minus Tree “data structure. There are two differences between B+Tree and B-tree

1 B+Tree stores data only on leaf nodes, while B-Tree stores data on each node

② All data can be obtained by traversing leaf nodes of B+Tree through pointer links.

B+Tree is a magical data structure, which may be a little difficult to use in language. If you are interested, you can click on the data structure visualization tool at the end of the article, and you will gain something after operation. The following figure is the author’s demonstration of B+Tree data insertion method (bottom-up).

Second, data organization

According to the organization of data, it can be divided into clustered index and non-clustered index (also called clustered index and non-clustered index). Clustered index is to construct a B+Tree according to the primary key of each table, and the leaf node stores the row record data of the whole table.

In InnoDB, a clustered index is equivalent to a primary key index. In MySQL, each table must have a primary key index. The primary key index can only have one, cannot be null and must be unique. If no primary key index is specified during table creation, a hidden field is automatically generated as the primary key index.

The corresponding index is non-clustered index, which can also be called non-primary key index, secondary index, secondary index. Primary key index of the leaf node to store a complete data lines, rather than the primary key index of leaf node value, is the primary key index of the storage by the primary key index data query, the first lookup to the primary key index, and then to find the corresponding data of the primary key index went up, and this process is called back to the table again mentioned in (below).

It should be added that indexes and data files are stored separately in MyISAM, and all indexes are non-clustered indexes. The leaf nodes of a B+Tree store the location of the data, not the specific data.

Three, contains the number of fields

To meet different data retrieval requirements, indexes can contain only one field or multiple fields at the same time. An index consisting of a single field may be called a single-valued index, otherwise it is called a composite index (or a composite index or a multi-valued index). All of the previous examples were single-valued indexes, so I’ll show you a composite index for comparison.

The data order of a composite index is related to the order of the fields. An index with multiple values will be sorted by the values of the following fields if the values of the previous fields are the same.

Fourth, other categories

The only index

Unique index, which does not allow rows with the same index value, thereby disallowing duplicate indexes or key values. The system checks for duplicate key values when creating the index and every time data is added using INSERT or UPDATE statements. If duplicate values exist, the operation fails and an exception is thrown.

Note that a primary key index must be a unique index, and a unique index is not necessarily a primary key index. A unique index can be understood as simply setting a unique attribute to the index.

Cover index

As mentioned above, when data is queried through a non-primary key index, the value of the primary key index will be queried first, and then the specific data in the primary key index will be queried. The whole query process needs to scan the index twice. Obviously, table back is a time-consuming operation.

In order to reduce the number of times back to the table, we can design the index to contain the results of the query, and directly return the data retrieved from the secondary index, without the need for back to the table operation.

However, it is important to note that an overwrite index can be used only when the field length is short. An overwrite index is not suitable for a field with a long value length. For example, an index is usually stored in memory. There are other reasons, too, which will be covered in the next article.

Six,

This paper introduces indexes in MySQL from different dimensions. Indexes can be divided into many names from different dimensions. However, it needs to be clear that the essence of indexes is a data structure, while other indexes are divided for practical applications. The specific classification is shown in the figure below:

The purpose is to give you a preliminary and clear understanding of the index and solve the problem of What. Why and How will be discussed in depth in the follow-up. Of course, first of all, we should be able to distinguish the conceptual issues described in this chapter and article.

Data structure visualization tool: https://www.cs.usfca.edu/~galles/visualization/Algorithms.html

Seven, Q&A

1. Why MySQL index is implemented using B+Tree instead of searching binary Tree, red black Tree or jump table?

This is a comprehensive question, and it’s not as simple as it looks. You can leave the answer in the comments section and we will discuss why and how to use indexes correctly in the next article.