Database index chatter

preface

In our daily work and study, there are many scenarios using the index, such as slow query and so on, but what is the index? How do I use indexes? How do I hit an index? This article will take you through a series of questions about indexing and hit ratio.

What is an index

Concept: An index is an ordered data structure that helps a database efficiently retrieve data.

Conceptually, the essence of database indexes is data structures. (important)

Indexes are table-level, not library-level, and the simplest example would be to specify a storage engine when creating a table.

The index classification

Common MySQL index is generally divided into Hash index and B+ Tree index. InnoDB engine, the default is B+ Tree.

We can look at hash indexes and B+ numbers:

B + tree:

A B+ tree is a balanced multi-fork tree with a height difference of no more than 1 from the root node to each leaf node, and Pointers linking nodes of the same level.

In the conventional search of B+ tree, the search efficiency from root node to leaf node is basically the same, without large fluctuations. In the sequence scan based on index, the bidirectional pointer can also be used to move left and right quickly, which is very high efficiency. Therefore, B+ tree index is widely used in database, file system and other scenarios.

Hash:

Hash index is to use a certain hash algorithm, the key value into a new hash value, retrieval does not need to be similar to B+ tree from the root node to the leaf node step by step search, just a hash algorithm can immediately locate the corresponding position, very fast.

From the above diagram, we can find that:

Scenarios where hash indexes are not applicable:

(1) Range query is not supported
(2) Index complete sort is not supported
(3) The left-most prefix matching rule of the joint index is not supported

In the equivalent query, hash index has a great advantage, but when there are a large number of duplicate key values, hash collision will occur (after the hash calculation of key values, there will be a large number of duplicate key values, which will pull out the linked list), reducing efficiency.

Therefore, a typical index will use B+ tree data structure.

Interview question: Why do indexes use B+ trees instead of B trees? (spot)

Let’s take a look at two diagrams, namely B tree and B+ tree.

B tree:

B + tree:

Let’s analyze these two trees:

1. Leaf nodes of a B+ tree can access each other, but not B tree.

2. The leaf node of B+ tree saves the upper redundant nodes, while the leaf node of B tree does not.

3. Only leaf nodes of B+ store data, while every node of B tree stores data;

To sum up, we can come to the conclusion that:

1. B+ tree nodes do not store data, so disk pages can store more node elements, which is “fatter” and reduces IO operations;

2, B + tree search must find the leaf node, B tree as long as the match can not care about the element position, so B + tree search is more stable;

3. In range search, B+ numbers can traverse leaf nodes, while B trees need repeated traversal

Note: We can generally think of the height of the tree as the number of IO operations.

Clustered index and non-clustered index

We generally say clustered index and non-clustered index are for InnoDB engine and MyISAM engine;

The biggest difference between clustered index and non-clustered index is whether the leaf node stores the whole row of records. (key)

InnoDB primary key index uses clustered index, MyISAM uses non-clustered index;

MyISAM index

MyISAM index files and data files are separated, the index file only holds the address of the data record (non-clustered index);

Non-clustered indexes store data in index-separated structures, and the leaf nodes of the index structure point to the corresponding rows of data. Myisam uses key_buffer to cache the index in memory first. When data needs to be accessed (through the index to access the data), myISAM directly searches the index in memory, and then finds the corresponding data on disk through the index.

Photo source: Internet

MyISAM index file and data file are separated, the index file only holds the address of the data record;

InnoDB index

In InnoDB, the table data file itself is an index structure organized by B+Tree. The data field of the leaf node of this Tree holds complete data records. The key of this index is the primary key of the table, so the InnoDB table data file itself is the primary index. (Cluster index)

By default, a clustered index is a primary key. If no primary key is defined in the table, InnoDB selects a unique non-empty index instead. If there is no such index, InnoDB implicitly defines a primary key as the cluster index.

Photo source: Internet

Differences between clustered indexes and non-clustered indexes

Here’s a picture :(source: Internet)

1. Non-clustered index table (right figure). Table data and index are stored in two parts, and there is no difference between primary key index and secondary index. B+ tree is used as the storage structure of the index, all nodes are indexes, and leaf nodes store the data corresponding to the index + index.

2. Clustered index table (left figure). Table data is stored together with primary keys. Leaves of primary key indexes store row data (including primary key values), and leaves of secondary indexes store row primary key values. B+ tree is used as the storage structure of the index. Non-leaf nodes are all index keywords, but the keywords in non-leaf nodes do not store the specific content or content address of the corresponding record. The data on the leaf node is the primary key and the specific record (data content). Use a non-primary key index to store the value of the primary key, and then obtain the primary key index through the primary key, clustered index to store row data, and obtain data through the primary key line. The process of going back to the primary key index tree is called back to the table.

instructions

This article refers to many of your works, and finally forms this article, thanks to share the knowledge of the vast number of engineers. If there are mistakes or inadequacies in the article, please give us more advice and hope that we can gain something

Wechat official account: background server development

Database index chatter

Database index chatter

Related Posts

RocketMQ Quick Start

Submodule operations in Git

Question 16 – The sum of the nearest three numbers