There’s something you need to know about Mysql indexes

mapping

The index model

Hash

Introduction to the

Hash indexes are the default index structure used by MEMORY engines, and only MEMORY engines can explicitly use Hash indexes, which are actually HashTable,key-value structures. It also uses the zipper method to resolve hash conflicts, which is not discussed here.

structure

Note that the Value of a HashTable does not store a single row of data but rather a pointer.

The characteristics of

  1. There is no index field information in the hash index, so there is no index value to reduce the number of rows read
  2. Hash indexes are not ordered, so they cannot be sorted
  3. Hashes do not support index matching look-ups, for example, b+ tree indexes can use the leftmost matching principle to use union indexes, but hash indexes cannot. That is, create A hash index on the data column (A, B). If the query has only data column A, the index cannot be used.
  4. Hash index retrieval efficiency is very high, basically a search can find the data, from the search times than B+ tree index is much less
  5. Only equivalent query is supported. Range query is not supported
  6. Indexes can be expensive to maintain when there are too many hash collisions

B tree

Introduction to the

If we want to understand B+ tree, we must first understand B tree, B tree is a balanced search tree designed for secondary storage devices such as disks, its nodes can store multiple keywords, and it is a multi-fork tree, a B tree meets the following conditions:

To explain a few words:

  • Keyword: Data stored on each node
  • X: number of keywords stored on each node
  • M: The height of the tree

Definition:

  1. It’s a tree with a root
  2. Each node contains x+1 Pointers to the child node
  3. The keywords stored on each node are in non-descending order, and all data in the left subtree of each data is smaller than it, while all data in the right subtree is larger than it.
  4. Each leaf node has the same depth, that is, the height of the tree, or it can be said that the leaves are all at the same layer
  5. Each node has at most M-1 data
  6. The root node can have at least 1 data

The data structure

This is a third order B tree

Insert the delete

B tree insertion mainly involves three steps: query, insert and split.

  • Query: This refers to finding the insertion location. The search here is similar to the search in a binary search tree
  • Insert: Keyword insert process
  • Split: In order to maintain the structure of B-tree, when the number of node keywords reaches the threshold, the split action is generated. The intermediate keywords are promoted to the parent node, and other nodes generate two new subtrees

Deletion of a B-tree is very similar to insertion, only a little more complicated, because deletion can be removed from any node, not just the leaf node. So it involves preserving the structure definition after deletion. The specific process of deleting will not be described here, which is more complicated. I encourage you to look it up in Introduction to Algorithms on page 286.

animation

Max-degree = 3; max-degree = 3; max-degree = 3; max-degree = 3

  1. When we insert 1, the tree is empty, so we just insert.
  2. Insert 3 after the same node 1 since 3 is greater than 1 and the current node keyword is less than 3.
  3. Insert 5,5 is greater than 3, so after 3, the keyword of the current node =3 triggers splitting, the middle keyword 3 goes up to the root node, and 1,5 generates two new subtrees
  4. Insert 4,4 is greater than the root node 3, so go to the right, less than 5, and the key word number of the node is less than 3
  5. Delete 1, find the position of 1, delete, find the left subtree is empty, does not meet the definition, 004 keyword rises to become the root node, 003,005 are left and right nodes respectively.

If we insert a 1, we see that the structure doesn’t go back to what it was before.

The query

Searching a B tree is similar to searching a binary search tree. The difference is that the branch selection of each node is multi-fork, not necessarily binary.

Why is b-tree (n-fork) good for indexing

In order to explore this question we should first know a few things

  • Index storage

An index is a data structure that needs to be stored, not just in memory, but on disk.

  • Factors affecting database query speed

As I mentioned in my previous blog post, database queries are actually made to main storage, and if they don’t exist, they are then made to disk and brought to main storage. Disk I/O will be involved in the process of disk query. In fact, if you do not consider the query from main memory, the most affecting database query speed is disk I/O. Therefore, minimizing disk I/O can significantly improve the data query speed.

  • Proofread the mechanism

Traditional disk reading relies on mechanical movement, and IO time is divided into three parts: seek time, rotation delay and transmission time. ** SSDS are different. ** In order to reduce IO consumption, use the prefetch mechanism. When an address is accessed, adjacent data is quickly accessed as well. Each disk I/O reads a page of data. The size of a page is typically 4K or 8K, depending on the operating system. This means that when reading data on a page, disk I/O is actually happening.

Through the above points, we can know that the index will write disk, the process of querying the index is the process of querying the node. If it is not in memory, it will go to the disk and generate disk IO. Then the speed of querying the index mostly depends on the data page that needs to be accessed each time. We can assume a 1 million-node balanced binary tree with a height of 20. A query may require access to 20 data blocks. And if we use b trees it depends on the size of N in our N sub tree. And, of course, the same thing is true for B+ trees.

B + tree

Introduction to the

In fact, B+ tree is the evolution of B tree and sequential index access method, here we will not repeat the difference between B tree and B+ tree, as long as we know that each node of B tree stores key and data, data of B+ tree is stored in the leaf node.

structure

A B+ tree is a variant of a B tree, very similar to a B tree. This is a third order B+ tree.

Insert and delete

From the picture we can see and know, in fact the process of insertion and deletion of B + tree and binary tree, the only difference is that B + tree leaf nodes through a two-way chain table link, at the time of maintenance tree also need to maintain the two-way linked list, here is not to say carefully inserted delete algorithm in detail. For details, see Mysql Technology Insider InnoDB Storage Engine P202

The query

The query process is similar, except that the nodes in the B+ tree do not store data, so they are actually looking up in the linked list.

The difference between

  1. The B+ tree improves the B tree by making non-leaf nodes only used as indexes and removing the pointer to data record, so that more keys can be stored in each node and therefore a larger number of fans can be generated.
  2. Leaf nodes are connected in a linked list form.

Non-leaf nodes don’t store data, which means they hold the same number of keys, and the height of the tree can be further compressed, making retrieval times shorter. In addition, the leaf node is in the form of linked list, which can carry out sequential traversal faster.

Clustered Index

A leaf node stores an entire row of data, and the data is sorted by the primary key index. There can only be one primary key index in a table because the data can only be sorted by one index. Cluster indexes carry a uniqueness constraint.

Non-clustered secondary index

Also known as secondary indexes, leaf nodes do not store the entire row of data, but the primary key value. You need to go back to the table to find the complete data.

The only index

It is used to ensure that no two rows in a table have exactly the same key value and is generally used to help maintain data integrity. Note also that the uniqueness constraint does not equal a unique index.

Nonunique index

Normal indexes, used to aid queries, do not have unique constraints.

Page divided

The index increases sequentially (not necessarily consecutively, just incrementally), and the index is compact; Random insertion of rows in order to ensure index order may cause index data pages to split. The splitting of the data page creates a data void. The splitting process affects not only performance but also space utilization, because the data from one data page is split into two data pages. Of course there is fragmentation and there is consolidation. When two adjacent pages have low utilization due to data deletion, the data pages are merged. The process of merging can be regarded as the reverse of the process of splitting.

Here’s a picture from Mysql Lecture 45

Back to the table

The process of querying the non-clustered index to the primary key and returning to the clustered index search is called back table.

Cover index

An overwrite index is not a type of index, but a result that can be queried from a non-clustered index and then returned to the table. The value of a non-clustered index is much smaller than the value of a clustered index, so it can reduce a lot of IO operations, so we often say do not use SELECT *.

Joint index

A joint index is an index of multiple columns in a table. A joint index is also a B+ tree. The difference is that the number of keys in a joint index is not 1, but greater than or equal to 2. The important thing to note here is the structure of the federated indexAll index columns of the union index appear on the index tree and are sorted by comparing the size of the three columns.

The author is of limited level, please point out any errors and omissions.

Refer to the article

1. Data structure and algorithm principle behind MySQL index

2.stackoverflow

3. Understand in depth why database indexes use B trees and B+ trees

MySQL > select * from ‘MySQL’

InnoDB Storage engine version 2