Questions raised

When is an index created and when is it not needed?

What are the types of indexes?

What is an index

Index is a data structure to help database management system to obtain data efficiently, just like a book catalog, it can help us to quickly locate and search for specific values, so as to speed up the efficiency of data query.

Types of indexes

Logically divided by function

  • A common index is a basic index that has no constraints and is mainly used to improve query efficiency
  • Unique index is to add the constraint of data uniqueness on the basis of common index, there can be multiple unique indexes in a data table
  • A primary key index is NOT NULL or UNIQUE, so there is only one primary key index in a table
  • Full text index is not used much, MySQL’s own full text index only supports English. Specialized full-text search engines such as ES(ElasticSearch) and Solr can often be used

From the physical implementation of points

  • Clustered index
    • Clustered indexes can store data sorted by primary key, which is very effective when looking up rows
  • Nonclustered index
    • In the database system, there is a separate storage space for non-clustered indexes. These indexes are stored in order, but the contents of the indexes are stored randomly. In other words, the system will perform two lookups, the first one is to find the index, the second one is to find the corresponding position of the index to retrieve the data row, is to maintain a separate index table (only maintain the index, do not maintain the index to the data.
  • The difference between
    • Leaf nodes of clustered indexes store our data records, while leaf nodes of non-clustered indexes store data locations. Non-clustered indexes do not affect the physical storage order of data tables.
    • A table can have only one clustered index, because there can be only one sort of storage, but it can have multiple non-clustered indexes, that is, multiple index directories providing data retrieval.
    • When a clustered index is used, the data query efficiency is high, but when data is inserted, deleted, or updated, the efficiency is lower than that of a non-clustered index

Principles of indexing

Why is the index stored on hard disk

There are two kinds of database server storage medium, hard disk, and memory is stored in the memory in the event of failure such as breakpoints, easy to cause data loss, stored on disk, there will be a lot of IO, we know the disk IO is time consuming, if the index data structure as far as possible to reduce disk IO operations, then time will reduce greatly.

From binary trees to B+ trees

Support to quickly find the data structure of a jump table, hash table, binary tree search tree, jump lookup table support interval, the hash table doesn’t support range query, binary tree search tree does not support according to the range query quickly, but the binary tree search tree evolve and meet the requirements of the index of data structure, take a look at the binary search to the evolution of B + tree.

Binary search tree is a very large binary tree, each node’s left child is smaller than the parent node, the right child is larger than the parent node, the time complexity of finding a grounding is O(log2n).

IShot 22.55.45 2020-03-05. PNG

But as you keep adding nodes to the tree, you might have a situation where a path grows, and eventually the binary tree degenerates into a linked list with O(n) time.

If it makes the height difference between left and right subtree is not big, still can continue to maintain the characteristics of the binary search tree, and the balanced binary tree this kind of structure was proposed, he let each node of the left and right subtrees height difference is no more than 1, it belongs to the strict balance, such as avl tree, but the strict balanced tree, maintain the height difference to design the complex algorithm to achieve, Time cost will also increase, and later Daniel proposed that we do not let him strictly balance, the height difference is not too large, although it will lose a little query speed, but the complexity of the tree is greatly reduced, the query efficiency can meet the requirements, this kind of tree is called red black tree.

The time of data query mainly depends on the number of disk I/O. If we adopt the form of binary tree, even if we improve it by balancing binary search tree, the tree depth is O(log2n). When N is large, the depth is high.

This time Daniel again, it should be a multi-fork Tree, multi-fork Tree can reduce the height, so you can reduce the number of DISK I/O, give this Tree a name, it is called multi-fork Balance Tree, Balance Tree. How many crosses is that? That’s based on the page size.

A Balance Tree is also a B Tree. The nodes of the B Tree can store data, which leads to unstable query efficiency. Sometimes a keyword can be found when a non-leaf node is accessed, and sometimes a keyword can be found when a leaf node is accessed.

At this time, B+ tree was put forward. The non-leaf node of B+ tree only stores indexes but not data, while the leaf node only stores data records. The leaf node also forms a bidirectional linked list and is linked sequentially from large to small.

You are welcome to check out my blog, where there is more about the actual test content oh!!