Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

This article has participated in the “Digitalstar Project” and won a creative gift package to challenge the creative incentive money.

An overview of the

An index is a structure that sorts the values of one or more columns of a database table. Using an index allows quick access to specific information in a database table.

Indexed data structure

Binary tree

A binary tree is an ordered tree whose nodes have a degree of 2 or less. It is the simplest and most important tree. The recursion of binary tree is defined as: binary tree is an empty tree, or a non-empty tree consisting of a root node and two disjoint left and right subtrees called root respectively. Left and right subtrees are also binary trees

For arrays {1,2,3,4,5} the data structure becomes a linked list

Features:

  • There are two children below the parent node.
  • The right node has more data than the left node.

Red and black tree

A red-black tree is a specific type of binary tree that is used in computer science to organize blocks of data such as numbers. If a binary search tree is a red-black tree, any of its children must be red-black.

Red black tree is a variant of balanced binary search tree. The height difference between left and right subtrees may be greater than 1, so red black tree is not a balanced binary tree (AVL) in the strict sense, but the cost of balancing it is lower, and its average statistical performance is better than AVL.

Since every red-black tree is a binary sorting tree, the search algorithm applied to ordinary binary sorting tree can be used to search the red-black tree, and color information is not needed in the search process.

Red-black tree data structure is shown below:

Features:

  • A red-black tree is a binary lookup tree where each node has a color attribute, either red or black.

  • The nodes are red or black.

  • The root is black.

  • All the leaves are black. (Leaves are NIL nodes)

  • Every red node has two children that are black. (No two consecutive red nodes on all paths from each leaf to the root)

  • All paths from any node to each leaf contain the same number of black nodes.

  • These constraints enforce the key property of red-black trees: the longest possible path from root to leaf is no more than twice as long as the shortest possible path. The result is that the tree is roughly balanced. Because worst-case times for operations such as inserting, deleting, and finding a value are required to be proportional to the height of the tree, this theoretical upper limit on height allows red-black trees to be efficient in worst-case scenarios, unlike ordinary binary search trees.

  • It’s property 4 that ensures that there can’t be two consecutive red nodes on the path. The shortest possible paths are all black nodes, and the longest possible paths have alternating red and black nodes. Since by property 5 all the longest paths have the same number of black nodes, this means that no path can be more than twice as long as any other path.

  • Because a red-black tree is a specialized binary lookup tree, the read-only operation on a red-black tree is the same as on a normal binary lookup tree.

B-Tree

  • The leaf nodes have the same depth and the pointer to the leaf node is null
  • All elements are not duplicated
  • The data index in the node is arranged incrementally from left to right

B+Tree

  • Non-leaf nodes do not store data, but only indexes (redundant), which can hold more indexes
  • The leaf node contains all the index fields
  • Leaf nodes are linked by Pointers to improve the performance of range access (it can improve the efficiency of range search)

Key words: in-node order, leaf node pointer link, non-leaf node storage index (redundant)

Mysql > select * from ‘mysql’;

mysql> show global status like 'Innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+
Copy the code

Why 16KB?

Hash

  • A hash is performed on the key of the index to determine the location of the data store
  • Many times a hash index is more efficient than a B+ tree index
  • The value can only be “=”, and “in” does not support range query
  • A hash conflict exists

The index

InnoDB Index implementation (aggregation)

  • Table data file is an index structure file organized according to B+Tree
  • Clustered index – leaf nodes contain complete data records
  • Why InnoDb tables must have primary keys and use integral incremented primary keys is recommended?
    • If no index is set, MySQL will select a unique column as the primary key index. Create a hidden column like rowid.
    • The table data files are maintained according to the B+Tree data structure, and the data in the leaf node is maintained for the row. So you have to have a primary key.
    • Integer type is more convenient for B+Tree sorting. If the integer type is added, the storage of data structure is faster, and the sequential storage does not need to carry out a large number of Tree balancing operations.
  • Why do non-primary key index leaf nodes store primary keys?
    • Consistency, allowing primary key indexes to succeed before updating non-primary key index relationships
    • Save storage space.
  • Primary key index schematic diagram:

  • Schematic diagram of non-primary key indexes

If name = Alice is used to query:

  1. Search through the non-primary key index and get the information after the query (Alice, 18). In fact, this is also a non-clustered index
  2. Then do the back table query, again through the primary key to do the back table query.

Two data files:

. FRM stores table structure information

Ibd stores indexes and data

MyISAM index file (non-clustered)

  • Index files and data files are separated (non-clustered)

Three data files:

. FRM data structure file

Myd files are mainly used to store data

Myi files mainly store index information

Clustered indexes and non-clustered indexes

Features:

Clustered/unclustered is basically whether the index file and the data file are together.

In terms of query efficiency, clustered indexes can be faster than cross-file queries.

Union/composite index

Multiple fields are organized into a common index

  • Why is the left-most prefix principle used this way?
    1. Index data is sorted and cannot be used if fields are skipped.
    2. Example:

Where name =’ Jeff’ and age = 22 where age = 30 and postatin=’manager’ where postation =’ dev’ where postation =’ dev

The resources

  • Baidu encyclopedia