Moment For Technology

Deep understanding of the InnoDB (3) - storage structure of the index booklet | free education

Posted on June 24, 2022, 2:29 a.m. by Charlie Powell
Category: The back-end Tag: mysql

1. Various storage structures of indexes and their advantages and disadvantages

1.1 binary tree

  1. Advantages:

Binary tree is a kind of structure is more efficiently than the order for the structure of the target element, it can start from the first parent node with the target element value comparison, if return to the current node are equal, if the target element value is less than the current node, then move to the left child node, greater than is moving to the right child nodes, Repeat the operation and finally move to the target element node.

  1. Disadvantages:

In most cases, we design the index to provide an increment integer field in the table as the index column. In this case, using the binary tree structure will result in our index always being added to the right, just as it would be if we did not add the index

1.2 the red-black tree

  1. Advantages:

Red-black tree is also called balanced binary tree, which not only inherits the advantages of binary tree, but also solves the problem of auto-increment plastic index encountered by the above binary tree. As can be seen from the dynamic graph below, red-black tree will adjust the structure in left-rotation and right-rotation, and always ensure the rule of the number of left child nodes the number of parent nodes the number of right child nodes.

  1. Disadvantages:

When there's a lot of data, there's a lot of depth. As you can see from the diagram, each parent node can only have two children. If we have a lot of data, then the tree depth will still be very large, maybe more than ten or twenty levels, which is bad for our disk addressing and still takes a lot of time to find.

1.3 the Hash

  1. Advantages:

The main Hash algorithms are MD5, SHA256, etc., and then use the Hash result as a pointer to a file pointer that can get data from the index file, and then get data from the data file. According to such a design, When we search for the record where Col2 = 22, we only need to hash 22 to get the file pointer of the row of data corresponding to the index, so as to locate the target record in the MySQL data file. The query efficiency is very high.

  1. Disadvantages:

Select count(id) from sus_user where id 10; Therefore, the Hash index structure can only be used in scenarios where the field name = the target value. Not suitable for fuzzy query (like) scenarios.

1.4 B - Tree

Since red black trees have disadvantages, we can conceive a new storage structure based on red black trees. The solution is also very simple, since the depth of the tree is too long, we only need to increase the number of data that each tree node can store appropriately, but the number of data must also be set a reasonable threshold, otherwise too many data of a node will produce redundant consumption.

1.4.1 Some features of B-tree

  • Degree - Number of data stores of a node. If the number of data in each tree node is greater than 15/16*Degree (not verified), the tree will be automatically split and the structure will be adjusted
  • The leaves have the same depth, and the left subtree has the same depth as the right subtree
  • The pointer to the leaf node is null
  • The data keys in the node are arranged in ascending order from left to right
  1. Advantages:

The structure of BTree can make up for the shortcomings of red-black trees and solve the problem that the depth of the whole tree is too long when the amount of data is too large. The same amount of data requires fewer layers, the same depth of the tree can store more data, naturally more efficient lookup.

  1. Disadvantages:

From the above, querying a single piece of data is very fast. However, if the scope is searched, the BTree structure must be queried from the root node every time, which reduces the efficiency. Therefore, another variant of BTree B+Tree (B+Tree) is used in the actual application.

2. B+Tree -- InnoDB index scheme

2.1 What is the optimization of B+Tree compared with BTree?

B+Tree storage structure, only leaf nodes store data. The new B+ tree structure does not store record data in all nodes, but only stores index information in the lowest leaf node, and all non-leaf nodes in the upper layer only store index information. Such a structure can store more index values in a single node, increase the value of Degree, and improve the probability of hitting target records.

This structure stores some of the redundant data in the upper non-leaf nodes, but this disadvantage is tolerated because the redundant data is indexed and does not impose a large burden on memory. This structure stores some of the redundant data in the upper non-leaf nodes, but this disadvantage is tolerated because the redundant data is indexed and does not impose a large burden on memory.

Indexes in InnoDB narrow the search area layer by layer by the next page number that the table of contents points to for efficient queries. Compare the contents of the page with the target value through the dichotomy method, find out the page number of the next layer, and search again under the page until the leaf node is reached

2.2 Storage Structure of indexes

In the index entry to actually grow our user record, just two of the directory entry column is the primary key and page number, so the InnoDB reuse before storage to store user record data page directory entry, to make a distinction between user record, we call these used to represent the directory entry record entry record.

The structure of the Page is the same (the seven parts we described earlier), and the Page Directory is generated by the primary key value, so you can use dichotomy to speed up the lookup by primary key value.

Both the data pages that hold user records and the data pages that hold directory entry records are stored in the B+ tree data structure, so we also call these data pages nodes. As can be seen from the figure, our actual user records are stored in the lowest nodes of the B+ tree, which are also called leaf nodes or leaf nodes. The other nodes used to store directory items are called non-leaf nodes or internal nodes, among which the node at the top of the B+ tree is also called the root node.

2.3 Cluster index

The B+ tree is itself a directory, or an index. It has two characteristics:

Sorting records and pages using the size of the record primary key has three implications:

  1. The records in the page are arranged in a one-way linked list in order of the size of the primary key.

  2. Each page storing user records is also arranged in a bidirectional linked list according to the primary key size of user records in the page.

  3. The pages that store directory entry records are divided into different levels. The pages in the same level are also arranged in a bidirectional linked list according to the primary key size of directory entry records in the page.

The leaves of the B+ tree store the complete user record.

A complete user record is one in which the values of all columns (including hidden columns) are stored.

Advantages and disadvantages of clustered indexes

  • You can keep related data together.
  • Faster data access.
  • Queries that use an overridden index scan can directly use the primary key value in the page node.
  • If tables are designed and queried to take advantage of these features, performance will be greatly improved. Of course, clustered indexes also have their drawbacks:
  • Clustered indexes maximize performance for I/ O-intensive applications, but they have no advantage if all data is stored in memory.
  • The insertion speed depends heavily on the insertion sequence. This is why InnoDB typically sets an incremented INT column as the primary key.
  • Updating the clustered index is expensive because InnoDB is forced to move every updated row to a new location.
  • If new data is inserted out of order, "page splitting" may result.
  • Secondary indexes may be larger than expected. This is because the page children of the secondary index contain primary key columns that reference rows.
  • Secondary index access may require table-back queries.

2.4 Secondary Index

Clustered indexes are sized by primary key, while secondary indexes are sized by user-indexed fields.

And the leaf node of the B+ tree does not store the complete user record, but only the values of the C2 + primary key columns. Instead of a primary key + page number in the directory entry record, there is a C2 column + page number + primary key.

And if you want to query the information of the non-index field, you need to take the primary key ID back to the cluster index

2.5 Joint Index

On the basis of the secondary index, multiple fields are compared. The leftmost index is sorted first. If the same value exists, the next leftmost index field is compared, and so on.

3. InnoDB B+ tree index notes

B+ trees are formed like this

  • Whenever a B+ tree index is created for a table (clustered indexes are not created artificially, they exist by default), a root node page is created for the index. When there is no data in the initial table, there is no user record or directory entry record in the root node corresponding to each B+ tree index.

  • When user records are subsequently inserted into the table, the user records are first stored in the root node.

  • If records are inserted when the free space in the root node runs out, all records in the root node are copied to a newly allocated page, such as page A, and the new page is split to produce another new page, such as page B. The newly inserted record is then assigned to page A or page B based on the size of the key (that is, the primary key in the cluster index, and the corresponding index column in the secondary index), and the root node is upgraded to the page where the directory entry record is stored.

This process requires special attention: the root node of a B+ tree index does not move from the date of birth. So whenever we create an index for a table, the page number of the root node will be recorded somewhere, and then whenever InnoDB storage engine needs to use the index, the page number of the root node will be retrieved from that fixed place to access the index.

About (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.