This article hopes that the principles of database indexing will be helpful.

The first part discusses the mathematical basis of MySQL database index in terms of data structure and algorithm theory.

The second part combines the index construction in MySQL database InnoDB data storage engine, and realizes the discussion of integrated index, non-aggregate index, overwrite index and other topics.

Data structure and algorithm theory

Innodb storage engine uses B+ trees to index data structures. Here are some data structures. Explain step by step why you should use B+ trees.

1.1 B+ Tree index

The structure of a B+ tree index is similar to that of a binary tree. Key values quickly find data. However, the B of the B+ species is not binary, it represents equilibrium. Note: only index rows. The database retrieves data from storage by loading pages into storage, and finally retrieves the data.

Introduce binary search. Records in sorted (increasing or decreasing) order to find by jump during a search. For example, 10 quantities of 5, 10, 19, 21, 31, 37, 42, 48, 50, 52 are shown.

You can find 48 in three searches. Each search takes eight times. For the above 10 numbers, the average number of sequential searches is 5.5 times, binary search is 2.9 times, and the worst case is sequential searches 10 times and binary searches 4 times. The two-point search saves slots of InnoDB page Directory in master key order, and the page Directory is retrieved in two parts for each specific record query.

1.2 Binary search tree

The number represents the key value of each node. Look in the trees. The key of the left subtree is always less than the key of the heel, and the key of the right subtree is always greater than the key of the heel. The middle order traversal yields the key values: 2, 3, 5, 6, 7, 8.

The average lookup tree is 2.3 times, but the lookup tree can be constructed arbitrarily. Same order of inquiry as this. Therefore, the idea of achieving binary tree equilibrium, AVL tree, is cited.

1.3 define

It conforms to the definition of a search tree, and then must satisfy that the height difference between the left and right subtrees of any node is at most 1.

Binary tree balancing is very fast, but in order to maintain binary tree balance, it usually requires more than one left and right turn to insert or update the balance of the tree.

1.4 B+ Tree features

All records are stored in leaf nodes in sequence, and each leaf node (page as unit) is kept in logical succession, which is a bidirectional circular linked list.

B+ tree insertion must ensure that the records in the inserted leaf node are also sorted, so the following three conditions must be considered during insertion.

One feature in the database is B. Therefore, in a database, the height of the B+ tree is usually 2 or 3 levels, that is, the record of a particular key row is being sought. A maximum of two to three I/O operations can be performed. A common disk can perform at least 100 I/O operations per second.

Index summary and uncompiled index

The difference between a collection index and a non-collection index is whether a page node holds a record of the entire row.

2.1 Clustered Indexes

InnoDB storage engine tables are index organized tables where table data is stored in primary key order. The set index is a B+ tree made from the primary key of each table, and the whole row record data of the table is stored in the leaf node, so the leaf nodes aggregated by the index also become data pages. This feature is used to collect indexes, and the data in the indexed table is also identified as part of the index. Meanwhile, the data structure of B+ tree is the same. Each data page is linked by a two-way link link.

The actual data is only arranged by a B+ tree. Therefore, there is only one linked index per table. In many cases, the query optimizer prefers a centralized index because it can find the data directly at the leaf node of the index. In addition, because the logical order of the data is defined, object-wide queries can be quickly accessed. The query optimizer can quickly find a range of data that needs to be scanned. Note that the records for each page are also kept in a two-way chain.

2.2 Non-clustered indexes

Also called secondary index. Not all data is in the data row. Page nodes each page-level index contains bookmarks in addition to keywords. The InnoDB memory engine tells us where the row data corresponding to the index is. Since the InnoDB storage engine table is an index organization table, the secondary index bookmark of the InnoDB storage engine is the collection index key for that row’s data. A graph is a relationship between an index and a secondary index.

When the secondary index is used to retrieve data, the InnoDB storage engine creates a circular secondary index, using the leaf level pointer to get the key of the arrow key index, and using the primary key index to find the full row record. For example, to look up data in a secondary index tree of three heights, you must find the primary key of the secondary index. That’s three times. If the index tree height is 3, the combined index is retrieved three times. To search a page with a full row of data, six logical Io must visit the final data page.