1. The simplest index

If the query id is 4, only full table scan can be performed without index.

Now you need to design an index for the primary key, which is essentially the primary key directory.

A primary key directory is a directory that combines the page number of a data page with the smallest primary key value in the data page to form an index.

With the primary key directory, and then through the primary key to query data is not convenient. If you want to query the data with ID =3, you will first compare it with the minimum primary key in the primary key directory, that is, with the minimum primary key of each data page. It is found that id=3 is greater than the minimum primary key value of data page 2 1, and less than the maximum primary key value of data page 8 4.

Note The data whose ID is 3 must be in data page 2.

This is the simplest and most basic index concept.

2. Physical storage structure of index page based on B+ tree

Now that we know the index of the primary key directory implementation, there is a problem if the data in a single table is too large, reaching millions or tens of millions or even hundreds of millions. How can we store a large number of data pages and minimum primary key values in the primary key directory?

At this time, is to take the index data stored in the data page to solve the way.

That is, the actual data in the table is stored in the data page, and the index data is also stored in the page.

The index page stores the minimum primary key and data page number.

When there are more and more data pages, there will be more and more index pages, and it becomes very inefficient to query the corresponding data pages through index pages.

At this point, an additional level is added to the index page. At the higher level, the page number of each index page and the minimum primary key value in the page are saved.

If you want to find the data with id=46, first go to index page 35 and then use dichotomy to quickly locate the data in index page 20. Then from the index page 20 through the binary search location, it will find that the data id=46 is in the data page 8, through the binary search of the directory page, query the data ID =46.

If the top-level index page contains too many lower-level index pages, it needs to be layered again

The index page structure is a tree structure. The index page of the non-leaf node stores the page number and minimum primary key value of the index page, while the index page of the leaf node stores the data page and minimum primary key value.

3. Cluster index

The physical storage structure of the index page shows how data is queried through primary key indexes.

In other words, the index page of the corresponding leaf node is found through the binary search of the primary key, and then the corresponding data page is found through the binary search of the corresponding slot through the page directory, and then the corresponding data row is traversed through the data in the slot.

In other words, the index page of a leaf node is the data page itself.

This is also the clustered index of a B+ tree consisting of index pages + data pages.

So, in The InnoDB engine, when you add or delete data, you put data pages directly into the cluster index, the data is in the cluster index, and the cluster index contains the data.

When a page split occurs, data rows are adjusted internally to ensure that the primary key values in the data page are in order and that all primary key values in the next data page are greater than all primary key values in the previous data page.

The upper index data structure is maintained while the page is split. Maintain your index entries in the upper index page. If there are more and more data pages, the corresponding index page can not fit, at this time will appear a new index page and then make an upper index page.

4. Secondary indexes

Suppose you want to index other fields, such as name, age, and so on, all of which are similar.

Now to build an index for the name field, so the data is then insert, on the one hand, the complete data is inserted into the leaf nodes of the cluster index data page, on the other hand, for the name field to regenerate a B + tree, the B + tree leaf nodes and data page, but this data page only for primary key field, and the name field

This is a B+ tree independent of the cluster index, which is an index B+ tree built based on the name field. As for the sorting rule of the data page of the leaf node, it is sorted according to the size of the name value as before. Meanwhile, the value of the name field in the next data page is greater than the value of the name field in the previous data page.

In this B+ tree, the non-leaf nodes store the index page number and the minimum name field value; The leaf node stores the primary key field and the name field value

This is based on the name field value to search data, the search process is consistent, from the root node layer down, through binary search quickly locate the data page.

SQL > select * from user where name = ‘zangsan’; SQL > select * from user where name = ‘zangsan’; SQL > select * from user where name = ‘zangsan’;

Then we need to do the table back. It also needs to locate the complete data row corresponding to the primary key by starting from the root node and finding the data page of the leaf node in the cluster index. Only then can all fields required by SELECT * be returned.

Therefore, such indexes based on ordinary fields are called secondary indexes.