Daily sentence

Don’t stop, because others will be more than you; Don’t look back, lest you fall. — Al Rehani

Introduction Outlines

Index is like a book, when the user wants to retrieve the data information they want in the massive data, for example: when the user goes to look up some rare words, the best way is through the dictionary index catalog, to filter the scope and finally locate the page number of the data we want. This can greatly improve the speed of our query retrieval, so the use of indexes can greatly improve the performance and speed of database retrieval data.

The text introduce

If the above introduction overview, said or too abstract and not enough in-depth, so let us step into the field of MySQL Innodb storage engine! MySiam storage engine for SQL Server, Oracle, DB2, or MySQL is not necessarily valid, please do not confuse.

Introduction to indexes

As we all know, the indexes of MySQL database are divided into clustered indexes and non-clustered indexes in physical structure. MySQL > create index ();

create [unique|fulltext|spatial] index index_name
[using index_type]
on table_name(colum [asc/desc]);
Copy the code

Create indexes

Clustered index

In Innodb storage engine, MySQL data is in primary key order, so a clustered index is a B+ tree based on the primary key of each table, so each table can only have one clustered index. In addition, the clustered index stores row data information, that is, the clustered index itself is a data, each leaf node of the B+ tree is a row of data information, in addition, the data is sorted according to the order of the clustered index.

In general, clustered indexes default to primary key indexes,

Nonclustered index

  • Compound index: the function of creating a composite index for multiple fields.
  • Prefix index: The leftmost matching rule establishes the index by the leftmost element.
  • Unique index: Each index element is a unique index.

These types of indexes, generally called secondary indexes (secondary indexes), are called non-clustered indexes, and their underlying data structure is B+ tree.

What if my table does not have a primary key?

There are hidden fields for each row, such as ROW_ID and DELETE_SIT, which are used as the clustered index, but not immediately, and there are judgments and other fields for control.

  1. If there is no primary key, a unique index and non-empty field is used as the primary key, that is, the clustered index.
  2. If none of the above fields and indexes are available, the hidden field described above :ROW_ID is used as the clustered index;

Note: For the primary key and uuid as the primary key of the difference, due to the primary key is using a clustered index, because the clustered index is orderly, if primary key belongs to the increase of the ID, the location of storage must be adjacent disk location, the way to write performance is very good, but if is uuid, if often insert, There would be frequent moving addressing to different disks fast, so write performance would be poor! .

The index principle

First comes a table of primary keys, as shown below, where PID is the primary key

pid name birthday
5 zhangsan 2020-12-12
8 list 2021-12-12
11 wangwu 2016-12-12
13 zhaoliu 2016-12-12

The upper part is the primary key index B+ tree, the lower part is the real data on disk, of course execute the following statement:

select * from table where pid ="Eleven";Copy the code

Then the execution process:

As shown in the figure above: Starting from the root node, three tree operations are needed to locate the corresponding actual data information. But if you don’t use an index, you scan the disk line by line until you find the location of the data.

So you can see that if it’s very fast, but if there’s operational data like adding data, modifying data, deleting data. This situation requires the maintenance of related index data objects, so we introduced non-clustered indexes.

create index table_index on table(name);
Copy the code

The structure diagram is as follows:

  • We can see that there are two indexes in the B+ tree. The name of the table is used to create a new non-clustered index in the B+ tree. Therefore, if we add an index, the size of the table will increase and the disk storage space will be occupied.

  • Note, however, that non-clustered index leaf nodes do not store complete rows of data, but only the values of the clustered index (primary key -> values belonging to the clustered index) + the data values of the index field.

select * from table where name = "list1";
Copy the code

As can be seen from the above, the search starts from the non-clustered index tree, then finds the clustered index, and then finds the whole data row from the clustered index.

When not to aggregate the index tree to query?

If we use the following SQL statement, compared to my SQL statement above, it belongs to the return name field.

select name from table where name = "list1";
Copy the code

If, as shown in the red line above, the desired data is found on a non-clustered index tree, if it is overridden, the index tree will not be clustered to query all other data.

So hi to select col > select * is much faster in the case of clustered indexes. This is also because overwriting indexes improves performance.

create index table_index on table(birthday);
Copy the code

A new index tree is created after executing the above SQL.

The search method is to search both non-clustered indexes based on conditions, and then go to the clustered index to query all rows.

Note: and need to consider is not arbitrarily add index oh, because each add a need to build an index tree, too many index trees, and then the maintenance of data will be more complex, will lead to the performance of operation data greatly reduced.