Databases – Indexes are not a panacea

An index is a structure that sorts the values of one or more columns of a database table. Using an index allows quick access to specific information in a database table. If you want to look up a particular employee by his or her last name, an index helps you get information faster than if you searched all the rows in a table. But the index is not a panacea, sometimes found in our SQL index does not take effect, we in-depth understanding of the index principle, and misunderstanding,


How does InnoDB store data?

MySQL abstracts data storage and query operations into storage engines. Different storage engines store and read data in different ways. MySQL supports a variety of storage engines, and storage engines can be set up at table granularity. The one we use most often is InnoDB because it supports things

Although the data is kept on disk, its processing takes place in memory. To reduce the number of random reads from disk, InnoDB stores data in page granularity rather than row granularity. That is, data is divided into several pages and stored on disk in page size. InnoDB’s page size is typically 16kb. Each page in a page directory, convenient according to the primary key query records.

Data page structure:

The page table of contents divides records into different groups by slot, and each group has several records. As shown in the figure, the number of the front small square in the record represents the number of records currently grouped, and the smallest and largest slots point to two special pseudo records. With slots, when we search for records in a page by primary key, we can use dichotomy to search quickly without having to start with the smallest record and traverse the entire list of records in a page.

Example: Search for records where primary key (PK) = 15
  • The middle position of the slot is (0+6)/2=3, and the record pointing to it is 12 < 15, so we need to continue searching after slot #3.
  • Then use binary search to find the middle bit of slot #3 and slot #6 (3+6)/2=4.5, round 4, the record corresponding to slot #4 is 16 > 15, so the record must be in slot #4;
  • Search down 3 times from record 12 pointing to slot #3 to locate record 15.

Clustered index and non-clustered index

In InnoDB, the table data file itself is an index structure organized according to B+Tree. Clustered index is a B+Tree constructed according to the primary key of each table. Meanwhile, the leaf node stores the row record data of the whole table, and the leaf node of clustered index is also called data page. This feature determines that the data in the index organization table is also part of the index;

General building table will use a do clustering index on the primary key, if not MySQL will be created by default, but the primary key if you change the price is higher, so the build table to consider on the ID can’t update it frequently.

In our daily work, the index added by ourselves according to the actual situation is the secondary index. The secondary index is a secondary index that needs to find the primary key index. Now find the primary key index and then find the data through the primary key index.

The characteristics of B+ trees include: the lowest node, called leaf node, is used to store data; : Other upper-layer nodes are called non-leaf nodes and are only used to store directory entries as indexes. : Non-leaf nodes are divided into different layers, and the search volume of each layer is reduced through layers; : All nodes are sorted according to the index key size, forming a bidirectional linked list to speed up range search.

  • Therefore, InnoDB uses B+ trees, which can both store the actual data and speed up the data search, called clustering cables

Lead. If the ellipsis in the box below the leaf node is considered actual data, it is the intention of the cluster index. Because data is physically stored in only one copy, there can be only one cluster index containing actual data.

  • InnoDB automatically uses primary keys (single or multiple fields that uniquely define a record) as index keys for clustered indexes

(If there is no primary key, select the first unique column that does not contain a NULL value). The numbers in the box above represent the values of the called keys, which are generally primary keys for clustered indexes.

In order to realize the fast search of non-primary key field, secondary index, also called non-clustered index, secondary index is introduced. Secondary indexes are also data structures that utilize B + numbers

Instead of actual data, the primary key is stored in the leaf node of the secondary index. After obtaining the primary key value, data rows are obtained from the clustered index. This process is called back to the table.

What does that mean? When you execute a SQL statement, you need to fetch data from two B + indexesCopy the code

SQL > alter TABLE TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL alter table TBL

  	SELECT * FROM tbl WHERE a=1
Copy the code

This will not result in a back table because all the data will be found in the index tree of A

  SELECT * FROM tbl WHERE b=1
Copy the code

Select * from a where (a,b); select * from B where (b, C); select * from A where (B, C); select * from B where (B, C); Two index trees are searched, and this is called back to the table. Index overwriting is the ability to find all the data you need from an index without having to go to another data structure. You don’t have to go back to the table.


Consider the cost of creating additional secondary indexes


The cost of creating a secondary index is mainly reflected in three aspects: maintenance cost, space cost and table return cost.

  • Maintenance cost: If N secondary indexes are created, N B+ trees need to be created. When adding data, not only the cluster indexes but also the N secondary indexes need to be modified.
  • Space cost: Although the secondary index does not hold the original data, it does hold the index column data, so it takes up more space
  • Table back code: secondary index does not save the original data, through the index to find the primary key need to query the cluster index, in order to get the data we want

Not all queries against indexed columns can use indexes

1. Indexes can only match column prefixes

For example, in the following LIKE statement, the index cannot be searched for the user whose name suffix is name123, and the type=ALL execution plan represents a full table scan:

EXPLAIN SELECT * FROM person WHERE NAME LIKE '%name123' LIMIT 100
Copy the code

I’m going to put the percent sign behind it and I’m going to do prefix matching, type=range means index scan, key=name_score and I’m going to see the actual index

EXPLAIN SELECT * FROM person WHERE NAME LIKE 'name123%' LIMIT 100
Copy the code

2. The condition involves that a function operation cannot move an index.

For example, search criteria using the LENGTH function, must not go to the index

EXPLAIN SELECT * FROM person WHERE LENGTH(NAME)=7
Copy the code

3. The federated index matches only the left column

A joint index is built for name and score, but the index cannot be searched only by score column

EXPLAIN SELECT * FROM person WHERE SCORE>45678
Copy the code

Personal Blog:blog.yanxiaolong.cn/