This is the 11th day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021

What is the index

An index is a data structure designed to improve the efficiency of data query. It is similar to a book table of contents.

In MySQL 8.0, InnoDB storage engine supports B+ tree index, full-text index and R tree index, among which B+ tree index is the most widely used.

B + tree index

Each index in InnoDB corresponds to a B+ tree. B+ tree index features: disk-based balanced tree, the tree is very short, usually three to four layers, so the access efficiency is very high, from tens of millions or hundreds of millions of data query a data, only 3 or 4 I/O.

Suppose we have the following table where ID is the primary key and k has an index:

Primary and non-primary key indexes are shown as follows:

Where R stands for a whole row of values.

The difference between a primary key index and a non-primary key index is:

  • The leaf node of the primary key index holds the entire row of data;
  • Leaf nodes that are not primary key indexes hold primary key values.
  • Non-primary key indexes are also called secondary indexes, and primary key indexes are also called clustered indexes.

1. If the query statement is

Select * from table where ID = 100 select * from table where ID = 100

2. If the query statement is

Select * from table where k =1; select * from table where ID=100; select * from table where k =1;

MySQL > manage B+ tree index

  1. The EXPLAIN command is used to see if the index is used.

  2. Query mysql. innodb_index_STATS to see the general condition of each index.

field paraphrase
database_name The database name
table_name The name of the table
index_name Index name
last_update Time when statistics were last updated
stat_name Name of statistics
stat_value The value of the statistics
sample_size The sampling size
stat_description Type specification
  1. Lookup tablesys.schema_unused_indexesSee which indexes are unused and can be discarded.
  • MySQL5.7 and above in SYS mode
  • Schema_redundant_indexes and schemA_UNusED_INDEXES are two views

MySQL stores data and index object analysis

Index organization table

The storage of data is divided into heap table and index organized table. At present, most databases support the storage of index organized table.

  1. Heap table

As shown in the figure above, the data in the heap table and the index are stored separately. The index is ordered and the data is unordered. The leaf node of the index stores the address of the data in the heap table. When data in the heap table changes, its location also changes, causing the addresses in the index to be updated, which affects performance.

  1. Index organization table

Data is sorted by primary key in an index, also known as a clustered index. In an index organization table, data is an index and an index is data. This is how the InnoDB storage engine organizes data.

Secondary indexes

In addition to the primary key index, other indexes are called secondary index, or non-clustered index, is also a B+ tree index, which is different from the primary key index is stored in the leaf node index key value, primary key value.

When data is queried by using a secondary index, the primary key value is first found by the secondary index, and then the data is queried by the primary key index. This secondary index is queried again by the primary key index.

Compared to a heap table, a secondary index such as an index organization table has significant performance advantages because other indexes do not need to be maintained when data changes, unless the primary key of the record is changed.

Cover index

As mentioned above, the leaf node of the secondary index stores the index key, primary key,

For example, we have the following table:

create table user (
id int primary key,
name varchar(20),
sex varchar(5),
index(name)
)engine=innodb;
Copy the code
  1. Indexes cover
select id,name from user where name='ls';
Copy the code

The index leaf node stores the primary key ID. The id and name can be obtained through the index tree of name without returning to the table, which is in line with index coverage and high efficiency.

  1. Back to the table
select id,name,sex from user where name='ls';
Copy the code

The name index can be matched. The index leaf node stores the primary key ID, but the sex field can only be obtained by querying back to the table, which does not meet the index coverage. Therefore, the efficiency of obtaining the sex field through the ID value scanning code aggregation index will be reduced.

The index tuning

A functional index

Starting with MySQL 5.7, MySQL supports the creation of functional indexes (where the index key is a functional expression). Functional indexes serve two purposes:

  1. Optimize business SQL performance:

Select * from register_date where DATE_FORMAT(register_date,’%Y-%m’) = ‘2021-10′; select * from register_date where DATE_FORMAT(register_date,’%Y-%m’) = ‘2021-10’;

The answer is no, the index only sorts register_date data, not DATE_FORMAT(register_date), so it cannot be used.

We can solve this problem by using functional indexes, creating an index of DATE_FORMAT(register_date).

ALTER TABLE Testtable
ADD INDEX 
idx_func_register_date((DATE_FORMAT(register_date,'%Y-%m')));
Copy the code
  1. With the Generated Column.

For example, the following table:

CREATE TABLE User (
    userId BIGINT,
    userInfo JSON,
    mobile VARCHAR(255) AS (userInfo->>"$.mobile"),
    PRIMARY KEY(userId),
    UNIQUE KEY idx_mobile(mobile)
);
Copy the code

The mobile column is a virtual column calculated by the following function expression. The column itself does not occupy any storage space, while the index IDx_mobile is essentially a functional index. The advantage of this is that you can use the virtual column directly when writing SQL instead of writing lengthy functions:

-- No virtual columns
SELECT  *  FROM User
WHERE userInfo->>"$.mobile" = '15088888888'

-- Use virtual columns
SELECT  *  FROM User 
WHERE mobile = '15088888888'
Copy the code

Left-most prefix rule

B+ tree is an index structure that uses the left-most prefix of the index to locate records.

For example, we have fields A and B, both of which are high-frequency fields. In order to reduce return tables, we can establish joint indexes (A, B), and there is no need to establish indexes on A separately.

If a is larger than b, create (a,b), (b), (b,a), (a), (a), (a), (b).

The choice between normal and unique indexes

Conclusion: In cases where business code is guaranteed not to write duplicate data, it is recommended to choose normal indexes.

The query:

  • Normal index: After finding the first record that meets the condition, it also needs to find the next record until it encounters the first record that does not meet the condition.

  • Unique index, because indexes define uniqueness, the search will stop after the first record that meets the condition is found.

The difference above is that the performance gap is minimal. Because for data reading will need to read more than one data read from the disk, the Innodb data is carried out in accordance with the page as the unit to read and write, the default size for each page 16 KB, so for the average index, just do it once more “to find and determine the next record” operation, only need a pointer search and a calculation, Operating costs are negligible for today’s cpus.

Update:

  • With normal indexes, updates are recorded in the Change buffer, and statement execution ends.

  • Unique index, need to read the data page into memory, determine that there is no conflict, insert this value, the end of the statement.

Change Buffer cannot be used for unique index updates. Normal indexes can be used

What is a change buffer?

  1. Innodb records the update operation in the Change Buffer if the data page is not in memory. Innodb saves the process of reading the data page from the disk. The next query will read the data page into memory. Merge the change Buffer record to return the data, while simultaneously merging the operations in the change Buffer to the original data page.
  2. The change buffer is copied in memory and written to disk, so it can persist data.

For unique indexes, the data pages need to be read into memory to determine whether the uniqueness constraint is violated. Since all the data pages have been read into memory, there is no need to change buffer. Normal indexes, on the other hand, record updates in the Change buffer. Due to the high cost of disk IO, using Change Buffer is more performance-friendly.

Composite index

A Compound Index is a B+ tree Index composed of multiple columns.

  1. Such as:

Select * from (a, b); select * from (a, b);

WHERE a = ?
WHERE a = ? AND b = ?
WHERE b = ? AND a = ?
WHERE a = ? ORDER BY b DESC
Copy the code

Index (a, b) sort does not yield (b, a) sort so the following SQL cannot be optimized:

WHERE b = ?
WHERE b =ORDER BY a DESC
Copy the code
  1. useComposite indexforIndexes cover

If the queried field is in the leaf node of the secondary index, the query result is directly returned without returning to the table. This optimization technique by combining indexes to avoid returning to the table is also known as Index Covering.

Using the feature of composite index containing multiple columns, index coverage technology can be implemented to improve SQL query performance.