Indexes are critical for good database performance. Whenever database performance tuning is mentioned, the first thing that comes to mind is “index” to see if an index is added to the table. Especially when the amount of data in a table is increasing, the impact of indexes on performance is particularly significant. When the data volume is small and the load is low, the impact of no index or improper index on performance may not be significant, but as the data volume increases, performance deteriorates dramatically.

However, indexes are often ignored, sometimes even misunderstood, misused, and often encounter performance problems caused by poor indexes in actual use. This article on the index concept, type, advantages and other aspects of the chat, in-depth understanding of the index of this matter, more help you clearly understand the index, to be able to use it correctly, easy to use it to carry on the optimization of the database.

What is an index

Index is a data structure that helps MySQL obtain data efficiently. It is a data structure that storage engines use to quickly find records.

The easiest example to understand how indexes work in MySQL is to look at the “Indexes” section of a book’s table of contents. If we want to find a chapter in a book, we usually start with the table of contents “index” of the book and immediately find the corresponding page number.

In MySQL, the storage engine uses indexes in a similar way, first finding the corresponding value in the index, and then finding the corresponding data row based on the matching index record.

Query is the most commonly used operation in the database, we all want the speed of query as fast as possible, so the designer of database system will optimize from the perspective of query algorithm. The most basic query algorithm is, of course, sequential lookup, but this algorithm has O(n) complexity and can be very bad at large data volumes. For example, there is the following data in a user table T_user. If you want to query a person aged 89, if you search in order, you have to scan line by line, which shows how inefficient the query is (the larger the number is, the more uneven the data distribution is, the longer the time it takes).

mysql> select * from t_user;
+----+----------+-----+
| id | name     | age |
+----+----------+-----+
|  1 | xcbeyond |  22 |
|  2 | jack     |  34 |
|  3 | tom      |  77 |
|  4 | kitty    |   5 |
|  5 | make     |  91 |
|  6 | Mickey   |  23 |
|  7 | Andy     |  89 |
+----+----------+-----+
7 rows in set
Copy the code

Fortunately, database system designers have realized this earlier, reference the better search algorithms, such as binary search, binary tree search, etc., but after analysis found that each search algorithm can only be applied to a specific data structure, such as binary search request was data query and orderly, and binary tree search can only be applied to the binary search trees. In view of this, in addition to the data, the database system maintains data structures that satisfy specific lookup algorithms, which refer to (point to) the data in a way that enables the implementation of advanced lookup algorithms on these data structures, namely, the indexes in the database.

To get a better understanding of indexes, the following figure shows one of the possible indexes using data from table T_USER.

On the left is the data in the table, with a total of 7 records. In order to speed up the search of age column, a binary search tree as shown on the right is maintained. Each node contains the index key value and a pointer to the corresponding data record, so the use of binary search can quickly find the corresponding data, the time complexity is O(log2 N).

In real databases, however, such binary lookup trees are rarely used (because they are data-demanding), but the principle is similar.

Second, index operation

MySQL > create index (); create index (); delete index (); (It is recommended to keep and collect separately)

1. Create an index

Indexes can be created in the CREATE TABLE statement, or can be added to a TABLE using CREATE INDEX or ALTER TABLE alone.

Grammar:

CREATE [UNIQUE/FULLTEXT] INDEX < INDEX name > ON < table name >

The ALTER TABLE < TABLE name > ADD INDEX | | to chassis PRIMARY KEY | FULLTEXT INDEX < name >, < column >)

When creating an INDEX, you can specify the PRIMARY KEY INDEX, UNIQUE INDEX, FULLTEXT INDEX, or common INDEX.

Such as:

Create table index_test (index_test);

(You can also create an index when creating a table. In this example, create an index separately.)

mysql> create table index_test(id int,ch varchar(32));
Query OK, 0 rows affected
Copy the code

Create index (index_test);

mysql> create index idx on index_test(id);
Query OK, 0 rows affected
Records: 0  Duplicates: 0  Warnings: 0
Copy the code

or

mysql> alter table index_test add index idx(id);
Query OK, 0 rows affected
Records: 0  Duplicates: 0  Warnings: 0
Copy the code

2. Rebuild the index

Rebuild indexes are often used in routine database maintenance operations. After the database has been running for a long time, indexes can become corrupted and need to be rebuilt. Data reconstruction index can improve retrieval efficiency.

Rebuild an index, essentially a repair of a table.

Such as:

mysql> repair table index_test quick;
+-----------------+--------+----------+---------------------------------------------------------+
| Table           | Op     | Msg_type | Msg_text                                                |
+-----------------+--------+----------+---------------------------------------------------------+
| test.index_test | repair | note     | The storage engine for the table doesn't support repair |
+-----------------+--------+----------+---------------------------------------------------------+
1 row in set
Copy the code

3. Query indexes

Sometimes, in order to see if a table there is an index, the index case, you need to pass commands show index from | in table_name to check index.

Grammar:

SHOW the INDEX FROM | IN < table name >

Such as:

mysql> show index from index_test;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+- -----------+---------+---------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+- -----------+---------+---------------+
| index_test |          1 | idx      |            1 | id          | A         |           0 | NULL     | NULL   | YES  | BTREE      |         |               |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+- -----------+---------+---------------+
1 row in set
Copy the code

InnoDB’s default index type is B Tree. InnoDB’s default index type is B Tree.

4. Delete indexes

Dropping an INDEX can be done using the DROP INDEX or ALTER TABLE statement.

Grammar:

DROP INDEX < id > ON < id >

ALTER TABLE < TABLE name > DROP INDEX < INDEX name >

Such as:

mysql> drop index idx on index_test;
Query OK, 0 rows affected
Records: 0  Duplicates: 0  Warnings: 0
Copy the code

or

mysql> alter table index_test drop index idx;
Query OK, 0 rows affected
Records: 0  Duplicates: 0  Warnings: 0
Copy the code

Index type

There are many types of indexes that can provide better performance for different scenarios. In MySQL, indexes are implemented at the storage engine layer, so there is no universal index standard: indexes work differently in different storage engines, and not all storage engines support all types of indexes.

From the storage structure:

  • BTree index (B-tree or B+Tree index)
  • The hash index
  • Full-text index

In terms of application level:

  • Normal index: that is, an index contains only a single column. A table can have multiple single-column indexes.
  • Unique index: The value of the indexed column must be unique, but empty values are allowed.
  • Compound index: That is, an index contains multiple columns.

From the storage structure of indexes, we will take a look at the index types supported by MySQL, how they are implemented, and their advantages and disadvantages.

MySQL’s default storage engine is Innodb, which only explicitly supports B-tree indexes. For frequently accessed tables, Innodb will transparently build adaptive hash indexes, that is, build hash indexes based on B-tree indexes, which can significantly improve search efficiency. For clients, Innodb is transparent, uncontrolled, and implicit.

1. The b-tree indexes

A b-tree index uses a B-tree data structure to store data, allowing the system to efficiently locate the disk block where the data resides.

B stands for balance, not binary, because B Tree evolved from the earliest balanced binary trees.

B-tree is a balanced search Tree designed for external storage devices such as disks. So before we talk about B-tree, we need to know about disks.

When the system reads data from disks to memory, the basic unit is disk blocks. The data in the same disk block is read at a time instead of what is needed.

InnoDB storage engine has the concept of pages, the smallest unit of disk management. The default page size in the InnoDB storage engine is 16KB. You can use the innodb_page_size parameter to set the page size to 4K, 8K, or 16K. In MySQL, you can run the following command to check the page size:

mysql> show variables like 'innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set
Copy the code

The storage space of a system disk block is usually not that large, so InnoDB uses several contiguous disk blocks to achieve a page size of 16KB each time it requests disk space. InnoDB reads data from disk to disk on a page basis. If each piece of data on a page helps locate data records, this will reduce disk I/O times and improve query efficiency.

B-tree defines the data record as a binary [key, data] :

  • Key is the primary key of a record, that is, the primary key value in a table. It is used to record unique rows of data. The key value is unique and distinct.

  • Data is the data in a row except for the primary key.

An M-level B-tree has the following characteristics:

  • Each node has a maximum of M children.
  • Every node except the root and leaf nodes has at least oneceil(m/2)A child.
  • If the root node is not a leaf node, there are at least 2 children.
  • All leaf nodes are in the same layer and contain no other keyword information.
  • Each non-terminal node contains n keyword information (p0,p1,... pn,k1,... kn)
  • The number of key words n meets the following requirements:ceil(m/2)-1 <= n <= m-1
  • Ki (I = 1,... n)Is the keyword, and the keyword is sorted in ascending order.
  • PI (I = 1,... n)Is a pointer to a child node.p(i-1)The key words of all nodes in the subtree pointing to are less thankiBut both are greater thank(i-1).

Note:ceil()Is the integral function.

Each node in the B-tree can contain a large number of key values, data, and pointer P. The following figure shows a 3-order B-tree index structure:

Each node occupies one disk block space. On a node, there are two keys in ascending order and three Pointers P to the child node. The Pointers store the address of the disk block where the child node resides. The two key words are divided into three range fields corresponding to the three Pointers P, and point to the data range of the child node. For the root node, the keywords are 17 and 35. The data range of the child node to which the P1 pointer points is less than 17, the data range of the child node to which the P2 pointer points is 17-35, and the data range of the child node to which the P3 pointer points is greater than 35.

Simulate the process of finding a data row with keyword 29:

  1. Locate disk block 1 based on the root node and read it into memory. [Disk I/O operation 1]

  2. Compare keyword 29 in the interval (17,35) to find pointer p2 to disk block 1.

  3. Locate disk block 3 according to p2 pointer and read into memory. [Disk I/O operation 2nd]

  4. Compare keyword 29 in interval (26,30) to find pointer p2 to disk block 3.

  5. Locate disk block 8 according to ‘p2’ pointer, read into memory. [Disk I/O operation 3]

  6. Find keyword 29 in the keyword list in disk block 8.

Analyzing the above procedure, we found that three disk I/O operations and three memory lookups were required. Since the key in memory is an ordered table structure, binary lookup can be used to improve efficiency. Three disk I/O operations affect the efficiency of the entire B-tree search. Compared with AVLTree(highly balanced binary Tree), B-tree reduces the number of nodes, so that each disk I/O data in memory plays a role, thus improving the query efficiency.

2. The B + Tree index

B+Tree is an optimization based on B-tree to make it more suitable for storage index structure, InnoDB storage engine uses B+Tree to achieve its index structure.

From the B-tree structure diagram in the previous section, you can see that each node contains not only the key value but also the data value. The storage space of each page is limited. If the data on each node (that is, a page) is large, the number of keys that can be stored is small. If the data on each page is large, the b-tree depth is large, which increases the disk I/O times and affects the query efficiency. In a B+Tree, all data record nodes are stored on the leaf nodes at the same layer according to the order of key values, instead of the non-leaf nodes only storing key values. This greatly increases the number of key values stored on each node and reduces the height of the B+Tree.

B+Tree is different from B-tree in the following aspects:

  • Non-leaf nodes only store key-value information.

  • There is a chain pointer between all leaf nodes.

  • Data records are stored in leaf nodes.

The b-tree in the previous section is optimized. Since the non-leaf nodes of B+Tree only store key value information, assuming that each disk block can store four key values and pointer information, the structure of the B+Tree is as follows:

There are usually two head Pointers on a B+Tree, one to the root node and the other to the leaf node with the smallest keyword, and there is a chain-ring structure between all the leaf nodes (that is, data nodes). Therefore, there are two kinds of lookup operations on B+Tree: a range and paging lookup on the primary key, and a random lookup starting from the root node.

There are only 22 data records in the above example, so we can not see the advantages of B+Tree. Here is a calculation:

InnoDB storage engine page size is 16KB, the typical table primary key type is INT (4 bytes) or BIGINT (8 bytes), pointer type is also generally 4 or 8 bytes, This means that a page (a node in B+Tree) stores 16KB/(8B+8B)=1K keys (K = 10^3 for convenience). This means that a depth of 3 B+Tree index can maintain 10^3 * 10^3 * 10^3 = 1 billion records.

In practice, each node may not be fully filled, so in the database, the height of B+Tree is generally between 2 and 4 layers. MySQL’s InnoDB storage engine is designed to have the root node resident in memory, meaning that finding a row record for a particular key requires at most one to three disk I/O operations.

Hash index

Hash index, is based on the hash table implementation. For each row of data, the storage engine computes a hash value for all index columns, and the hash value is different for rows with different key values. A hash index stores all hash values in the index, along with a pointer to each data row in the hash table.

In MySQL, only the Memory engine display supports hash indexes. Hash indexes are the default index type of the Memory engine, and the Memory engine also supports B-tree indexes.

If multiple columns have the same hash value, the index stores multiple record Pointers to the same hash value in a linked list.

Continue with the data in the table T_USER and set the hash index for the field name. Assuming that the hash function used for the index is f(), the calculated hash value (all sample data, not real data) is:

f('xcbeyond')=2390

f('jack')=4010

f('tom')=5178

f('kitty')=1067

f('make')=7901

f('Mickey')=3079

f('Andy')=8301
Copy the code

The calculated hash value will point to the data in the corresponding data row, and the pointing relationship is shown as follows:

You can run the following query and query the corresponding data.

mysql> select * from t_user where name = 'xcbeyond';
+----+----------+-----+
| id | name     | age |
+----+----------+-----+
|  1 | xcbeyond |  22 |
+----+----------+-----+
1 row in set
Copy the code

The hash value of XCBeyond is computed and the corresponding data row is found based on that hash value. F (‘xcbeyond’)=2390, so MySQL looks for 2390 in the index and finds the data row that points to row 1, and then compares the value of row 1 to xcbeyond to ensure the accuracy of the data found.

Because the index itself only needs to store the corresponding hash value, the structure of all indexes is very compact, which also makes hash index lookup very fast. However, hash indexes have their limitations: index invalidation.

  • Hash index data is not stored in order of index value, so it cannot be used for sorting.
  • Hash indexes do not support partial index column match lookups because hash indexes always use the entire contents of the index column to calculate the hash value. For example, if A hash index is created on both data columns (A,B), the index cannot be used if only data column A is queried.
  • Hash indexes only support equal-comparison queries, including=,in().No scoped, fuzzy lookups are supported, for instance,where age > 20,where name like '%xc%'.
  • If there are a lot of hash conflicts, the storage engine must maintain linked lists, which can be very expensive to maintain, and the performance of the query will be low.

4. Full-text index

A full-text index is a special type of index that looks for keywords in text rather than comparing values in an index.

Full-text indexes match completely differently from other types of indexes, and there are many details that need to be paid attention to. More like what search engines do, rather than simple WHERE condition matching.

There is no conflict between creating a full-text index and a value-based B-tree index on the same column. Full-text indexes are suitable for full-text fuzzy searches, not normal WHERE conditions.

4. Index advantages

An index allows the MySQL server to quickly locate a table at a specified location, but this is not the only use of an index. As you have seen so far, indexes have some additional uses depending on the data structure in which they are created.

The most common b-tree index stores data in ORDER, so MySQL can use it to do ORDER BY and GROUP BY. Because the data is ordered, b-tree stores the related column values together. Finally, because the actual column values are stored in the index, some queries can complete the entire query using only the index. Based on this feature, indexes have the following advantages:

  • Indexes greatly reduce the amount of data that the MySQL server needs to scan. (Full table scan)
  • Indexes help the MySQL server avoid sorting and temporary tables.
  • Indexes can turn random I/ OS into sequential I/ OS.

Is an index the best solution?

Indexing is not always the best solution. In general, an index is only effective if the benefit of helping the storage engine find records quickly outweighs the extra work it does. For very small tables, a simple full table scan is more efficient in most cases. For medium to large tables, indexes are very effective. However, for very large tables, the cost of creating and using indexes increases. In this case, a technique is needed to directly distinguish the set of data required by the query, rather than record by record matching, such as table partitioning.

If the number of tables is very large, you can create a metadata information table to query for certain features that need to be used. For example, when executing queries that aggregate data from multiple applications distributed across multiple tables, you need to record metadata about which user information is stored in which table, so that you can ignore tables that do not contain specific user information. This is a common technique for large systems.

Reference article:

  1. www.jianshu.com/p/d67c63777…
  2. Cloud.tencent.com/developer/a…