This article is adapted from the MySQL Engine Architecture and Performance Optimization series at https://url.wx-coder.cn/IF5HH; further references are collected in the Awesome MySQL List at https://parg.co/htL.

Principles and applications of MySQL indexes: index types, storage structure, and locks

In Data Structures and Algorithms — Indexes (https://url.wx-coder.cn/O07eI) we discussed file-index structures such as B+Tree and LSM-Tree, as well as the basic algorithms behind full-text indexes. This article discusses how such indexes are applied in practice in relational databases.

An index is a data structure that helps the database system retrieve data efficiently. A database index essentially trades extra write overhead and storage space, used to maintain the index structure, for faster data retrieval. Indexes let us locate data quickly without traversing every row on each lookup. That does not mean more and longer indexes are always better: besides taking up space, every index adds maintenance work to subsequent inserts, updates, and deletes. Generally speaking, a full table scan is faster for small tables, indexes pay off for large tables, and for extremely large tables indexes alone are often not enough; we may need to rely on a standalone full-text search system. MySQL's built-in full-text index is only available for InnoDB and MyISAM and can only search English text well, so dedicated full-text engines such as Elasticsearch or Solr are generally used instead.

Index types

In terms of implementation, indexes fall into two categories: clustered indexes and non-clustered indexes (also called auxiliary or secondary indexes). In terms of usage, they can be further divided into ordinary indexes, unique indexes, primary key indexes, joint (composite) indexes, foreign key indexes, and full-text indexes.

InnoDB's primary index is a clustered index, because the leaf nodes of its B+Tree contain the complete data records. The InnoDB data file is itself the index file: the table data file is organized as a B+Tree, and the data field of each leaf node holds a complete row. The key of this index is the table's primary key, so the InnoDB data file itself is the primary index. The data field of an InnoDB secondary index stores the primary key value of the corresponding record rather than its address; in other words, all InnoDB secondary indexes reference the primary key as their data field.

The leaf nodes of a MyISAM B+Tree, by contrast, store only the address of the data, which is why it is called a non-clustered index. The MyISAM engine also uses B+Tree as its index structure, but the data field of a leaf node stores the address of the data record. In MyISAM there is no structural difference between the primary index and secondary indexes, except that the primary index requires a unique key while secondary index keys may repeat.

In InnoDB, there are clustered indexes and ordinary (secondary) indexes. The clustered index is built on the primary key, and its leaf nodes store the row corresponding to each primary key value. An ordinary index is built on the columns declared when the index is created, and its leaf nodes store the primary key value of the corresponding row. A query through an ordinary index therefore first finds the primary key value in that index and then looks up the row in the clustered index by primary key, which is commonly called going back to the table. If we need the whole row, we must go through the clustered index. If the query only needs columns that already exist in the ordinary index (for example, the primary key value itself), there is no need to go back to the table; this is called a covering index, and it can noticeably improve query efficiency.
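As a quick illustration of the difference between a covering query and one that must go back to the table, here is a minimal sketch; the user table, its columns, and the index name are illustrative rather than taken from the original text, and the EXPLAIN output depends on the data and MySQL version.

-- Hypothetical table: id is the clustered (primary key) index, idx_name is a secondary index on name
CREATE TABLE user (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(64)     NOT NULL,
  age  INT             NOT NULL,
  PRIMARY KEY (id),
  KEY idx_name (name)
) ENGINE = InnoDB;

-- Covering query: name and id are both present in idx_name, so no table lookback is needed
-- (EXPLAIN is expected to show "Using index" in the Extra column)
EXPLAIN SELECT id, name FROM user WHERE name = 'Bush';

-- Non-covering query: age lives only in the clustered index, so MySQL must go back to the table
EXPLAIN SELECT id, name, age FROM user WHERE name = 'Bush';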

There are two special cases of ordinary indexes: the unique index and the joint index. When a row is inserted or modified through a unique index, MySQL first checks whether the value of the indexed column already exists.

A data row is not the smallest unit the storage engine manages; an index can only help us locate a data page, which is also the smallest unit of disk reads and writes, and a single data page stores multiple rows. To understand how the storage engine locates an individual row within a page, we need to understand the internal structure of the data page; see the MySQL Storage Management series at https://url.wx-coder.cn/IF5HH.

Index selectivity

Selectivity is defined as the ratio of the number of distinct index values (the cardinality) to the total number of records in the table. The higher the selectivity, the more efficient the index. For example, indexing a column such as gender is almost meaningless.

Index Selectivity = Cardinality / #T

Obviously, selectivity falls in the range (0, 1], and the higher the selectivity, the more valuable the index, which follows from the nature of the B+Tree. In a real database we can calculate the selectivity of a column with a statement such as:

SELECT count(DISTINCT(title))/count(*) AS Selectivity FROM titles;

Primary keys

In InnoDB, table data is physically arranged to optimize lookups by primary key, so primary key lookups are the fastest. The logical order of the key values in the clustered index determines the physical order of the corresponding rows in the table. Even if no column in the table is a natural fit for a primary key, it is recommended to use an auto-incrementing integer surrogate key, so that rows are stored sequentially as data is inserted and so that later foreign key references from other tables remain efficient.
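A minimal sketch of such an auto-incrementing surrogate key (the table and column names are illustrative):

-- Rows are appended roughly in primary key order thanks to the auto-incrementing surrogate key
CREATE TABLE orders (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  order_sn VARCHAR(32)     NOT NULL,
  amount   DECIMAL(10, 2)  NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE KEY uk_order_sn (order_sn)
) ENGINE = InnoDB;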

If a Primary Key is not explicitly defined when creating a table, InnoDB storage engine selects or creates a Primary Key as follows:

  • If the table has a UNIQUE NOT NULL index, the first such index becomes the primary key.
  • Otherwise, the InnoDB storage engine automatically creates a hidden 6-byte row ID as the primary key, which the user cannot view or access.

Primary key selection

In Distributed IDs (https://url.wx-coder.cn/tQ5eH) we discussed strategies for choosing IDs in distributed scenarios, and the same considerations apply in the database. MySQL recommends that primary keys be as short as possible, so a 36-character UUID is not a good choice: if the primary key is a long string and many ordinary indexes are built on the table, those indexes take up a lot of physical space. It is also better for primary keys to increase monotonically; otherwise, in the InnoDB engine, the randomness of UUIDs may cause rows to be relocated frequently, which seriously hurts performance.

With an auto-incrementing ID, two adjacently inserted records tend to land in the same data block. A business-defined sequence such as an order number may not cluster as well as an auto-incrementing ID: consecutive inserts may be scattered across many data blocks, increasing the number of disk reads and writes.

  • Uniqueness: an auto-incrementing ID is easy to enumerate (and thus vulnerable to crawling or brute force), and data migration, especially table merging, will inevitably cause conflicts. A UUID guarantees uniqueness and completely avoids such conflicts.
  • Key length: an auto-incrementing field is much shorter than a UUID, which has a significant impact on retrieval performance. The InnoDB engine locates records through the primary key and secondary indexes, so read performance is better when the primary key is short.
  • Concurrency: with auto-incrementing IDs under high concurrency, contention for the auto-increment lock reduces database throughput. UUIDs can be generated at the application layer, which improves database throughput.
  • Database index: InnoDB tables are stored in primary key order. If writes produce random I/O, disk blocks are split and moved frequently, and the drawback becomes obvious once the data volume is large. Newly inserted auto-increment IDs are naturally ordered, which greatly improves write performance; UUID primary keys have no such order.

Primary key and unique index

A primary key is a unique index, but a unique index is not necessarily the primary key. A unique index column may contain NULL values, whereas a primary key cannot be NULL. For a single-column unique index, all non-NULL values in the column must be distinct; for a multi-column (joint) unique index, the combination of the columns must be unique. Besides serving as an index, a unique index is also used in practice as a data constraint to prevent duplicate rows from being inserted or produced by updates, thereby ensuring data integrity.

For string types, you can specify an index prefix length (this is required for BLOB/TEXT columns), which is limited to 767 bytes in InnoDB tables by default. If a string column is very long, building a full B+Tree index on it is wasteful; in that case, manually simulating a hash index is an option, although that approach cannot support flexible prefix queries on the string (such as LIKE operations).
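A minimal sketch of such a simulated hash index, assuming MySQL 5.7+ generated columns; the links table and column names are illustrative:

-- Index a CRC32 hash of the long string instead of the string itself
CREATE TABLE links (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  url     VARCHAR(2048)   NOT NULL,
  url_crc INT UNSIGNED AS (CRC32(url)) STORED,
  PRIMARY KEY (id),
  KEY idx_url_crc (url_crc)
) ENGINE = InnoDB;

-- CRC32 can collide, so the query must repeat the original predicate
SELECT id, url
FROM links
WHERE url_crc = CRC32('http://www.example.com/some/long/path')
  AND url = 'http://www.example.com/some/long/path';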

Joint index

A single-column index is built on one field of a table; generally, indexing an integer or a short fixed-length string is most efficient. A joint (composite) index organizes multiple fields in a fixed order. For example, an index on (name, city, gender) is ordered first by name; for rows with the same name (for example, Bush), entries are ordered by city; and for rows with the same city, by gender. Because the index spans multiple columns, we can sometimes add frequently queried columns to it. For example, if we often need to look up age by name, we can create a joint index on (name, age).
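A sketch of creating and using such joint indexes, assuming the illustrative user table above also has city and gender columns:

-- Joint index ordered by (name, city, gender)
ALTER TABLE user ADD INDEX idx_name_city_gender (name, city, gender);

-- Can use the index: equality on the leading column, then the next column
SELECT * FROM user WHERE name = 'Bush' AND city = 'Chicago';

-- A joint index on (name, age) lets the age lookup be answered from the index itself
ALTER TABLE user ADD INDEX idx_name_age (name, age);
SELECT age FROM user WHERE name = 'Bush';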

Joint indexes are commonly used for two kinds of conditions: WHERE and ORDER BY. For a WHERE condition, this means that the equality predicates in the condition use the fields in the same order as they appear in the joint index.

If the ORDER BY column is one of the columns of a joint index that also covers the WHERE condition, MySQL can read already-ordered data directly from the index instead of sorting the rows after reading them from disk; this avoids sorting on-disk data. For a query whose ORDER BY is not covered, the profile shows a Creating sort index stage, and sorting the on-disk data is the most time-consuming part. For a query whose ORDER BY is covered, no sort is needed and the time is mainly spent pulling data from disk.

Prefix indexes

MySQL prefix-related index usage falls into three categories: the joint index (leftmost) prefix, the LIKE prefix, and the string prefix index.

Joint index prefix and leftmost-prefix matching

The prefix rule of a joint index means that when a multi-column index is used, its columns must be used from left to right, in whole or in part. For example, an index on (col1, col2, col3) is effective for conditions on (col1), (col1, col2), and (col1, col2, col3). Matching continues to the right until a range condition (>, <, BETWEEN, LIKE) is encountered; index columns after that point no longer use the index to narrow the lookup.

Take the condition where name='Bush' as an example: MySQL only needs to use the B+Tree to locate the first entry whose name equals Bush, and then scan forward sequentially until it reaches the first entry that is not Bush. During the scan it records the primary key IDs found in that index slice, and finally queries the clustered index by those IDs to obtain the result set. For where name='Bush' and city='Chicago', MySQL can locate the matching index slice directly through the joint index, collect the IDs in that slice, and then look up the clustered index by ID to get the result set.

This leads to the following caveats about the joint index prefix:

  • A joint index cannot be used across a skipped field. For a query such as where name='Bush' and interest='baseball', the name condition can use the first field of the joint index to filter most of the data, but the interest condition cannot directly locate an index slice on the third field through the B+Tree; the matching 'baseball' entries may be scattered anywhere within the slice. In the end the interest field is effectively filtered by a covering scan over the slice.
  • For non-equality conditions such as >, <, and !=, the joint index prefix can only narrow the index slice up to and including the first field that uses a non-equality condition; the fields after it cannot participate in narrowing the slice even though they are in the joint index. For example, in where name='Bush' and city>'Chicago' and interest='baseball', the first field filters out the non-Bush entries and the second field then locates the position of Chicago within that slice, but because the matching interest values may be scattered anywhere within the remaining entries, the third field cannot participate in narrowing the slice.

Therefore, the column order of the B+Tree index matters a great deal; the rules above all depend on it. In practice, indexes with different columns and column orders should be created according to the specific queries. For an index on (A, B, C):

# Can use the index
A > 5 AND A < 10                  -- leftmost prefix
A = 5 AND B > 6                   -- leftmost prefix
A = 5 AND B = 6 AND C = 7         -- full match
A = 5 AND B IN (2, 3) AND C > 5   -- leftmost prefix; the IN on B fills the hole

# Cannot use the index
B > 5                             -- does not contain the leftmost prefix
B = 6 AND C = 7                   -- does not contain the leftmost prefix

# Uses the index partially
A > 5 AND B = 2                   -- only A is used for the range scan
A = 5 AND B > 6 AND C = 2         -- A and B are used; C is not

To use an index for sorting, the ORDER BY clause must list the columns in the same order as the index, and all columns must be sorted in the same direction (all ASC or all DESC). If the query joins multiple tables, the index can only be used for sorting when every column in ORDER BY belongs to the first table in the join order.

# Can use the index for sorting
ORDER BY A                               -- leftmost prefix match
WHERE A = 5 ORDER BY B, C                -- leftmost prefix match
WHERE A = 5 ORDER BY B                   -- leftmost prefix match
WHERE A > 5 ORDER BY A, B                -- leftmost prefix match

# Cannot use the index for sorting
WHERE A = 5 ORDER BY B DESC, C ASC       -- mixed sort directions
WHERE A = 5 ORDER BY B, D                -- D is not part of the index
WHERE A = 5 ORDER BY C                   -- skips column B
WHERE A > 5 ORDER BY B, C                -- range condition on A
WHERE A = 5 AND B IN (1, 2) ORDER BY C   -- the IN on B acts as a range

LIKE prefix

For the LIKE prefix: an expression such as first_name LIKE 'rMq%' can use the index on the first_name field, but first_name LIKE '%Chu%' cannot. For first_name LIKE 'rMq%', MySQL effectively completes the pattern into two boundary values, roughly rMqAAAAA and rMqzzzzz, with the completed part padded to the maximum length of the field. It then uses these two values to position itself in the index, and the required result set is the data that lies between the two positions.
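A sketch of checking this behavior, assuming an index exists on first_name (the actor table used later in this article is borrowed here; actual EXPLAIN output depends on the data and MySQL version):

EXPLAIN SELECT * FROM actor WHERE first_name LIKE 'rMq%';   -- the index can be used as a range scan
EXPLAIN SELECT * FROM actor WHERE first_name LIKE '%Chu%';  -- leading wildcard: the index cannot be used for positioning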

String prefix

A string prefix index indexes only the first few characters of a string. If a column value is long, building a full index on it is expensive and the query efficiency is low; the string prefix index exists to solve this problem. String prefix indexes are mainly used when:

  • The selectivity of the field prefix part is relatively high;
  • The field as a whole is not very selective (hash indexes can be used if the field as a whole is very selective).

For an equality query such as first_name='qWhNIZqxcbD' against a prefix index of length 4, MySQL first takes the first four characters of the condition value and uses them to locate the index entries with prefix 'qWhN'; it then fetches the corresponding rows from disk and compares the full first_name of each fetched row with the query value to produce the final result set.

One of the most important questions for a string prefix index is how to choose the prefix length. When the length is chosen well, the filtering power of the prefix index is almost equal to that of an index on the whole field. Here we build on the concept of selectivity explained above: a practical check is to group the data by the whole field and by prefixes of different lengths, and compare the size of the largest group to the total number of rows. The SQL used to evaluate candidate prefix lengths is shown below:

select count(*) as cnt, first_name as perf          from actor group by perf order by cnt desc limit 10;   -- whole field
select count(*) as cnt, left(first_name, 2) as perf from actor group by perf order by cnt desc limit 10;   -- prefix length 2
select count(*) as cnt, left(first_name, 3) as perf from actor group by perf order by cnt desc limit 10;   -- prefix length 3
select count(*) as cnt, left(first_name, 4) as perf from actor group by perf order by cnt desc limit 10;   -- prefix length 4
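Once a suitable length has been chosen (say 4 in this example), the overall selectivity of the prefix can be compared against the full column and the prefix index created; this is a sketch, and the length 4 as well as the index name are illustrative:

-- Compare the selectivity of a 4-character prefix against the whole column
SELECT
  COUNT(DISTINCT LEFT(first_name, 4)) / COUNT(*) AS sel_prefix4,
  COUNT(DISTINCT first_name) / COUNT(*)          AS sel_full
FROM actor;

-- Create the prefix index with the chosen length
ALTER TABLE actor ADD INDEX idx_first_name_prefix (first_name(4));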

Other indexes

Covering index

A covering index is an index that, in addition to the fields that participate in index filtering, also contains all the other fields used by the query, typically appended at the tail of the index. The advantage of a covering index scan is that every field the query needs lives in the same index, so the query only reads data from the index and never has to go back to the table on disk, avoiding the most time-consuming part of the query: disk I/O. Consider the following query:

select a, b, c from t where a='a' and b='b';

If the joint index (a, b, c) exists, this query uses a covering index scan: the first two index fields a and b filter the index slice according to the WHERE condition, and the values of a, b, and c can all be read directly from the index after the slice has been filtered, with no need to go back to the table.
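A sketch of how the covering scan can be verified; the table definition is illustrative, and "Using index" in the Extra column of the plan is the expected indicator:

-- Illustrative table with a joint index on (a, b, c)
CREATE TABLE t (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  a  VARCHAR(16) NOT NULL,
  b  VARCHAR(16) NOT NULL,
  c  VARCHAR(16) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- Covered query: all selected and filtered columns live in idx_a_b_c
EXPLAIN SELECT a, b, c FROM t WHERE a = 'a' AND b = 'b';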

Three-star index

A three-star index is an index that satisfies three criteria for a given query; each criterion met earns the index one star, and an index that earns all three stars is the optimal index for that query. The criteria for building a three-star index are:

  • Place the columns of all equality predicates (WHERE col = ...) at the beginning of the index;
  • Add the ORDER BY columns to the index;
  • Add the remaining columns referenced by the query to the index, putting the most volatile (frequently updated) columns last to reduce update costs.

For example, for the following query, the index (first_name, last_name, email) is a three-star index:

SELECT first_name, last_name, email FROM user WHERE first_name = 'aa' ORDER BY last_name;
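The corresponding three-star index for this query could be created as follows; this is a sketch that assumes a user table with these three columns:

-- first_name serves the equality predicate, last_name serves ORDER BY,
-- email covers the remaining selected column
ALTER TABLE user ADD INDEX idx_three_star (first_name, last_name, email);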

The following rules can be seen in the construction of the three-star index:

  • Covering the equality predicate columns, such as first_name, filters out most of the index slice data;
  • Covering the ORDER BY column, such as last_name, avoids sorting the result set;
  • Covering the remaining selected columns, such as email, avoids reading rows back from disk, i.e. it enables a covering index scan.

Index storage structure


In the MySQL Storage Structure article, we discussed the storage structure of MySQL data pages.

Memory Architecture

InnoDB memory consists of a buffer pool, a redo log buffer, and an additional memory pool.

The buffer pool is the largest chunk of memory and is used to cache data. Data files are read into the buffer pool page by page (16 KB per page), and cached pages are retained according to a least-recently-used (LRU) algorithm. The buffer pool caches data pages, index pages, the insert buffer, the adaptive hash index, lock information, and data dictionary information, with data pages and index pages taking up most of the memory. The log buffer holds redo log records, which are flushed to the redo log file at a certain frequency (every second by default).

InnoDB uses a set of background threads to handle operations asynchronously and uses the buffer pool to bridge the gap between CPU and disk speed. A query first locates the target data page through the index and then checks whether that page is in the buffer pool; if it is, the page is returned directly, otherwise the page is read from the clustered index via disk I/O and placed into the buffer pool. A data page can contain multiple rows. The buffer pool manages pages with an LRU algorithm: the most frequently used pages move toward the front of the list, the less frequently used ones sit toward the back, and when the pool is full the pages at the tail are evicted. Pages newly read from disk are inserted not at the head but at the midpoint of the list, which can be adjusted with a parameter. The buffer pool can also be split into multiple instances, and a hash determines which instance a page belongs to.
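The behavior described above maps roughly onto the following server variables; this is a sketch, and the example values are illustrative rather than recommendations:

-- Inspect the buffer pool related settings
SHOW VARIABLES LIKE 'innodb_buffer_pool%';

-- Example settings (in my.cnf, or via SET GLOBAL where the variable is dynamic):
-- innodb_buffer_pool_size      = 8G    -- total buffer pool memory
-- innodb_buffer_pool_instances = 8     -- split the pool into multiple instances
-- innodb_old_blocks_pct        = 37    -- midpoint insertion position in the LRU list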

Storage Architecture

The logical storage structure of the InnoDB storage engine is broadly similar to Oracle's: all data is logically stored in a space called a tablespace. A tablespace consists of segments, extents, and pages (pages are sometimes called blocks in some documents), and one extent contains 64 pages by default.

Whether each table gets its own independent tablespace, instead of being placed in the shared tablespace, is controlled by the innodb_file_per_table parameter.

A tablespace is made up of segments. The InnoDB storage engine is index-organized: leaf nodes hold the row data and are stored in the data segment, while non-leaf nodes build the index and are stored in the index segment. An extent is made up of consecutive pages and is always 1 MB in size; since each page is 16 KB by default, an extent contains 64 consecutive pages by default. A row record is ultimately stored in the file in binary form.

Physically, InnoDB tables consist of the shared tablespace, log file groups (more precisely, redo file groups), and table structure definition files. If innodb_file_per_table is set to ON, each table gets its own tablespace file ending in .ibd, which stores the table's data, indexes, and internal data dictionary information. The table structure definition file ends in .frm and is independent of the storage engine: whatever the storage engine, the table structure definition lives in the .frm file.
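A quick way to check and enable per-table tablespaces is sketched below; note that recent MySQL versions already enable innodb_file_per_table by default:

-- Check whether per-table tablespaces are enabled
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Enable it for tables created from now on; existing tables keep their current
-- tablespace until they are rebuilt (for example with ALTER TABLE ... ENGINE = InnoDB)
SET GLOBAL innodb_file_per_table = ON;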

Process Architecture

By default, InnoDB has 7 background threads: 4 I/O threads, 1 master thread, 1 lock monitor thread, and 1 error monitor thread. Most of InnoDB's work is done in the master thread, which has the highest priority and runs through the following loops: the main loop, the background loop, the flush loop, and the suspend loop.

The pseudo-code for the main loop is as follows:

void master_thread() {
    loop:
    for (int i = 0; i < 10; i++) {
        // do things once per second
        // sleep 1 second if necessary
    }
    // do things once per ten seconds
    goto loop;
}
  • The once-per-second operations include flushing the log buffer (always), merging the insert buffer (possibly), flushing up to 100 dirty pages (possibly), and switching to the background loop if there is no current user activity (possibly).
  • The once-per-ten-seconds operations include merging up to 5 insert buffer entries (always), flushing the log buffer (always), flushing 100 or 10 dirty pages to disk (always), generating a checkpoint (always), and deleting useless undo pages (always).
  • Background loop: if there is no user activity or the database is being shut down, the master thread switches to this loop and performs the following operations: delete useless undo pages (always), merge 20 insert buffer entries (always), jump back to the main loop (always), and keep flushing 100 pages until the conditions are met to jump to the flush loop (possibly).
  • If there is nothing left to do in the flush loop, the thread switches to the suspend loop and suspends itself.

Indexes and locks

MySQL offers three levels of locking: row locks, table locks, and page locks. Table locks are cheap and fast to acquire and never deadlock, but their granularity is large, lock conflicts are likely, and concurrency is low. Row locks are expensive and slower to acquire and can deadlock, but their granularity is small, conflicts are unlikely, and concurrency is high. Page locks sit between the two in cost and speed, can deadlock, and offer intermediate granularity and average concurrency. Each storage engine has its own locking strategy; for example, MyISAM supports only table-level locking, while InnoDB supports both row-level locking (the default) and table-level locking.

          Row locks   Table locks   Page locks
MyISAM        –            √             –
BDB           –            √             √
InnoDB        √            √             –

InnoDB implements row locks by locking index entries, unlike Oracle, which locks rows within data blocks. A consequence of this implementation is that InnoDB uses row-level locks only when data is retrieved through an index condition; otherwise InnoDB effectively locks every row, behaving like a table lock. Similarly, when the record targeted by an update does not exist, the lock falls back to a wider range. When a table has multiple indexes, different transactions can lock different rows through different indexes. InnoDB uses row locks whether the access goes through the primary key index, a unique index, or an ordinary index.

InnoDB locks all scanned records, and range queries additionally use gap locks. Locking then follows two-phase locking (2PL): locks are acquired as needed and all of them are released only when the transaction commits. The exact locking strategy depends on the database isolation level; under the default repeatable-read level it also depends on whether the query condition has an index, and whether that index is the primary key, a unique index, or an ordinary index.

Take select * from o_order where order_sn = '201912102322' for update; as an example. The locking strategy differs depending on the index that exists on order_sn:

  • order_sn is the primary key: an exclusive lock is placed on the record with order_sn = '201912102322' in the primary key (clustered) index.

  • order_sn has an ordinary index that is also unique: an exclusive lock is placed on the corresponding record in that secondary index and on the corresponding record in the primary key index.

  • order_sn has an ordinary, non-unique index: the record or records with order_sn = '201912102322' in the secondary index are locked, along with the corresponding records in the primary key index. Gap locks are also taken to prevent other transactions from inserting another row with order_sn = '201912102322'; with a unique index this is unnecessary and a row lock alone suffices.

  • order_sn has no index: InnoDB scans the whole table through the primary key index. Rather than taking a table lock, it places row-level exclusive locks on all scanned records; in fact InnoDB has an optimization that releases the lock on rows that turn out not to match, which strictly speaking violates 2PL, under which locks are released only at commit. Besides the record locks, the gaps between every pair of records are also locked, so what remains in the end are the gap locks plus the row locks on the rows with order_sn = '201912102322'.

  • If no row with order_sn = '201912102322' exists and order_sn is the primary key, a gap lock is added covering the range from the last record with order_sn less than '201912102322' to the first record with order_sn greater than it. Imagine there were no gap lock: another transaction could insert a row with order_sn = '201912102322', and since select for update is a current read, re-running the query in the original transaction would see that row even before the other transaction commits, producing a phantom read.

  • If there is no index, all scanned records and the gaps between them are locked; only the row locks on non-matching rows are released, not the gap locks. Recall from the data page structure discussed earlier that the pseudo minimum and maximum records, Infimum and Supremum, come into play when gap locks are taken.
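The effect of the index type on locking is easy to observe with two sessions; this is a sketch under the default repeatable-read isolation level, and the o_order table and values are illustrative:

-- Session 1
BEGIN;
SELECT * FROM o_order WHERE order_sn = '201912102322' FOR UPDATE;
-- keep the transaction open

-- Session 2
-- With a unique index on order_sn only the matching row is locked,
-- so inserting a different order_sn succeeds immediately:
INSERT INTO o_order (order_sn) VALUES ('201912102323');
-- With a non-unique index (or no index on order_sn), the surrounding gap is also locked,
-- so an insert that falls into that gap blocks until session 1 commits or rolls back.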

Further reading

This article does not cover index optimization in MySQL; for that, refer to the MySQL Engine Architecture and Performance Optimization series at https://url.wx-coder.cn/IF5HH.