preface

Based on the previous index of the InnoDB storage engine’s B+ tree, the following conclusions can be drawn:

  • Each index corresponds to a B+ tree, which is divided into several layers, with the bottom layer being leaf nodes and the rest being inner nodes. All user records are stored in the leaf node of the B+ tree, and all directory entry records are stored in the inner node.

  • The InnoDB storage engine automatically builds a cluster index for the primary key (it will automatically add it for us if it doesn’t exist). The leaf node of the cluster index contains the complete user record.

  • We can for your interested in establishing secondary indexes, secondary indexes of leaf node contains user record is composed of index column + primary key, so if you want to through the secondary indexes to find full user records, need to pass back to the table, namely after the primary key is found through a secondary indexes to clustering index finds the full user records.

  • Each node in the B+ tree forms a bidirectional linked list according to the order of index column value from small to large, and the records in each page (no matter the user record or directory entry record) form a single linked list according to the order of index column value from small to large. In the case of a joint index, pages and records are sorted by the column that precedes the joint index and, if the column has the same value, by the column that follows it.

  • Finding records by index starts at the root of the B+ tree and goes down level by level. Because each Page has a Page Directory based on the value of the index column, lookup in these pages is very fast.

Use of B+ tree indexes

First, the cost of indexing

  • The cost in space
    • For every index you build, you build a tree for itB+Trees, every one of themB+Each node of the tree is a data page, and a page is occupied by default16KBStorage space, a lot ofB+A tree consists of many data pages, which is a lot of storage space.
  • Time cost
    • Every time you add, delete, or modify data in a table, you need to modify each oneB+Index of the tree

2. Conditions for B+ tree index:

CREATE TABLE person_info(
    id INT NOT NULL auto_increment,
    name VARCHAR(100) NOT NULL,
    birthday DATE NOT NULL,
    phone_number CHAR(11) NOT NULL,
    country varchar(100) NOT NULL.PRIMARY KEY (id),
    KEY idx_name_birthday_phone_number (name, birthday, phone_number)
);
Copy the code
  • Build table
    • The primary key in the table is the ID column, which stores an automatically incrementing integer. So InnoDB storage engine automatically creates clustered indexes for ID columns.

    • We define an additional secondary index, IDx_NAMe_birthday_phone_number, which is a federated index consisting of three columns. Therefore, the user records stored in the leaf node of the B+ tree corresponding to this index only retain the values of name, birthday, phone_number and primary key ID, and do not store the values of country column.

  • According to the firstnameTo sort the values of the columns.
  • ifnameIf the values of the columns are the same, thebirthdayTo sort the values of the columns.
  • ifbirthdayIf the values of the columns are the same, thephone_numberSort by the value of.

(1) Full value matching

SELECT * FROM person_info WHERE name = 'Ashburn' AND birthday = '1990-09-27' AND phone_number = '15123983239';
Copy the code
  • The query process is as follows

    • Since the data pages and records in the B+ tree are first sorted by the value of the NAME column, it is possible to quickly locate the Ashburn record location for the name column.

    • Select Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn, Ashburn

    • If, unfortunately, the name and birthday columns have the same value, the records are sorted by the value of the phone_number column, so all three columns in the combined index could be used.

  • Does it matter if the order of the query after where is reversed

    • No impact, query optimization is automatically performed

(2) left-most matching

  • In fact, the search statement can not contain all the columns in the union index, just include the left column, for example, the following query statement:
SELECT * FROM person_info WHERE name = 'Ashburn';
Copy the code

Or contain more than one left column:

SELECT * FROM person_info WHERE name = 'Ashburn' AND birthday = '1990-09-27';
Copy the code
  • If we want to use as many columns as possible in the federated index, the columns in the search criteria must be consecutive columns from the leftmost in the federated index. For example, federated indexesidx_name_birthday_phone_numberThe columns in are defined in the ordername,birthday,phone_numberIf only one of our search criterianameandphone_numberAnd there’s nothing in betweenbirthdayFor example:
SELECT * FROM person_info WHERE name = 'Ashburn' AND phone_number = '15123983239';
Copy the code
  • So you can only usenameIndex of columns,birthdayandphone_numberI can’t use the index becausenameRecords with the same value are first followedbirthdayTo sort the values of,birthdayRecords with the same value are followedphone_numberValue to sort.

(3) the left-most prefix

  • Let’s say we want to query the name'As'The query can be written as follows:
SELECT * FROM person_info WHERE name LIKE 'As%';
Copy the code
  • Note, however, that if only the suffix or a string in the middle is given, for example:
SELECT * FROM person_info WHERE name LIKE '%As%';
Copy the code
  • MySQLI can’t quickly locate the record because I have it in the middle of the string'As'The string is not in order, so only the full table scan

(4) Matching range value

  • Looking back at usidx_name_birthday_phone_numberThe index ofB+Tree diagram, all records are sorted according to the index column value from small to large order, so this is very convenient for us to find the index column value within a certain range of records. Consider the following query statement:
SELECT * FROM person_info WHERE name > 'Asa' AND name < 'Barlow';
Copy the code
  • Since the data pages and records in the B+ tree are sorted by the name column first, the above query process actually looks like this:

    • Through B+ tree, find the first secondary index record whose name value is greater than Asa in leaf node, read the primary key value of the record for table back operation, obtain the corresponding cluster index record and send it to the client.

    • According to the previous step to find records, the records in the list look behind (in the same page linked records used singly linked list, data between pages with two-way linked list) under a secondary index record, the record judge whether the detector satisfies the name < ‘Barlow’ conditions, if meet, is sent to the client after back to the operating table.

    • Repeat the previous step until a secondary index record does not meet the condition name <‘Barlow’.

  • A B+ tree index can only be used if a range lookup is performed on the leftmost column of the index, for example:

SELECT * FROM person_info WHERE name > 'Asa' AND name < 'Barlow' AND birthday > '1980-01-01';
Copy the code
  • The above query can be divided into two parts:

    1. Through the conditionsname > 'Asa' AND name < 'Barlow' Come to the rightnameRange, search results may be multiplenameRecords with different values,
    2. For thesenameRecords with different values continue throughbirthday > '1980-01-01'Conditions continue filtering.
  • In this way, for the union index IDx_name_birthday_phone_number, only the name column is used, not the birthday column, because the birthday column can be used only if the name value is the same. The records in the query that range lookup by name may not be sorted by birthday column, so the B+ tree index is not needed to continue the birthday column lookup in the search criteria.

(5) Exactly match one column and range match another column

  • If the left column is an exact lookup, then the right column can do a range lookup, for example:
SELECT * FROM person_info WHERE name = 'Ashburn' AND birthday > '1980-01-01' AND birthday < '2000-12-31' AND phone_number > '15100000000';
Copy the code
  • The criteria for this query can be broken down into three parts:

    1. Name = ‘Ashburn’ to do a precise lookup on the name column, of course you can use the B+ tree index.

    2. Birthday > ‘1980-01-01’ AND birthday < ‘2000-12-31’, name = ‘2000-12-31’ They are then sorted by the value of birthday. So you can use the B+ tree index to do a range lookup on the Birthday column.

    3. Phone_number > ‘15100000000’, birthday may be different from birthday, so this condition cannot be used with B+ tree index, only traverses the previous query.

  • Similarly, the following queries may use the idx_name_birthday_phone_number joint index:

SELECT * FROM person_info WHERE name = 'Ashburn' AND birthday = '1980-01-01' AND phone_number > '15100000000';
Copy the code

(6) Used for sorting

  • In general, we can only change records are loaded into memory, then use some sorting algorithms, such as quick sort, merge sort, it’s sort, and so on in memory to sort these records, sometimes may be the query result set too big to rank in memory, may also temporarily with the aid of disk space to store the intermediate results, The sorted result set is returned to the client after the sorting operation is complete.
  • inMySQLIn, put this inSort in memory or on diskAre collectively referred to asFile sorting(English name:filesort), withfileThe word implies that these sorts are very slow (disk versus memory is like an airplane versus a snail). But if theORDER BYIf we use our index column in the clause, it is possible to eliminate sorting in memory or in a file, such as the following simple query:
SELECT * FROM person_info ORDER BY name, birthday, phone_number LIMIT 10;
Copy the code
  • The result set of this query needs to be followednameValue sort if recordednameIf the value is the same, follow thebirthdayTo sort, ifbirthdayIf the value of is the same as that ofphone_numberSorting. You can go back and look at what we builtidx_name_birthday_phone_numberA schematic of the index because of thisB+The tree index itself is sorted according to the above rules, so it extracts the data directly from the index and then proceedsBack to the tableThe operation pulls out columns that are not included in the index

Sorting considerations using federated indexes

  • Table phone_number (birthday, name, phone_number, birthday, name); table phone_number (birthday, name); table phone_number (birthday, name); table phone_number (birthday, name); table phone_number (birthday, name); The reason why this reverse order cannot be used is explained in more detail than is needed here.

  • Similarly, ORDER BY name, ORDER BY name, birthday matches the column to the left of the index and can use a partial B+ tree index. When the value of the left column of the joint index is constant, the following column can also be used to sort, for example:

SELECT * FROM person_info WHERE name = 'A' ORDER BY birthday, phone_number LIMIT 10;
Copy the code
  • This query can be sorted using the federated index becausenameThe value of the column is the same as that of the recordbirthday.phone_numberSorted. I’ve told you so many times.

Several cases in which sorting by index is not possible

ASC and DESC are used together

For scenarios that use a federated index to sort, we require that the sorting order of the columns be consistent, that is, either the columns are sorted by ASC or DESC.

If the name column is sorted in ascending order and the birthday column is sorted in descending order, for example:

SELECT * FROM person_info ORDER BY name, birthday DESC LIMIT 10;
Copy the code

So if you use index sort the process looks like this:

  • Start from the leftmost part of the indexnameThe smallest value of the column, and then findnameColumn is equal to all records of that value, and then fromnameThe rightmost record whose column is equal to that value starts looking for 10 records to the left.
  • ifnameThere are less than 10 records where the column is equal to the smallest value, so keep going to the rightnameRepeat the process for the second smallest value until 10 records are found.
A sorting sequence contains columns that are not of the same index

When multiple columns are not in the same index, they cannot be sorted using an index, for example:

SELECT * FROM person_info ORDER BY name, country LIMIT 10;
Copy the code

Name and country are not columns in a combined index, so they cannot be sorted using the index

The sorting uses complex expressions

To use an index to sort, you must ensure that the index column is a single column, not a decorated one, for example:

SELECT * FROM person_info ORDER BY UPPER(name) LIMIT 10;
Copy the code

Columns decorated with the UPPER function are not separate columns, so they cannot be sorted using indexes.

(7) Used for grouping

The following is a grouped query:

SELECT name, birthday, phone_number, COUNT(*) FROM person_info GROUP BY name, birthday, phone_number
Copy the code
  • This query does three grouping operations:

    1. First, the records are grouped according to the name value, and all the records with the same name value are divided into a group.

    2. Grouping all entries with the same name by birthday. Grouping all entries with the same birthday into a smaller group makes it look like there are several smaller groups in a larger group.

    3. The smaller groups generated in the previous step are then divided into smaller groups based on the value of phone_number, so the overall look looks like records are first divided into a large group, then the large group is divided into smaller groups, and then the smaller groups are divided into more smaller groups.

Cover index

An index pushdown

How to select the index

Create indexes only for columns used for searching, sorting, or grouping

That is, only columns that appear in the WHERE clause, join columns in the join clause, or columns that appear in the ORDER BY or GROUP BY clause are indexed. Columns that appear in the query list do not need to be indexed:

SELECT birthday, country FROM person_name WHERE name = 'Ashburn';
Copy the code

Columns such as birthday and country in the query list do not need to be indexed; we only need to create indexes for the name column that appears in the WHERE clause.

Consider the cardinality of the column

  • So the conclusion is that it is better to index columns with a large cardinality than columns with a small cardinality.
  • For example, the field of boys and girls does not make any sense

Index column types should be as small as possible

  • Take the integer type as an exampleTINYINT,MEDIUMINT,INT,BIGINTSo, they take up more and more storage space, and what we’re talking about hereThe type sizeRefers to the size of the data range represented by this type. The range of integers that can be represented is of course also increasing. If we want to index an integer column, we should try to keep the index column smaller if the range of integers that can be represented allows, for example, we can useINTDon’t use itBIGINT, can useMEDIUMINTDon’t use itINTThis is because:
    • The smaller the data type, the faster the comparison at query time (this is CPU level stuff)
    • The smaller the data type, the less storage space the index occupies, and the more records you can fit into a data page, thereby reducing the number of disksI/OThe performance penalty means that more data pages can be cached in memory, which speeds up read and write efficiency.

The prefix of the index string value

  • When we need to index the string column, that means there are two problems in the corresponding B+ tree:

    • B+A record in a tree index needs to store the full string for that column, and the longer the string, the more storage space it takes up in the index.
    • ifB+Index columns in tree indexes store long strings, which can take more time to do string comparisons.
  • So the index designer came up with a scheme to index only the first few characters of the string, that is, to keep only the first few characters of the string in the record of the secondary index

Effect of index column prefixes on sorting

If the index column prefix is used, for example, only the first 10 characters of the name column are placed in the secondary index, the following query might be a bit awkward:

SELECT * FROM person_info ORDER BY name LIMIT 10;
Copy the code

The secondary index does not contain the complete name column information, so it cannot sort the first ten characters of the same, the following characters are different.

Let the index column appear separately in the comparison expression

  • If the index column appears in a comparison expression not as a single column, but as an expression, or as a function call, the index is not needed.

Primary key insertion order

  • We know that for a useInnoDBIn the case of a storage engine table, the data in the table is actually stored in the table when we do not explicitly create an indexClustering indexOf the leaf nodes. And records are stored in the data page, page data and records are sorted in accordance with the order of the record the primary key value since the childhood, so if we insert the record’s primary key value is, in turn, increases, that each of us with a data page, changes to the next page data to insert, and if we insert the primary key of the big and small, it is more trouble, Suppose a data page stores records that are full and stores primary key values in1~100Between:

  • If you insert another primary key at this point, the value is zero9, then its insertion position is as follows:

  • Move some records from this page to the newly created page. What does page splitting and record shifting mean? Means: Performance loss! So if we want to avoid this unnecessary performance loss, it is best to increase the primary key values of the inserted records successively so that this performance loss does not occur. So we suggest: let the primary key haveAUTO_INCREMENTLet the storage engine generate the primary key for the table itself instead of manually inserting it, as we could define itperson_infoTable:
CREATE TABLE person_info(
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    birthday DATE NOT NULL,
    phone_number CHAR(11) NOT NULL,
    country varchar(100) NOT NULL.PRIMARY KEY (id),
    KEY idx_name_birthday_phone_number (name(10), birthday, phone_number)
);    
Copy the code
  • Our custom primary key column ID has the AUTO_INCREMENT attribute, and the storage engine will automatically fill in the increment primary key value for us when the record is inserted.

  • We know that the IDx_name_birthday_phone_number index provides a quick search for the NAME column, and creating a redundant index for the NAME column only adds to the maintenance cost and does not benefit the search.

  • Alternatively, we might index a column repeatedly, for example:

CREATE TABLE repeat_index_demo (
    c1 INT PRIMARY KEY,
    c2 INT.UNIQUE uidx_c1 (c1),
    INDEX idx_c1 (c1)
);  
Copy the code
  • And we see,c1The primary key is defined as a unique index, and it is defined as a normal index, but the primary key itself will generate clustered indexes, so the definition of unique index and normal index is duplicate, this situation should be avoided.

conclusion

  • This is just what we’re creating and usingB+Tree index process need to pay attention to some points, we will continue to introduce more optimization methods and matters needing attention, please look forward to. This episode is summarized as follows:
  1. B+ tree indexes have a cost in space and time.

  2. B+ tree indexes apply to the following situations:

    • All values match
    • Matches the column on the left
    • Matching range value
    • Matches exactly one column and ranges exactly the other
    • For sorting
    • Used for grouping
  3. Here are some things to consider when using indexes:

    • Create indexes only for columns used for searching, sorting, or grouping
    • Create indexes for columns with large cardinality of columns
    • Index column types should be as small as possible
    • You can index only the prefixes of string values
    • An index applies only if the index column appears alone in a comparison expression
    • In order to let as little as possibleClustering indexIn the case of page splitting and record shifting, it is recommended that the primary key be ownedAUTO_INCREMENTProperties.
    • Locate and delete duplicate and redundant indexes in a table
    • Try to useCover indexQuery and avoidBack to the tablePerformance loss.