The last article covered the basics of indexing, and this article continues with the actual practice of indexing optimization. Before getting into the field of index optimization, there are two important concepts related to indexes that are essential to index optimization.

The user table structure used to test this article:

Important concepts related to indexes

base

The number of unique keys in a single column is called the cardinality.

SELECT COUNT(DISTINCT name),COUNT(DISTINCT gender) FROM user;

The gender column has a cardinality of 2, indicating that there are many duplicate values in the gender column. The cardinality of name is equal to the total number of rows, indicating that there are no duplicate values in the name column, which is equivalent to the primary key.

Percentage of data returned:

The user table has 5 entries:

SELECT * FROM user;

Query the number of records whose gender is 0 (male) :

Then the proportion of records returned is:

Select * from SWJ where name = ‘SWJ’;

The ratio of records returned is:

SELECT * FROM user WHERE gender = 0; SELECT * FROM user WHERE gender = 0; SELECT * FROM user WHERE name = ‘swj’; Can they all hit the index?

SQL > alter table user;

SELECT * FROM user WHERE gender = 0; There are no matching indexes. Note that the filtered value is the percentage of return records we calculated above.

SELECT * FROM user WHERE name = ‘swj’; Matched index index_name, because you can directly find the records to be queried through the index, the value of filtered is 100

Conclusion:

Return 30% of the data in the table to go through the index, return more than 30% of the data in the full table scan. Of course, this conclusion is too absolute, and it is not an absolute 30%, but a approximate range.

Back to the table

When an index is created on a column, the index contains the key value of the column and the ROWID of the row that the key value corresponds to. The data in the table is called back by roWID access recorded in the index. Too many table return times seriously affects SQL performance. If the table return times are too many, perform full table scan instead of index scan.

Using Index in the EXPLAIN command result means that the table is not returned and the main data is retrieved through the Index. Using Where means you need to fetch data back from the table.

Index optimization

Sometimes the database has an index but is not selected for use by the optimizer.

SHOW STATUS LIKE ‘Handler_read%’; Check index usage:

Handler_read_key: If the index is working, the value of Handler_read_key will be high.

Handler_read_rnd_next: The number of requests in the data file to read the next row. If a large number of table scans are being performed, the value will be high, indicating poor index utilization.

Index optimization rules:

  1. If MySQL estimates that using an index is slower than a full table scan, it will not use the index

    The percentage of returned data is an important metric, and the lower the percentage, the easier it is to hit the index. Keep in mind that the range — 30% — is based on the fact that 30% or less of the data will be returned.

  2. The leading fuzzy query query could not match the index

    Create normal index for name column:

    Leading fuzzy query query cannot match index:

    EXPLAIN SELECT * FROM user WHERE name LIKE '%s%';

    Non-leading fuzzy query can use index, can be optimized to use non-leading fuzzy query:

    EXPLAIN SELECT * FROM user WHERE name LIKE 's%';

  3. The index will not be hit when the data type is implicitly converted, especially if the column type is string. Be sure to enclose character constant values in quotes

    EXPLAIN SELECT * FROM user WHERE name=1;

    EXPLAIN SELECT * FROM user WHERE name='1';

  4. In the case of composite indexes, the query condition does not contain the leftmost part of the index column (the leftmost part does not meet the rule), and the matching index is not matched

    Create index (name,age,status);

    ALTER TABLE user ADD INDEX index_name (name,age,status);

    Select * from user where id = 1;

    SHOW INDEX FROM user;

    The composite index index_name can be hit according to the leftmost rule:

    EXPLAIN SELECT * FROM user WHERE name='swj' AND status=1;

    Note that the leftmost rule does not say the order of the query conditions:

    EXPLAIN SELECT * FROM user WHERE status=1 AND name='swj';

    The query condition contains the leftmost column of the index:

    EXPLAIN SELECT * FROM user WHERE status=2 ;

  5. Union, IN, or can all match the index. In is recommended.

    union:

    EXPLAIN SELECT * FROM user WHERE status = 1

    UNION ALL

    SELECT * FROM user WHERE status = 2;

    in:

    EXPLAIN SELECT * FROM user WHERE status IN (1,2);

    or:

    EXPLAIN SELECT * FROM user WHERE status=1 OR status=2;

    Query CPU consumption: or > in >union

  6. A condition separated by or. If the condition before or has an index and the column after it has no index, the index involved will not be used

    EXPLAIN SELECT * FROM payment WHERE customer_id = 203 OR amount = 3.96;

    Since there is no index in the condition column after or, subsequent queries must perform a full table scan. In the case of a full table scan, there is no need for an additional index scan to increase IO access.

  7. Negative conditional queries cannot use indexes and can be optimized for IN queries.

    The negative conditions are:! =, <>, not in, not exists, not like, etc.

    Create index for status column:

    ALTER TABLE user ADD INDEX index_status (status);

    Select * from user where id = 1;

    SHOW INDEX FROM user;

    Negative condition does not hit cache:

    EXPLAIN SELECT * FROM user WHERE status ! =1 AND status ! = 2;

    It can be optimized as an IN query, but only if the distinction is high and the percentage of returned data is less than 30% :

    EXPLAIN SELECT * FROM user WHERE status IN (0,3,4);

  8. Range conditional queries can hit indexes

    Range conditions include <, <=, >, >=, and between

    Create index (status,age);

    ALTER TABLE user ADD INDEX index_status (status);

    ALTER TABLE user ADD INDEX index_age (age);

    Select * from user where id = 1;

    SHOW INDEX FROM user;

    A range conditional query can hit an index:

    EXPLAIN SELECT * FROM user WHERE status>5;

    A range column can use an index (the union index must be the left-most prefix), but the column following the range column cannot use the index. The index can be used for a maximum of one range column, and cannot be used for all of two range columns in a query condition:

    EXPLAIN SELECT * FROM user WHERE status>5 AND age<24;

    If a range query and an equivalent query exist at the same time, the index of the equivalent query column will be matched first:

    EXPLAIN SELECT * FROM user WHERE status>5 AND age=24;

  9. Database computations do not match indexes

    EXPLAIN SELECT * FROM user WHERE age > 24;

    EXPLAIN SELECT * FROM user WHERE age+1 > 24;

    The calculation logic should be handled in the business layer as much as possible to save the CPU of the database while maximizing index hits.

  10. Use overwrite indexes for queries to avoid back to the table

The queried column data can be obtained from the index instead of using the row locator to obtain data from the row. That is, the column to be queried must be overwritten by the created index. This accelerates the query speed.

SQL > alter table user;

Since the status field is an index column, we can get the value directly from the index without going back to the table:

Using Index means to query from an Index

EXPLAIN SELECT status FROM user where status=1;

When querying other columns, you need to query back to the table, which is one of the reasons to avoid SELECT * :

EXPLAIN SELECT * FROM user where status=1;

  1. Columns that are being indexed cannot be null

    Single-column indexes do not store NULL values, and compound indexes do not store all NULL values. If columns are allowed to be null, you may get “not expected” result sets, so use the not NULL constraint and default values.

    Create index for remark column:

    ALTER TABLE user ADD INDEX index_remark (remark);

    IS NULL can hit an index:

    EXPLAIN SELECT * FROM user WHERE remark IS NULL;

    IS NOT NULL Cannot match index:

    EXPLAIN SELECT * FROM user WHERE remark IS NOT NULL;

    Although IS NULL can hit an index, NULL itself IS NOT good database design and should use the NOT NULL constraint and default values

  2. Indexes should not be created on fields that are frequently updated

    Because the update operation changes the B+ tree and rebuilds the index. This process is very costly to database performance.

  3. Indexes should not be created on fields with little differentiation

    For fields like gender, which are not very differentiated, it makes little sense to build indexes. Performance is comparable to full table scan because data cannot be filtered effectively. In addition, the optimizer will not choose to use indexes if the percentage of data returned is more than 30%.

  4. Business fields with unique characteristics, even if the combination of multiple fields, must be built into a unique index

    Although unique indexes affect insert speed, the speed increase for queries is significant. In addition, even with very good validation control at the application layer, as long as there is no unique index, there will still be dirty data generated in concurrent situations.

  5. When multiple tables are associated, ensure that the associated fields have indexes

  6. Avoid the following misconceptions when creating indexes

    • The more indexes, the better. Consider that a query requires an index.

    • Indexes consume space and significantly slow down updates and additions.

    • Resist unique indexes and believe that business uniqueness should be solved by “search and plug later” at the application layer.

    • Optimize too early and start optimizing without knowing the system.

conclusion

For their own SQL query statements, try to use EXPLAIN command analysis, do a programmer to the pursuit of SQL performance. SQL ability is an important indicator to measure whether a programmer is reliable or not. As a back-end programmer, I think so.

reference

  • MySQL in Simple Form


                                                —–END—–

If you like this article, please scan the picture below and watch more exciting content