The optimization of MySQL is mainly divided into Scheme optimization and Query optimization.

The high performance index strategy discussed in this paper mainly belongs to the category of structural optimization. The content of this article is entirely based on the theoretical foundation above, but in fact, once the mechanism behind indexing is understood, the choice of high-performance strategies becomes pure reasoning, and the logic behind those strategies can be understood.

A sample database

To discuss indexing strategies, you need an example of a database with a relatively small amount of data. This article uses one of the sample databases provided in the official MySQL documentation: Employees. This database relationship is moderately complex and has a large amount of data. Here is the e-R diagram of the database (quoted from the MySQL official manual) :

Second, the most left prefix principle and related optimization

The first condition for efficient use of indexes is to know which queries will use the index. This problem is related to the “left-most prefix principle” in B+Tree. The following example illustrates the left-most prefix principle.

Here’s the concept of a federated index. In the previous article, we assumed that an index refers to a single column. In fact, MySQL indexes can refer to multiple columns in a certain order, which is called a federated index.

In general, a joint index is an ordered tuple < A1, a2… An >, where each element is a column of the data table, and actually to define the index rigorously you need to use relational algebra, but I don’t want to talk too much about relational algebra here, because that would be boring, so I’m not going to do it rigorously here. In addition, a single-column index can be regarded as a special case where the number of elements in a joint index is 1.

Using employees.titles as an example, let’s take a look at what indexes are on it:

Third, the EXPLAIN

In our daily work, we will sometimes slow down the query to record some SQL statements that take a long time to execute. Finding these SQL statements does not mean that we are done. Sometimes we often use the explain command to view the execution plan of these SQL statements and check whether the SQL statement has used the index. Whether or not a full table scan has been done can be checked with the explain command.

So we looked into MySQL’s overhead optimizer and got a lot of details about the access policies that might be taken into account by the optimizer and which policies are expected to be adopted by the optimizer when running SQL statements.

The information out of EXPLAIN has 10 columns, including ID, select_TYPE, table, type, possible_keys, key, key_len, ref, rows, and Extra

Brief Description:

  • Id: Select the identifier
  • Select_type: indicates the query type.
  • Table: output result set table
  • Type: indicates the connection type of the table

All (full table scan), index (full table scan), range (index scan), req (search condition column uses index and does not have primary key and unique, Ref_eq (lookup using primary key or unique index), const (place primary key after WHERE as conditional query, mysql optimizer can optimize this query to a constant)

  • Possible_keys: possible index during query
  • Key: indicates the actual index
  • Key_len: length of the index field
  • Ref: comparison of columns to indexes
  • Rows: number of rows scanned (estimated number of rows)
  • Extra: Description and description of execution

Iv. Specific content

Case 1: All columns match

explain SELECT * FROM employees.titles WHERE emp_no='10001' AND title = 'Senior Engineer' AND from_date='1986-06-26';

Obviously, an index can be used when an exact match (meaning “=” or “IN” match) is performed against all columns IN the index. One thing to note here is that indexes are theoretically order sensitive, but since MySQL’s query optimizer automatically adjusts the condition order of the WHERE clause to use the appropriate index, for example if we reverse the condition order in where, the effect is the same.

Case two: The leftmost prefix matches

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001';

When a query condition matches exactly one or more consecutive columns to the left of the index, such as or <emp_no, title>, it can be used, but only in part, the left-most prefix of the condition. The above query uses the PRIMARY index, but key_len is 4, indicating that only the first column prefix of the index is used.

Case 3: The query criteria use an exact match for the columns in the index, but one of the intermediate criteria is not provided

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-0626';

Select from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date, from_date So the result needs to be scanned to filter froM_DATE (there is no scan here because EMP_NO is unique).

If you want from_date to also use indexes instead of where filtering, you can add a secondary index < emp_NO, from_date> that will be used in the above query. In addition, you can use an optimization called “quarantine column” to fill in the “pit” between EMP_NO and from_date.

First let’s take a look at the different values of title:

There are only seven. IN cases where the number of columns that are pits is small, consider filling the pit with “IN” to form the left-most prefix:

This time key_len is 56, indicating that the index is fully used, but from the type and rows we can see that IN actually performs a range query, where seven keys are checked.

Performance improved a bit after “pit filling”. If there is a lot of data left over after emp_NO filtering, the latter performance advantage becomes even more pronounced. Of course, if the title value is too large, pit padding is not appropriate and a secondary index must be created.

Case 4: The query condition does not specify the first index column

Since it is not the left-most prefix, a query like index obviously does not use an index.

Case 5: Matches the prefix string of a column

You can use an index at this point, but you cannot use an index if the wildcard does not appear only at the end. If the wildcard % character does not appear at the beginning, you can use the index, but only one of the prefixes may be used depending on the situation.

Case 6: Range query

A range column can use an index (it must be the left-most prefix), but the column after the range column cannot use an index. At the same time, the index can be used for at most one range column, so if there are two range columns in the query criteria, the index cannot be used for all of them.

You can see that the index does nothing for the second range index. One interesting aspect of MySQL in particular is that it may not be possible to distinguish range indexes from multi-value matches using explain alone, because both are displayed as range in Type. Also, using “between” does not mean a range query, as in the following query:

It looks like two range queries are being used, but “BETWEEN” on EMP_no is actually equivalent to “IN”, which means emp_no is actually a multi-value exact match. You can see that the query uses all three columns of the index. So be careful to distinguish between multi-value matching and range matching in MySQL, otherwise you will get confused about MySQL’s behavior.

Case 7, index selectivity and prefix index

Since indexes can speed up queries, should we build indexes whenever a query statement needs them? The answer is no. Because indexes speed up queries, they also come at a cost: index files themselves consume storage space, and indexes increase the burden of inserting, deleting, and modifying records. In addition, MySQL also consumes resources to maintain indexes at runtime, so more indexes are not always better. In general, indexes are not recommended in two cases.

The first case is that the table records are relatively small, such as 1000 or even only a few hundred records of the table, there is no need to build an index, let the query do the full table scan. As for how many records is too many, I have my own opinion. My personal experience is to take 2000 as the dividing line. If the number of records does not exceed 2000, we can consider not building indexes, and if the number of records exceeds 2000, we can consider indexes as appropriate.

Another case where indexes are not recommended is when they are less selective. Selectivity of indexes refers to the ratio of non-repeating index values (Cardinality) to the number of table records (#T) :

Index Selectivity = Cardinality / #T
Copy the code

Obviously, the selectivity range is (0, 1), and the higher the selectivity, the greater the index value, which is determined by the nature of B+Tree. For example, if the title field on employees.titles is frequently queried separately, does it need to be indexed? Let’s look at its selectivity:

Title has less than 0.0001 selectivity (precise value 0.00001579), so there’s really no need to index it separately.

There is an index optimization strategy related to index selectivity called prefix index, which is to replace the whole column with the prefix of the column as the index key. When the prefix length is appropriate, the selectivity of the prefix index can be close to the whole column index, and the index file size and maintenance cost can be reduced because the index key is shorter. The following uses the employees.employees table as an example to describe how to select and use a prefix index.

As you can see from the sample database diagram, the Employees table has only one index, so if we wanted to search for a person by name, we would have to do a full table scan, which is obviously inefficient if we search for employees by name frequently, so we can consider indexes. Select last_name> from last_first_name (last_name>);

The value of first_name and last_name is 30, and the value of last_name is 30. Consider indexing the first characters of first_name and last_name, for example, to see the selectivity:

At this point, the selection is ideal, and the length of the index is only 18, which is nearly half shorter. Let’s create the prefix index: ALTER TABLE employees.employees ADD INDEX first_name_last_name4 (first_name, last_name (4)); At this time, execute a query by name again, compare and analyze the results before the index: performance improvement is significant, query speed increased by more than 120 times.

Prefix indexes have both index size and query speed, but their disadvantage is that they cannot be used for ORDER BY and GROUP BY operations, and they cannot be used for Covering indexes (i.e. the data file itself is not accessed when the index itself contains all the data required for the query).

Pay attention and don’t get lost

All right, everybody, that’s all for this article. All the people here are talented. As I said before, there are many technical points in PHP, because there are too many, it is really difficult to write, you will not read too much after writing, so I have compiled it into PDF and document, if necessary

Click on the code: PHP+ “platform”

As long as you can guarantee your salary to rise a step (constantly updated)

I hope the above content can help you. Many PHPer will encounter some problems and bottlenecks when they are advanced, and they have no sense of direction when writing too many business codes. I have sorted out some information, including but not limited to: Distributed architecture, high scalability, high performance, high concurrency, server performance tuning, TP6, Laravel, YII2, Redis, Swoole, Swoft, Kafka, Mysql optimization, shell scripting, Docker, microservices, Nginx, etc