It takes about 20 minutes to read this article!

Hello everyone, we introduced the knowledge point about actual combat. It is mainly used in the case of more table data, how to cleverly take out a few pieces of data. Here are some of the worst-case scenarios for index failures.

For the record, I learned from Mr. Dinky, and I made my own technical summary

When it comes to index failure, today’s analysis is mainly from three directions. Function operations, implicit type conversion, implicit character encoding conversion. There is no like,! = etc.

Function operation

Suppose you now maintain a trading system in which the trading record table Tradelog contains the trade sequence number (TradeID), trader ID (operator), and trade time (T_modified) fields. We’ll ignore the other fields for the sake of description. The construction of this table is as follows:

mysql> CREATE TABLE `tradelog` (
  `id` int(11) NOT NULL.`tradeid` varchar(32) DEFAULT NULL.`operator` int(11) DEFAULT NULL.`t_modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `tradeid` (`tradeid`),
  KEY `t_modified` (`t_modified`))ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Copy the code

Assuming that all data has been recorded from the beginning of 2016 to the end of 2018, operations has a requirement to count the total number of transactions recorded in July of all years. This logic doesn’t seem too complicated; your SQL statement might say something like this:

mysql> select count(*) from tradelog where month(t_modified)=7;
Copy the code

Since the T_MODIFIED field has an index, you can safely execute the statement in the production repository, only to find that it takes too long to return the result.

If you ask a DBA colleague why this is the case, he will probably tell you that if you perform a function on a field, you do not need an index, which is a MySQL rule.

Now that you’ve learned about InnoDB’s index structure, you can ask why? Where month(t_modified)=7 where month(t_modified)=7

Here is a schematic of the T_modified index. The number above the box is the value of the month() function.

If your SQL condition is where T_modified =’2018-7-1 ‘, the engine will quickly locate the result required by t_MODIFIED =’2018-7-1 ‘by following the green arrow above.

In fact, the fast positioning capability provided by B+ trees comes from the orderness of sibling nodes in the same layer.

However, if you evaluate the month() function, you’ll see that when 7 is passed in, you don’t know what to do at the first level of the tree.

That is, performing functional operations on index fields can break the order of index values, so the optimizer decides to abandon tree search.

Note that the optimizer is not abandoning the use of this index.

In this example, instead of tree searching, the optimizer has the option of traversing either the primary key index or the index T_MODIFIED. After comparing the index size, the optimizer finds that t_MODIFIED is smaller and traversing this index is faster than traversing the primary key index. Therefore, the index T_modified will be selected.

Next, let’s use the Explain command to view the execution result of this SQL statement.

Key =”t_modified” means that the index t_modified is used; I inserted 100,000 rows in the test table data, rows=100335, indicating that this statement scanned all values of the entire index; The Using index of Extra indicates that an overwrite index is used.

That is, the month() operation in the T_modified field results in a full index scan. To take advantage of the quick locating power of indexes, we need to change our SQL statements to range queries based on the fields themselves. By writing this, the optimizer will be able to take advantage of the fast positioning capabilities of the T_Modified index as expected.

mysql> select count(*) from tradelog where
    -> (t_modified >= '2016-7-1' and t_modified<'2016-8-1') or
    -> (t_modified >= '2017-7-1' and t_modified<'2017-8-1') or 
    -> (t_modified >= '2018-7-1' and t_modified<'2018-8-1');
Copy the code

Of course, if your system goes live earlier, or you insert data from later years, you’ll need to make up for other years.

MySQL can no longer use the month() function, but can only use full index scans.

The optimizer does, however, “slack off” on this issue by not considering indexes even for functions that do not change order. For example, for the SQL statement select * from Tradelog where ID + 1 = 10000, this increment does not change the order, but the MySQL optimizer still cannot quickly locate the row 9999 using the ID index. Where id = 10000-1; where id = 10000-1;

Implicit type conversion

Let’s take a look at this SQL statement:

mysql> select * from tradelog where tradeid=110717;
Copy the code

The trade number tradeID field already has an index, but the explain result shows that this statement requires a full table scan. As you may have noticed, the tradeid field type is varchar(32), but the input parameter is an integer, so a type conversion is required.

So, here are two questions:

  1. What are the rules for converting data types?
  2. Why do we need a full index scan when we have a cast?

So let’s start with the first question, and you might say, well, there are so many types in the database, there are so many conversion rules, I can’t remember, what should I do?

Here is a simple way to look at the result of select “10” > 9:

  1. If the rule is “convert string to number”, then it is a number comparison and the result should be 1;
  2. If the rule is “convert a number to a string”, then you are doing a string comparison and the result should be 0.

Select “10” > 9 returns 1, so you can confirm the MySQL conversion rule: in MySQL, a string is converted to a number when compared with a number.

Select * from full table scan;

mysql> select * from tradelog where tradeid=110717;
Copy the code

To the optimizer, this statement is equivalent to:

mysql> select * from tradelog where  CAST(tradid AS signed int) = 110717;
Copy the code

In other words, this statement triggers the rule we mentioned above: the optimizer disallows tree-searching by performing a function on an index field.

Now, I leave you with a quick question. Id is of type int. Does this result in a full table scan?

select * from tradelog where id="83126";
Copy the code

You can analyze oneself first, verify confirm to database inside again.

Now, let’s look at a slightly more complicated example.

Implicit character encoding conversion

Suppose the system has another table, trade_detail, that records the action details of the trade. To facilitate quantitative analysis and reproduction, I insert some data into the tradelog table and trade_detail table.

mysql> CREATE TABLE `trade_detail` (
  `id` int(11) NOT NULL.`tradeid` varchar(32) DEFAULT NULL.`trade_step` int(11) DEFAULT NULL./* Procedure */
  `step_info` varchar(32) DEFAULT NULL./* Step information */
  PRIMARY KEY (`id`),
  KEY `tradeid` (`tradeid`))ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into tradelog values(1.'aaaaaaaa'.1000.now());
insert into tradelog values(2.'aaaaaaab'.1000.now());
insert into tradelog values(3.'aaaaaaac'.1000.now());

insert into trade_detail values(1.'aaaaaaaa'.1.'add');
insert into trade_detail values(2.'aaaaaaaa'.2.'update');
insert into trade_detail values(3.'aaaaaaaa'.3.'commit');
insert into trade_detail values(4.'aaaaaaab'.1.'add');
insert into trade_detail values(5.'aaaaaaab'.2.'update');
insert into trade_detail values(6.'aaaaaaab'.3.'update again');
insert into trade_detail values(7.'aaaaaaab'.4.'commit');
insert into trade_detail values(8.'aaaaaaac'.1.'add');
insert into trade_detail values(9.'aaaaaaac'.2.'update');
insert into trade_detail values(10.'aaaaaaac'.3.'update again');
insert into trade_detail values(11.'aaaaaaac'.4.'commit');
Copy the code

Select * from transaction where id=2; select * from transaction where id=2;

mysql> select d.* from tradelog l, trade_detail d where d.tradeid=l.tradeid and l.id=2; Q1 * / / * statement
Copy the code

The results are as follows:

  1. The first row shows that the optimizer will first find the row with ID =2 in the transaction record table tradelog. This step uses the primary key index, and rows=1 means that only one row is scanned.
  2. The second row key=NULL indicates that the tradeID index on trade_detail is not used for the full table scan.

In this execution plan, the tradeID field is taken from the Tradelog table and the matching field is queried in the trade_detail table. Therefore, we call tradelog the driven table, trade_detail the driven table, and tradeID the associated field.

Next, let’s look at the execution flow represented by the Explain result:

  1. The first step is to find the row L2 in the tradelog table according to id;
  2. Step 2, fetch the value of the TradeID field from L2;
  3. The third step is to look for matching rows in the trade_detail table based on the tradeID value. The second key=NULL line in the EXPLAIN result indicates that the tradeid value matches one by one by iterating through the primary key index.

At this point, you will find that step 3 does not meet our expectations. Because the tradeID field in the trade_detail table is indexed, we expected to be able to quickly locate rows of equivalent value by using the tradeID index. But it’s not here.

If you ask a DBA, they will probably tell you that because the character set of the two tables is utF8 and UTF8MB4, the index of the associated field is not used in the join query. This is the answer that you usually get when you search this question.

But you should also ask why indexes are not used in different character sets.

SQL > execute step 3; SQL > execute step 3;

mysql> select * from trade_detail where tradeid=$L2.tradeid.value; 
Copy the code

$l2.tradeid. value is utF8MB4.

Utf8mb4 is a superset of UTF8, so when comparing these two types of strings, MySQL converts the utF8MB4 character set to UTF8MB4 character set and then compares them.

This setup makes sense, utF8MB4 is a superset of UTF8. Similarly, in programming languages, automatic type conversions are performed “in the direction of increasing length” in order to avoid errors due to truncation.

Therefore, to execute the above statement, the fields in the driven table need to be converted to UTF8MB4 one by one and compared with L2.

In other words, this statement is essentially equivalent to the following statement:

select * from trade_detail  where CONVERT(traideid USING utf8mb4)=$L2.tradeid.value; 
Copy the code

The CONVERT() function, in this case, converts the input string to the UTF8MB4 character set. This again triggers the principle we mentioned above: the optimizer will disable tree search by performing functions on index fields.

The result is that a table scan is performed on the index field of the table that is being driven.

As a comparison check, I give you another requirement, “Find the operator of the trade_detail operation with id=4”, and look at this statement and its execution plan.

mysql>select l.operator from tradelog l , trade_detail d where d.tradeid=l.tradeid and d.id=4;
Copy the code

The trade_detail table is the driver table, but the second line of the explain result shows that the query uses the tradelog index (tradeID) with the number of rows scanned being 1. This is also the join operation of the two tradeID fields. Why use the tradeID index of the driven table this time? So let’s analyze it.

Assuming that the row with id=4 in the trade_detail driver table is R4, then at join time (step 3 in Figure 5), the driven table tradelog will execute an SQL statement like this:

select operator from tradelog  where traideid =$R4.tradeid.value; 
Copy the code

$r4.tradeid. value is utF8 and is converted to UTF8MB4 according to the character set conversion rules.

select operator from tradelog  where traideid =CONVERT($R4.tradeid.value USING utf8mb4); 
Copy the code

As you can see, the CONVERT function is appended to the input parameter so that the traideID index of the driven table can be used. Once the principle is understood, it can be used to guide operations. There are two ways to optimize statements

select d.* from tradelog l, trade_detail d where d.tradeid=l.tradeid and l.id=2;
Copy the code
  1. A common optimization is to change the character set of the tradeID field on the trade_detail table to UTF8MB4 as well, so there are no character set conversion issues.
alter table trade_detail modify tradeid varchar(32) CHARACTER SET utf8mb4 default null;
Copy the code
  1. It would be nice to be able to change the character set of the field. However, if the amount of data is large, or the business can not do the DDL for the time being, you can only use the method of modifying the SQL statement.
mysql> select d.* from tradelog l , trade_detail d where d.tradeid=CONVERT(l.tradeid USING utf8) and l.id=2; 
Copy the code

Here, I actively convert l.trade ID to UTF8 to avoid character encoding conversion on the driven table. From explain results, we can see that this time the index is correct.

conclusion

There are three types of index failures in MySQL. Through the introduction of the case, and then to SQL analysis and finally to the conclusion summary. If you are not clear, you can send me a private message