I’m sure most of you have done this in the past. This will result in implicit type conversions. Especially when most of the data stored in a field is numeric, it is easy to assume that the field is of type int and mistakenly execute a query using something like where file_id=123456789. It is better to look at the statement’s execution plan through the EXPLAIN command and see that the hit index was not used, thus correcting the error. Production accidents resulted from slow queries and spikes in CPU usage and disk IO on the database server after a poor code release went live.

If you’re looking for a varchar, you can’t hit the index with a number, but if you’re looking for a string, you can usually find the index very quickly. Why is that?

Let’s take a real test to explain why this happens. The test MySQL version is 5.7.18, the structure of the data table FILE is as follows, the storage engine is InnoDB, and the number of data bars in the table is 5 million +.

mysql> SELECT VERSION(); + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | VERSION () | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | 5.7.18-20170830 - log | + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + mysql > DESC `file`; +----------+---------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra  | +----------+---------------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | fs_id | varchar(20) | NO | MUL | NULL | | | filename | varchar(255) | NO | | NULL | | | shareid | bigint(20) unsigned | NO | MUL | NULL | | | uk | bigint(20) unsigned | NO | | NULL | | | pid | varchar(32) | NO | | NULL  | | +----------+---------------------+------+-----+---------+----------------+ mysql> SELECT COUNT(*) FROM `file`; +----------+ | COUNT(*) | +----------+ | 5416697 | +----------+

The VARCHAR field is queried with numbers

The fs_id field in the data table FILE is of type VARCHAR, and the ordinary index IDX_FS_ID is established. When querying with strings, it takes 0.07 seconds. Looking at the execution plan through the EXPLAIN command, the results show that the index of the fs_id field was used during the query.

mysql> SELECT * FROM `file` WHERE `fs_id`='635341798980956'; +---------+-----------------+-------------+------------+------------+---------+ | id | fs_id | filename | shareid | uk |  pid | +---------+-----------------+-------------+------------+------------+---------+ | 1043170 | 635341798980956 | Senior ⑮ MySQL | 3181065465 | 3959617630 | o6RlSp0 | + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- + 1 row in the set (0.07 SEC) mysql > EXPLAIN SELECT * FROM `file` WHERE `fs_id`='635341798980956'; +----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+  | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+ | | 1 SIMPLE | file | NULL | ref | idx_fs_id | idx_fs_id | 62 | const | 1 | | NULL | 100.00 +----+-------------+-------+------------+------+---------------+-----------+---------+-------+------+----------+-------+

However, when querying with numbers, it took 7.04 seconds. Looking at the execution plan through the EXPLAIN command, we found that the query was performed with a full table scan and no indexes were used.

mysql> SELECT * FROM `file` WHERE `fs_id`=635341798980956; +---------+-----------------+-------------+------------+------------+---------+ | id | fs_id | filename | shareid | uk |  pid | +---------+-----------------+-------------+------------+------------+---------+ | 1043170 | 635341798980956 | Senior ⑮ MySQL | 3181065465 | 3959617630 | o6RlSp0 | + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- + 1 row in the set (7.04 SEC) mysql > EXPLAIN SELECT * FROM `file` WHERE `fs_id`=635341798980956; +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------- --+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------- -- -- + | | 1 SIMPLE | file | NULL | | ALL idx_fs_id | NULL | NULL | NULL | 4878670 | | 10.00 Using the where | +----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------- --+

The fs_id field is clearly indexed, but a numeric query still requires a full table scan. This happens because the fs_id field is of type string and the input parameter is of type integer, so it triggers an implicit type conversion.

The int field is queried with a string lookup

The shareID field in the table FILE is of type BIGINT, and the common index IDX_SHAREID is established. When querying with numbers and strings, the time is 0.04 seconds for both. Looking at the execution plan through the EXPLAIN command shows that the index of the IDX_SHAREID field was used in the query regardless of whether the numbers were enclosed in quotes or not.

mysql> SELECT * FROM `file` WHERE `shareid`=3181065465; +---------+-----------------+-------------+------------+------------+---------+ | id | fs_id | filename | shareid | uk |  pid | +---------+-----------------+-------------+------------+------------+---------+ | 1043170 | 635341798980956 | Senior ⑮ MySQL | 3181065465 | 3959617630 | o6RlSp0 | + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- + 1 row in the set (0.04 SEC) mysql > SELECT * FROM `file` WHERE `shareid`='3181065465'; +---------+-----------------+-------------+------------+------------+---------+ | id | fs_id | filename | shareid | uk |  pid | +---------+-----------------+-------------+------------+------------+---------+ | 1043170 | 635341798980956 | Senior ⑮ MySQL | 3181065465 | 3959617630 | o6RlSp0 | + + -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- -- -- - + -- -- -- -- -- -- -- -- -- + 1 row in the set (0.04 SEC) mysql > EXPLAIN SELECT * FROM `file` WHERE `shareid`=3181065465; +----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+------ -+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+------ - + | 1 | SIMPLE | file | NULL | ref | idx_shareid | idx_shareid 8 | | const | 1 | | NULL | 100.00 +----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+------ -+ mysql> EXPLAIN SELECT * FROM `file` WHERE `shareid`='3181065465'; +----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+------ -----------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+------ -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- + | | 1 SIMPLE | file | NULL | ref | idx_shareid | idx_shareid 8 | | const | 1 | | 100.00 Using the index condition | +----+-------------+-------+------------+------+---------------+-------------+---------+-------+------+----------+------ -----------------+

There are three questions about this result:

  1. What are the rules for implicit conversions?
  2. Why do implicit conversions trigger a full table scan when querying data?
  3. Why can I hit an index on an int field with a string?

1. What are the rules for implicit conversion?

A very simple way to verify the rules for implicit conversions is to look at the results of SELECT ’10’ > 9 and SELECT 9 > ’10’ :

If the rule is “convert a string to a number”, then it is a numeric comparison, and the result of SELECT ’10’ > 9 should be 1, and the result of SELECT 9 > ’10’ should be 0; If the rule is “convert numbers to strings” then it is doing string comparisons and the result of SELECT ’10’ > 9 should be 0 and the result of SELECT 9 > ’10’ should be 1.

mysql> SELECT '10' > 9; + -- -- -- -- -- -- -- -- -- -- + | | '10' > 9 + -- -- -- -- -- -- -- -- -- -- + | 1 | + -- -- -- -- -- -- -- -- -- -- +
mysql> SELECT 9 > '10'; + -- -- -- -- -- -- -- -- -- -- + | 9 > '10' | + -- -- -- -- -- -- -- -- -- -- + | 0 | + -- -- -- -- -- -- -- -- -- -- +

MySQL > SELECT ’10’ > 9 = 1, SELECT 9 > ’10’ = 0; MySQL > SELECT 9 > ’10’ = 0; MySQL > SELECT 9 > ’10’ = 0;

Therefore, when we query with the following statement:

SELECT * FROM `file` WHERE `fs_id`=635341798980956;

To the MySQL optimizer, this statement is equivalent to converting fs_id to an int and comparing it with the input value:

SELECT * FROM `file` WHERE CAST(`fs_id` AS signed INT)=635341798980956;

As we all know, if the index fields are functionally manipulated at query time, the query process will not be able to use the index.

2. Why is implicit conversion triggered and a full table scan required when querying data?

For InnoDB B+ tree index structure, I believe that most partners have a certain understanding.

The example has the following set of data:

1, 2, 3, 4, 6, 6, 7, 11, 13, 21, 23, 39, 42, 61, 71, 
101, 201, 220, 303, 345, 411, 601, 620, 701, 1402, 3333

When indexing as a numeric type, the B+ tree index structure is as follows:

When indexed as a string type, the data order and B+ tree index structure are as follows:

1, 101, 11, 13, 1402, 2, 201, 21, 220, 23, 3, 303, 3333, 
345, 39, 4, 411, 42, 6, 6, 601, 61, 620, 7, 701, 71

In fact, the fast positioning ability of B+ tree index comes from the ordering of sibling nodes in the same layer. Function operations on index fields may break the order of index values. When a string is compared to a number, the string is converted to a number.

Therefore, when a string type field fs_id receives a numeric type input parameter, fs_id is converted to numeric type. The index created by the string type is out of order for numeric type, so the index of the fs_id field cannot be used, but can only be looked up by a full table scan.

3. Why can you hit an index in an int field with a string?

Is it because a string query for a numeric type field does not trigger an implicit cast? And it isn’t.

From the above verification, we know that when a string is compared to a number, the string is converted to a number.

Therefore, when we query a numeric field with a string type as the input parameter:

SELECT * FROM `file` WHERE `shareid`='3181065465'

To the MySQL optimizer, this statement is equivalent to converting the input parameter ‘3181065465’ to an int and then running the query:

SELECT * FROM `file` WHERE `shareid`=CAST('3181065465' AS signed INT);

Function operations on input parameters after the equal sign do not affect the index use of the field shareID. Therefore, the implicit type conversion occurs when a numeric type field is queried with a string, but it does not affect the use of field indexes.

conclusion

Implicit type conversions are triggered when a string is compared to a number in a MySQL query. The rule of conversion is to convert a string to a number.

When the index field is of string type and the input parameter is of numeric type, the field will be converted to numeric type and then searched. In other words, the function operation on the index field is done, which destroys the order of the index, so the index cannot be used. When the index field is of numeric type and the input parameter is of string type, the input parameter will be converted to numeric type and then searched. Function operation on the input parameter after the equal sign does not affect the order of the index field, so the index can be used.