Background

Recently I have been working on an asset migration and merger. The old assets live in system A and need to be transferred to system B. The simple approach is for system B to start a task that pages through queries against system A, takes the results, and inserts them into system B. Naturally, the MySQL instances behind the two systems are separate, sharded databases and tables.

The problem

Of course, we are not talking about data loss or transaction consistency here. So what went wrong? During the actual run, it turned out that queries for some users’ assets timed out, so that portion of the data failed to be queried. Logging onto the bastion host and checking the stack log:

This is obviously caused by a query timeout.

Analysis

SQL query timeout problems, in my experience, generally come down to three possibilities:

  1. The table holds too much data, so the SQL runs past the timeout;
  2. The timeout configured in the data source configuration file is too short;
  3. The SQL does not use an index.

To analyze:

  • Checking the table: it holds only millions of rows, and the SQL itself is fairly simple, so this can be excluded;
  • The timeout in the data source configuration file is 5s, and since other business queries are normal, this can be excluded too.

So, obviously, it comes down to the index. Running Explain showed that the expected index was not being used! But the SQL was written for exactly that index, so why did it fail to be used? The traditional checks are to look at the SQL itself: does it follow the leftmost-prefix matching principle, and does it apply a function to the indexed column? Of course, since I am writing this, both were already ruled out. So what is the reason? That is what this article is about.

The experiment

Notice that in the description above I have not posted the actual SQL or table structure; that is to protect the company’s business and code. But this does not stop me from preparing the following experimental data for you (MySQL 5.7). Here are my steps:

  1. Start by creating a table
CREATE TABLE `student` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) DEFAULT ' ',
  `sex` int(11) DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `stuNo` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_name` (`name`),
  KEY `idx_stuNo` (`stuNo`)
) ENGINE=InnoDB AUTO_INCREMENT=20001 DEFAULT CHARSET=utf8;

In this case, the student table is created with two secondary indexes, idx_name and idx_stuNo.

  2. Insert data into the table; I inserted 20,000 rows here. (I won’t show the insert statements, that would be a little silly.)
  3. Prepare an SQL statement:
select stuNo from student  where stuNo > '100' and stuNo < '30000000'

Let’s take a look at its explain execution plan:

explain select stuNo from student  where stuNo > '100' and stuNo < '30000000';


The results are as follows:

As you can see, the SQL uses the idx_stuNo index, and there seems to be no problem. Next, let’s modify the SQL:

select * from student  where stuNo > '100' and stuNo < '30000000'

Let’s take a look at its explain execution plan:

explain select * from student  where stuNo > '100' and stuNo < '30000000';

Oh my god, this time it is a full table scan; the index is not used at all!

Cost model

The purpose of indexes is to help us improve query efficiency. So when a table has multiple indexes, how does MySQL choose the right access path? This is where the concept of a cost model comes in; here we will only explore the query case.

SQL query cost = I/O cost + CPU cost

A full table scan

Let’s start with the simplest case: a full table scan. It is the simplest because it reads the table directly, so the I/O cost is determined by how many pages are fetched, and the CPU cost by how many rows are read. This gives:

IO cost = number of pages queried × 1.0 + 1.1
CPU cost = number of rows × 0.2 + 1.0

Here 1.0 and 0.2 are cost constants (presumably empirical), and the +1.1 and +1.0 are fine-tuning values. Now let’s calculate the full-table-scan cost for our SQL:

  1. First, get the number of pages the query touches, which can be obtained from the table itself:

show table status like 'student';

You can see that Data_length is 1589248, and 1589248 / 16 / 1024 = 97 (the default InnoDB page size is 16K), so the table occupies 97 pages, requiring 97 I/Os. The total number of rows is 19,980. So we get:

IO cost = 97 × 1.0 + 1.1 = 98.1
CPU cost = 19980 × 0.2 + 1.0 = 3997
Total cost = 98.1 + 3997 = 4095.1
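The arithmetic above can be sketched in a few lines of Python (a minimal sketch: the 1.0/0.2 constants and the 1.1/1.0 fine-tuning values are the MySQL 5.7 defaults quoted above, and `data_length` and the row count come from `show table status`):

```python
# Full-table-scan cost, per the MySQL 5.7 cost model described above.
DATA_LENGTH = 1589248   # Data_length from `show table status like 'student'`
ROWS = 19980            # estimated row count from the same output
PAGE_SIZE = 16 * 1024   # InnoDB default page size: 16K

def full_scan_cost(data_length, rows, page_size=PAGE_SIZE):
    pages = data_length // page_size   # 1589248 / 16 / 1024 = 97 pages
    io_cost = pages * 1.0 + 1.1        # one I/O per page, plus tuning value
    cpu_cost = rows * 0.2 + 1.0        # one row-compare per row, plus tuning value
    return io_cost + cpu_cost

print(full_scan_cost(DATA_LENGTH, ROWS))   # ~4095.1, matching table_scan.cost in the trace
```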

An index scan

For a query via an index scan, the cost is still calculated as I/O cost + CPU cost, but it splits into two stages: index cost plus data cost. The index cost is the cost of scanning the index itself, while the data cost only comes into play when rows must be fetched back from the table. Let’s start with the index cost:

IO cost = number of scan intervals of the index × 1.0
CPU cost = number of index records × 0.2 + 0.01

Since this is a range lookup there is a single scan interval, and MySQL assumes that the I/O cost of reading one scan interval of an index is the same as reading one page, so the I/O cost is 1 × 1.0 = 1. So how is the number of index records calculated?

  1. Based on the condition stuNo > '100', access the B+ tree index for idx_stuNo and find the first record that satisfies it (we call this the leftmost record of the interval). Locating a record in a B+ tree is fast (constant time), so the cost of this step is negligible.
  2. Similarly, based on stuNo < '30000000', find the last record in the idx_stuNo B+ tree that satisfies the condition (the rightmost record of the interval); the cost of this step is also negligible.
  3. If the leftmost and rightmost records are not too far apart (in MySQL 5.7.22, no more than 10 pages apart), the number of secondary index records satisfying 100 < stuNo < 30000000 can be counted exactly. Otherwise, MySQL reads 10 pages to the right starting from the leftmost record, calculates the average number of records per page, and multiplies that average by the number of pages between the leftmost and rightmost records.
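Step 3 can be sketched as follows (the per-page record counts and the 40-page span here are made-up illustration values, not numbers taken from the trace):

```python
def estimate_index_rows(records_on_sampled_pages, pages_between_endpoints):
    """Estimate rows in a range the way step 3 describes: average the
    records on up to 10 sampled pages, then multiply by the page span."""
    sampled = records_on_sampled_pages[:10]        # MySQL samples at most 10 pages
    avg_per_page = sum(sampled) / len(sampled)
    return int(avg_per_page * pages_between_endpoints)

# Hypothetical sample: roughly 128 records per page over a 40-page span.
pages = [130, 126, 129, 127, 128, 128, 127, 129, 128, 128]
print(estimate_index_rows(pages, 40))   # 5120, close to the 5119 estimate in the trace
```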

Here the estimated number of index records is 5119, so the index cost is:

IO cost = 1 × 1.0 = 1
CPU cost = 5119 × 0.2 + 0.01 = 1023.81
Total cost = 1 + 1023.81 = 1024.81

There’s a phrase we often hear when using indexes: back-to-table lookups. The cost above covers only the index itself; when the data obtained from the index cannot satisfy the query, MySQL performs back-to-table lookups, which behave somewhat like the full table scan above:

I/O cost = number of rows fetched via the index × 1.0
CPU cost = number of rows fetched via the index × 0.2

MySQL’s estimate of the I/O cost of back-to-table lookups is still very rough: it treats each lookup as one page visit. Therefore the cost works out to:

I/O cost = 5119 × 1.0 = 5119
CPU cost = 5119 × 0.2 = 1023.8
Total cost = 5119 + 1023.8 = 6142.8

Finally, adding the two parts together, the total cost is 1024.81 + 6142.8 = 7167.61.
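Putting the two stages together and comparing against the full table scan (again a sketch; the row count 5119 and the constants are the ones quoted above, and note that the trace below reports a slightly different index cost of 6143.8, since the exact terms MySQL sums vary by version):

```python
def index_scan_cost(index_rows):
    # Stage 1: walking the index itself. One scan interval => one page of I/O.
    index_io = 1 * 1.0
    index_cpu = index_rows * 0.2 + 0.01    # 1023.81
    # Stage 2: back-to-table lookups, one page visit per matching row.
    back_io = index_rows * 1.0             # 5119.0
    back_cpu = index_rows * 0.2            # 1023.8
    return index_io + index_cpu + back_io + back_cpu

FULL_SCAN_COST = 4095.1                    # computed in the previous section
index_cost = index_scan_cost(5119)
print(round(index_cost, 2))                # 7167.61
# The optimizer picks the cheaper path, hence the full table scan:
print("full scan wins" if FULL_SCAN_COST < index_cost else "index wins")
```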

Summary

As you can see, in this case the full table scan has the lower cost. It is also clear from this that back-to-table lookups are relatively expensive. You can also turn on optimizer_trace to see the detailed breakdown:

mysql> SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE \G
*************************** 1. row ***************************
                            QUERY: select * from student  where stuNo > '100' and stuNo < '30000000'
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `student`.`id` AS `id`,`student`.`name` AS `name`,`student`.`sex` AS `sex`,`student`.`age` AS `age`,`student`.`stuNo` AS `stuNo` from `student` where ((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`student`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`student`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 19980,
                    "cost": 4095.1
                  },
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_name",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_stuNo",
                      "usable": true,
                      "key_parts": [
                        "stuNo",
                        "id"
                      ]
                    }
                  ],
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_stuNo",
                        "ranges": [
                          "100 < stuNo < 30000000"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 5119,
                        "cost": 6143.8,    /* <-- the cost of using the index */
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    }
                  }
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`student`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 19980,
                      "access_type": "scan",
                      "resulting_rows": 19980,
                      "cost": 4093,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 19980,
                "cost_for_plan": 4093,
                "chosen": true
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`student`",
                  "attached": "((`student`.`stuNo` > '100') and (`student`.`stuNo` < '30000000'))"
                }
              ]
            }
          },
          {
            "refine_plan": [
              {
                "table": "`student`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.00 sec)

Conclusion

Although the MySQL optimizer is a great help, in some cases its cost model does not match our actual situation, so when such things happen there is still a way to investigate. In some cases, you can also fall back on a force index hint ~