takeaway

In “Why did MySQL choose plan A over Plan B?” In, I explain two indexing – based query cost analysis schemes.

In the process of MySQL cost analysis, in addition to index based analysis, there is a full table scan based cost analysis, today, I will explain to you how MySQL does the full table scan cost analysis?

How does MySQL calculate the cost of a full table scan in terms of CPU cost and IO cost?

The cost of the IO

MySQL calculates the IO cost of full table scan using the following formula:

IO cost = Full table scan time x I/O cost of a node + 1.1

Full table scan time

So, how to calculate the full table scan time?

The calculation of the full table scan time is different between MyISAM and InnoDB. The common storage engine is InnoDB, and the case of table user is also InnoDB. So, I will talk about the calculation method of the full table scan time of InnoDB engine.

The time for a full table scan in InnoDB is the number of nodes in the cluster index.

Select * from user where user = ‘user’;

The figure above shows the cluster index structure of the user table. After counting the number of all page nodes, we find that the number is 7. Therefore, the full table scan time of the user table is 7.

I/O cost of a node

So, how do I/O costs for a single node be calculated? Calculation formula:

Read cost of nodes in memory + Read cost of nodes in disks

Node read cost in memory

The node read cost in memory is calculated as follows:

Node read cost in memory = Node size in memory * MEMORY_BLOCK_READ_COST

MEMORY_BLOCK_READ_COST is a constant with a default value of 0.25, which can be changed using the following command:

UPDATE mysql.engine_cost
SET cost_value=0.5
WHERE cost_name="memory_block_read_cost"
Copy the code

The calculation formula of node size in memory is as follows:

Node size in memory = Total node size (including memory and disks) x Node usage in memory

As mentioned above, the IO cost of a single node is calculated, so the total node size = 1, i.e., one node.

The percentage of memory node usage is calculated in the following three cases:

  • Total node size < 20% of buffer_POOL_size

As shown in the figure above, table_size indicates the total node size of the table. The total node size < 20% of buffer_pool_size means that all nodes are in memory, so,

Memory usage of nodes = 1.0

  • 20% of BUFFer_POOL_size < Total node size < BUFFer_POOL_size

As shown above, 20% of buffer_pool_size < total node size (table_size) < BUFFer_POOL_size

Percentage of nodes in memory = 1.0 - (Percentage of total nodes using Buffer_POOL_size -0.2)/(1.0-0.2)

The corresponding figure is

Memory occupancy percentage of each node = 1.0 (0.75 0.2)/(1.0 0.2) = 0.3125

  • Total node size > BUFFer_POOL_SIZE

As shown in the figure above, if the total node size is greater than buffer_pool_size, all nodes cannot be stored in the memory and must be stored in disk. Therefore, no nodes are stored in the memory.

Memory usage of nodes = 0.0

Thus, after we get the percentage of nodes in memory, we also know that the total number of nodes =1. Then, assuming that the percentage of nodes in memory satisfies the second situation above, we can get the number of nodes in memory by combining the above calculation formula of node size in memory, i.e

Node size in memory = 1 * 0.3125 = 0.3125

After knowing the size of the nodes in memory, we can calculate the read cost of the nodes in memory by using the above formula:

In-memory node read cost = 0.3125 * 0.25 = 0.078125

Node read cost of a disk

The formula for calculating the read cost of a disk node is as follows:

Read cost of a disk node = Size of a disk node x IO_BLOCK_READ_COST

IO_BLOCK_READ_COST is a constant. The default value is 1.0. You can run the following command to change the value:

UPDATE mysql.engine_cost
SET cost_value=0.5
WHERE cost_name="io_block_read_cost"
Copy the code

The formula for calculating the node size in a disk is

Node size on disk = total node size – node size in memory. From above, we already know total node size and node size in memory, so the node size on disk is

1-0.3125 = 0.6875

Therefore, combined with the calculation formula of node read cost in disk above, we can get

Disk node read cost = 0.6875 x 1.0 = 0.6875

Combining with the calculation formula of IO cost of a single node above, we can get

I/O cost of a node = 0.078125 + 0.6875 = 0.765625

So, in the end, the total cost of IO is calculated as follows:

IO cost = Full table scan time * IO cost of a single node + 1.1 = 7 * 0.765625 + 1.1 = 6.459375

Among them, 1.1 is a fixed parameter for MySQL fine tuning.

CPU cost

CPU cost = full table scan time = 7

The total cost

In the end, the cost of a full table scan is CPU cost + IO cost = 7 + 6.459375 = 13.459375

summary

Why did MySQL choose plan A over Plan B? The query cost of case SQL. SQL for the case:

SELECT * FROM user WHERE age > = 16 AND age < 25 ORDER BY age LIMIT 0.20
Copy the code

The query cost analysis results are as follows:

  1. Using the indexindex_age_birthThe cost of:6.61
  2. Cost of using full table scans:13.459375

Since index index_age_birth is less costly than full table scans, MySQL finally selects index index_age_birth to perform the SQL above.