One, classification discussion

If a SQL statement executes slowly, does it execute slowly every time? Or is it normal most of the time and only occasionally slow? I think this gives us two cases to discuss.

  1. Most of the time it’s normal, but occasionally it’s slow.
  2. The SQL statement executes slowly every time, even with the same amount of data.

In view of these two cases, let’s analyze the possible causes.

Two, for the occasional slow situation

In this case, I think there is nothing wrong with how the SQL statement itself is written; something else is causing the slowdown. What could that be?

1. The database is flushing dirty pages

When we insert or update data, the database first updates the corresponding data in memory, but the change is not immediately persisted to disk. Instead, a record of the update is written to the redo log, and the latest data is synchronized to disk later, when the database is idle, using the redo log.

When the contents of a data page in memory differ from those of the corresponding data page on disk, the memory page is called a “dirty page.” After the memory data is written to disk, the two copies are consistent again, and the page is called a “clean page.”

There are four scenarios that trigger flushing of dirty pages (the latter two rarely cause performance problems):

The redo log is full:

The redo log has limited capacity. If the database is busy and updates are frequent, the redo log fills up quickly, and MySQL must flush dirty pages to disk before it can accept new writes. While that flush is happening, a normally fast SQL statement can suddenly execute slowly.
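As a quick sketch of how to see that capacity: on MySQL versions before 8.0.30 the redo log size is controlled by `innodb_log_file_size` and `innodb_log_files_in_group`, while newer versions consolidate it into a single knob.

```sql
-- Configured redo log capacity (variable names depend on MySQL version):
SHOW VARIABLES LIKE 'innodb_log_file%';

-- On MySQL 8.0.30 and later, the single setting is:
-- SHOW VARIABLES LIKE 'innodb_redo_log_capacity';
```

A redo log that is too small for the write workload makes these stalls more frequent.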

Insufficient memory:

If a query needs data pages that are not in memory, it must request memory for them. If memory happens to be insufficient at that moment, some existing data pages have to be evicted: clean pages can simply be released, but dirty pages must first be flushed to disk.
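The memory in question here is InnoDB's buffer pool. A sketch of how to see its size and how many of its pages are currently dirty, using standard server variables:

```sql
-- Size of the buffer pool that caches data pages:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Total pages vs. dirty pages currently in the buffer pool:
SHOW STATUS LIKE 'Innodb_buffer_pool_pages_total';
SHOW STATUS LIKE 'Innodb_buffer_pool_pages_dirty';
```

A persistently high ratio of dirty pages to total pages suggests eviction is more likely to hit dirty pages and stall queries.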

When MySQL deems the system “idle”:

There’s no pressure on the system.

When MySQL shuts down normally:

MySQL flushes all the dirty pages to disk, so the next time MySQL starts, it can read data directly from disk.

2. The lock cannot be acquired

This one is easy to imagine: the statement we want to execute involves a table that someone else is using and has locked, so we cannot get the lock and can only wait for it to be released. Or the table itself is not locked, but a row we need is locked by someone else, and again all we can do is wait.

To check whether we are really waiting for a lock, we can use the show processlist command to look at the current state of each connection.
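Beyond show processlist, MySQL 5.7+ ships a sys schema view that shows who is blocking whom. A sketch of how a lock wait can be diagnosed:

```sql
-- List all current connections; a blocked statement shows up with a
-- State such as "Waiting for table metadata lock":
SHOW PROCESSLIST;

-- On MySQL 5.7+, see which row-lock waits are in progress and which
-- session holds the blocking lock:
SELECT * FROM sys.innodb_lock_waits;
```

The view includes the blocking session's id, so you can decide whether to wait or kill the blocker.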

Now let’s look at the second case, which I think is the more important one.

Three, for the always-slow situation

If a SQL statement executes slowly every time with the same amount of data, it’s time to look at how the SQL is written. Here are some of the reasons a SQL statement can perform poorly.

Let’s assume we have a table with a primary key id and two ordinary fields, c and d.

mysql> CREATE TABLE `t` (
  `id` int(11) NOT NULL,
  `c` int(11) DEFAULT NULL,
  `d` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
1, ouch, the index is not used

There are several reasons an index may not be used, and I think many people can come up with them. For example, suppose you want to run this query:

select * from t where 100 < c and c < 100000;

A. The field has no index

If you do not have an index on the c column, the query has to do a full table scan.

B. The field has an index, but no index is used

Ok, so this time you add an index to the c field, and then run a different query:

select * from t where c - 1 = 1000;

Let me ask you a question: will this query use the index?

The answer is no. If we perform an operation on the left-hand side of the field, then sorry, the index will not be used in the query. So you may have carefully added an index to the field, yet through a moment of carelessness in writing the query, the system ends up not using it.

The correct query should be as follows

select * from t where c = 1000 + 1;

One might say: well, if operations on the right-hand side are fine, wouldn’t the database optimize this for us automatically and convert c - 1 = 1000 into c = 1000 + 1?

I’m sorry, but MySQL will not do that for you, so you need to be careful.
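We can verify this with EXPLAIN. Assuming we add an index on c (I’m naming it idx_c here for illustration), the plan changes as soon as the bare column stands alone on the left:

```sql
ALTER TABLE t ADD INDEX idx_c (c);

-- Operation applied to the indexed column: the optimizer cannot use
-- idx_c, and type comes back as ALL (full table scan):
EXPLAIN SELECT * FROM t WHERE c - 1 = 1000;

-- Bare column on the left: the plan now shows key = idx_c:
EXPLAIN SELECT * FROM t WHERE c = 1000 + 1;
```

Comparing the type and key columns of the two EXPLAIN outputs makes the difference obvious.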

C. The index is not used due to function operations

If we apply a function to a field in the query condition, the index will not be used either. For example:

select * from t where pow(c, 2) = 1000;

This is just an example; assume pow(c, 2) raises c to the power of 2. The effect is very similar to performing an operation on the left-hand side, as above.

Therefore, when a statement executes slowly, it may be that the statement is not using an index, and you should analyze the specific reason why. The three reasons listed above should be the most frequent ones.
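The fix for the function case is the same as before: rewrite the predicate so the bare column is compared against a constant expression. For the pow example above, assuming c is known to be non-negative, that rewrite might look like:

```sql
-- Function applied to the column: an index on c cannot be used.
-- SELECT * FROM t WHERE pow(c, 2) = 1000000;

-- Equivalent rewrite with the bare column on the left, which can use
-- an index on c (only valid under the assumption that c >= 0):
SELECT * FROM t WHERE c = 1000;
```

The general rule: move all computation to the constant side and leave the indexed column untouched.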

2, haha, the database chose the wrong index

When we do a query operation, for example

select * from t where 100 < c and c < 100000;

As we know, there is a difference between a primary key index and a non-primary key index: the primary key index stores the entire row of data, while a non-primary key index stores the value of the primary key field.

In other words, if we go through the index on c, we first find the corresponding primary key value, and then use that value to go through the primary key index and fetch the entire row.

For the query above (100 < c and c < 100000), even with an index on c, it is actually possible that the system scans the entire table instead.

Why is that?

When executing this statement, the system makes a prediction: will fewer rows be scanned by going through the c index, or by scanning the full table? Obviously, fewer rows scanned is better, because fewer rows scanned means fewer I/O operations.

If the whole table is scanned, the number of rows scanned is the total number of rows in the table, say n. If we go through index c instead, then after finding a primary key value through index c, we still have to go back through the primary key index to fetch the whole row, which means each matching row is looked up twice. In addition, we don’t know in advance how many rows satisfy 100 < c and c < 100000. What if all of them do? Then going through index c not only scans n rows, but each row also requires an extra primary key lookup.

Therefore, it is possible for the system to do a full table scan rather than an index scan. So how does the system decide?

The judgment is based on the system’s prediction, that is, the system will predict how many rows will be scanned if the c field index is to be scanned. If it predicts a large number of rows to be scanned, it may skip the index and scan the full table.
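That predicted row count is visible in the rows column of EXPLAIN, and comparing it against the true count is a quick way to spot a bad estimate. A sketch:

```sql
-- "rows" in the output is the optimizer's estimate, derived from
-- sampled index statistics, not an exact count:
EXPLAIN SELECT * FROM t WHERE 100 < c AND c < 100000;

-- The actual number of matching rows, for comparison:
SELECT COUNT(*) FROM t WHERE 100 < c AND c < 100000;
```

A large gap between the two numbers is a hint that the statistics are stale, which connects to the analyze table fix discussed below.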

So the question is, how does the system predict? I’m going to tell you how, even though my neck is getting a little sore at this point.

The system judges an index by how well it distinguishes values. The more distinct values an index contains, the fewer duplicates it has, and the better it distinguishes. We call this measure the cardinality: the better an index distinguishes values, the larger its cardinality. And the larger the cardinality, the fewer rows the system expects to match conditions such as 100 < c and c < 100000.

So, the larger the cardinality of an index, the more advantageous it is to use that index.

So the question is, how do you know the cardinality of this index?

The system, of course, does not traverse all of the data to compute an index’s cardinality; that would be too expensive. Instead, it samples part of the data to estimate the cardinality of the index.

In other words, the actual cardinality of index c may be very large, but if the sampled data happens to contain few distinct values, the system estimates a small cardinality, mistakenly concludes the index is not selective, and then skips index c in favor of a full table scan.

So, having said all this, our conclusion is: a statistics error caused the system to perform a full table scan instead of an index scan, and this is a reason why our SQL statement executed slowly.

Note that the predicted number of scanned rows is only one of the factors the system uses to decide whether a query should use an index; whether the query needs temporary tables, sorting, and so on also plays a part.

However, we can sometimes force the query to use the index, for example (writing idx_c for the name of the index on c):

select * from t force index(idx_c) where 100 < c and c < 100000;

We can also use

show index from t;

to check whether the stored cardinality of the index matches reality. If it does not, we can have MySQL recompute the statistics with this command:

analyze table t;

Since the cardinality of an index can be mis-estimated, it also follows that when a query has multiple candidate indexes, the system may choose the wrong one, and this too can be a reason for slow SQL execution.

All right, that’s all for now. I think it’s already pretty good if you can come up with this much, so let’s wrap up with a summary.

Four, summary

The above is my summary and understanding. For the last part, I worried that many people might not realize the database can actually choose the wrong index, so I explained that in detail. Below is a summary of everything above.

A SQL statement executes slowly. We discussed two cases:

1. Most of the time it is normal but occasionally slow, for the following reasons:
  • The database is flushing dirty pages. For example, the redo log is full and needs to be synchronized to disk.
  • During execution, a lock is encountered, such as a table lock or row lock.
2. The SQL statement executes slowly every time, for the following reasons:
  • No index is used: for example, the field has no index, or the index cannot be used because an operation or function is applied to the field.
  • The database selected the wrong index.