Preface: I worked on query optimization for a while during my internship. During that time I read "High Performance MySQL" and gained some experience. This post is a summary of that: the material mostly follows High Performance MySQL, with my own notes added.

(1) Reasons for slow queries. 1. Is the query requesting unneeded data from the database? (1) Querying data you don't need: put simply, this means fetching the full result set and then filtering it in the application, discarding most of the rows, even though that data could have been filtered out by the database up front. A typical case is a multi-table join that returns all columns:

SELECT * FROM tab1 
INNER JOIN tab2
INNER JOIN tab3
WHERE a = b

This returns every column of all three tables; the right approach is to fetch only the columns that are actually needed. Whenever we see SELECT *, we should ask whether it is really necessary. Sometimes it is kept for extensibility, and that is acceptable as long as we understand the cost.
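As a minimal sketch (the join conditions and selected columns here are made-up assumptions, purely for illustration), the leaner version of the query above might look like this:

-- Fetch only the columns the application actually uses,
-- instead of SELECT * across all three joined tables.
SELECT tab1.id, tab1.name, tab3.status
FROM tab1
INNER JOIN tab2 ON tab2.tab1_id = tab1.id
INNER JOIN tab3 ON tab3.id = tab2.tab3_id
WHERE tab1.name = 'mysql';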

(2) Is the query scanning extra rows? Once we have confirmed that only the needed data is returned, we should check whether the query scans far more rows than it returns. Three metrics matter here: response time, number of rows scanned, and number of rows returned. EXPLAIN can show this; I won't go into detail here.
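Just as a quick pointer (not an in-depth treatment), here is a minimal EXPLAIN sketch against the sakila sample database used later in this post:

-- The "rows" column is MySQL's estimate of how many rows it must examine,
-- and "key" shows which index, if any, it plans to use.
EXPLAIN SELECT film_id, description
FROM sakila.film
WHERE title = 'ACADEMY DINOSAUR';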

(3) Ways to restructure a query. 1. Can one complex query be split into several simple ones? We used to write complex queries because we wanted the database to finish as much work as possible in one go, on the grounds that network communication, query parsing, and optimization are expensive. That is no longer necessarily the case. The split has to be reasonable, though; don't force apart a query that is perfectly fine as it is. Now, a few splitting techniques.

2. Chopping up a query. An example illustrates the scenario: suppose we need to periodically delete a large amount of old data every month. If we do it with a single statement, a large amount of data gets locked at once, system resources are tied up, and other transactions may be blocked or even deadlock. So we can split it. The original statement:

DELETE FROM tab WHERE create_time < DATE_SUB(NOW(),INTERVAL 3 MONTH);

It can be broken down into:

row = 0;
do {
    row = do_query("DELETE FROM tab WHERE create_time < DATE_SUB(NOW(), INTERVAL 3 MONTH) LIMIT 10000");
} while (row > 0);

Deleting ten thousand rows at a time is usually a reasonable batch size for the server, and we can pause briefly after each batch to spread the load out over time.

3. Decomposing a join query (this part follows the book entirely; I wanted to try this kind of split before, but my boss didn't approve, so I have no hands-on experience with it). First, an example:

SELECT * FROM tag
    JOIN tag_post ON tag_post.tag_id=tag.id
    JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='mysql';

This can be split into:

SELECT * FROM tag WHERE tag='mysql';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id IN (124, 456, 567, 9098, 8904);

Now let's look at the benefits. (1) Caching becomes more efficient. Many applications can easily cache the result object of a single-table query; for example, if the tag row is already cached, the first query can be skipped entirely. For the MySQL query cache, if any table involved in a join changes, the cached result can no longer be used; after the split, if one of the tables rarely changes, queries against that table can keep reusing the query cache. (2) Executing the queries one at a time reduces lock contention. (3) Doing the join in the application layer makes it easier to split the database and to achieve higher performance and scalability. (4) It reduces redundant row access. Joining in the application means each row is fetched only once, whereas a join in the database may access some rows repeatedly.

(4) Optimizing COUNT() queries. COUNT() can count the values in a column (only non-NULL values are counted) or count the number of rows. With MyISAM, COUNT(*) without any WHERE condition is very fast, because MyISAM stores the exact row count and doesn't need to count the rows; once a WHERE clause is added, it behaves like any other storage engine. (1) Simple optimizations. We can sometimes use COUNT(*) to speed up a particular query. For example, to count the cities with ID greater than 5, we can write:

SELECT COUNT(*) FROM world.City WHERE ID > 5;

However, that query might have to scan a large number of rows (all the cities with ID greater than 5). Since ID <= 5 matches only a handful of rows, and COUNT(*) with no WHERE clause is cheap to obtain (on MyISAM it is essentially free), we can turn the calculation around: subtract the count of IDs less than or equal to 5 from the total row count:

SELECT (SELECT COUNT(*) FROM world.City) - COUNT(*) FROM world.City WHERE ID <= 5;

(2) Complex optimizations. Generally speaking, COUNT() needs to scan a large number of rows to produce an exact result, so it is hard to optimize further. At that point we can maintain a summary table, or add an external cache such as Redis. But we will find that of fast, exact, and simple to implement, we can only ever satisfy two at a time.
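As an illustration of the summary-table idea (the table and column names below are hypothetical, and keeping the counter up to date is left to the application or a trigger), a small counter table can be read instead of running COUNT() over the large table:

-- Hypothetical counter table: one row holding the running count.
CREATE TABLE city_count (
    slot INT NOT NULL PRIMARY KEY,
    cnt  INT UNSIGNED NOT NULL
);

-- Maintain it whenever rows are added to the big table:
UPDATE city_count SET cnt = cnt + 1 WHERE slot = 1;

-- Reading the count no longer scans the big table:
SELECT cnt FROM city_count WHERE slot = 1;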

(5) Optimizing join queries. (1) Make sure there is an index on the columns in the ON or USING clauses. In general, unless there is a reason to do otherwise, you only need to create the index on the corresponding column of the second table in the join order. (2) Make sure that any expression in GROUP BY or ORDER BY refers to columns from only one table, so that MySQL has a chance to use an index for that step.
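A hedged sketch of these two points, with made-up table and column names: if MySQL joins A to B in that order, the lookup happens on B, so the index that matters is the one on B's join column, and the ORDER BY sticks to columns of a single table.

-- Join order A -> B: for each row of A, MySQL looks up matching rows in B,
-- so the index on B.a_id is the one used for the join lookup.
ALTER TABLE B ADD INDEX idx_a_id (a_id);

SELECT A.id, B.val
FROM A
INNER JOIN B ON B.a_id = A.id
WHERE A.created_at > '2020-01-01'
ORDER BY A.created_at;   -- refers only to columns of table A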

(6) Optimizing LIMIT pagination. When we need to paginate, we usually use LIMIT with an offset, plus an appropriate ORDER BY clause. If there is a matching index, this works well; otherwise MySQL has to do a lot of filesort work. Next, a situation many people run into, and one I also hit during my internship: a very large offset, such as LIMIT 10000, 20. MySQL has to fetch 10020 rows and return only the last 20, discarding the first 10000. At that point we either limit how far users can page, or optimize the performance of large offsets. (1) Use a covering index scan whenever possible instead of querying all columns, then join back to the original table as needed to return the required columns. For example, the following query:

SELECT film_id, description FROM sakila.film ORDER BY title LIMIT 50, 5;

If the table is very large, this query is better rewritten like this:

SELECT film.film_id, film.description
FROM sakila.film
    INNER JOIN (
        SELECT film_id FROM sakila.film
        ORDER BY title LIMIT 50, 5
    ) AS lim USING(film_id);

The "deferred join" here greatly improves efficiency: it lets MySQL scan as few pages as possible to find the rows that need to be accessed, and only then joins back to the original table on the associated column to fetch the full rows. The same technique can also be used to optimize the LIMIT clause in a join query.

(2) Sometimes a LIMIT query can be converted into a query at a known position, so that MySQL can get the result with a range scan. For example, if there is an index on a position column and the boundary values have been precomputed, the query above could be rewritten as:

SELECT film_id, description FROM sakila.film WHERE position BETWEEN 50 AND 54 ORDER BY position;

(3) We can use a "bookmark" to record the position of the last row fetched, so the next scan can start from that position, avoiding OFFSET entirely. This does have requirements, though, such as a monotonically increasing primary key. Let's look at an example:

SELECT * FROM sakila.rental ORDER BY rental_id DESC LIMIT 20;

Suppose this returns the rows with primary keys 10019 down to 10000; then the next page can start from 10000:

SELECT * FROM sakila.rental
WHERE rental_id < 10000
ORDER BY rental_id DESC LIMIT 20;

(7) Limitations of UNION. MySQL always executes UNION queries by creating and filling a temporary table. Because of this, many optimizations cannot be applied to the UNION as a whole and have to be pushed down manually into its subqueries, which means clauses such as WHERE, LIMIT, and ORDER BY end up being written redundantly, once per subquery. Also, unless you really need to eliminate duplicate rows, always use UNION ALL: with a plain UNION, MySQL adds the DISTINCT option to the temporary table, which forces an expensive uniqueness check over the whole temporary table.
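A small sketch of pushing clauses down into the subqueries and using UNION ALL (the tables are from the sakila sample database; the exact query is just an illustration):

-- Without the inner LIMITs, each subquery would dump its full result
-- into the temporary table before the outer LIMIT is applied;
-- UNION ALL skips the DISTINCT check on that temporary table.
(SELECT first_name, last_name FROM sakila.actor ORDER BY last_name LIMIT 20)
UNION ALL
(SELECT first_name, last_name FROM sakila.customer ORDER BY last_name LIMIT 20)
ORDER BY last_name
LIMIT 20;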

Conclusion: there are actually many more techniques out there; these are the ones I used most often, so they are what I'm sharing here. I'm not yet familiar enough with the others, so I'll leave them for another time.