How does Mysql make your queries faster?

The preparatory work

Dev.mysql.com/doc/index-o…

First go to the above address to download the Sakila database. This is the prepared test data

Where is the query slow?

show create table inventory
Copy the code

Start by looking at the table builder

We have the SQL statement for this table

CREATE TABLE `inventory` (
  `inventory_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `film_id` smallint(5) unsigned NOT NULL,
  `store_id` tinyint(3) unsigned NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`inventory_id`),
  KEY `idx_fk_film_id` (`film_id`),
  KEY `idx_store_id_film_id` (`store_id`,`film_id`),
  CONSTRAINT `fk_inventory_film` FOREIGN KEY (`film_id`) REFERENCES `film` (`film_id`) ON UPDATE CASCADE,
  CONSTRAINT `fk_inventory_store` FOREIGN KEY (`store_id`) REFERENCES `store` (`store_id`) ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4582 DEFAULT CHARSET=utf8mb4
Copy the code

Let’s modify the SQL statement to remove idx_store_id_film_id and the corresponding 2 foreign keys

CREATE TABLE `inventory_1` (
  `inventory_id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `film_id` smallint(5) unsigned NOT NULL,
  `store_id` tinyint(3) unsigned NOT NULL,
  `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`inventory_id`),
  KEY `idx_fk_film_id` (`film_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4582 DEFAULT CHARSET=utf8mb4
Copy the code

So now I have a new watch

Then we copy the data from the old table into the new table

insert into inventory_1 select * from inventory
Copy the code

Query the new table

The marked index is null indicating that it is not used

Look at my old watch again

Using index is displayed at the end, indicating that the index is used

Alter table alter table alter table alter table alter table alter tableClass:

Here’s another example:

This may surprise some people, but this example queries 3 fields, why can still be an index query? Why is there no return table? Since the field in this query is itself a primary key, the result of our joint index query is the primary key. This was mentioned in the last innoDB article, so there is no need to return to the table.

Here’s another example:

There is only one more lastupate column than the previous one, and there is also an index used in the query process, but there is null in extra

SQL > alter table lastUPDATE; alter table lastUpdate; alter table lastUpdate

Why not go to a more appropriate index?

When Mysql selects an index, it will refer to the cardinality of the index. This cardinality is estimated by Mysql to reflect the number of values of the field.

In general, you take a couple of pages, you average them, you multiply them by the number of pages,

Take a look at this experiment

Make a simple table

create table sakila.city_1(city varchar(50) not null);
Copy the code

Then import the data from the old table, several times

insert into sakila.city_1 select city from sakila.city;
insert into sakila.city_1 select city from sakila.city;
insert into sakila.city_1 select city from sakila.city;
insert into sakila.city_1 select city from sakila.city;
insert into sakila.city_1 select city from sakila.city;
Copy the code

And then finally we’re going to scramble the contents of this table,

update sakila.city_1 set city=(select city from sakila.city order by rand() limit 1);
Copy the code

Then add indexes to the table

alter table sakila.city_1 add key (city(1));
alter table sakila.city_1 add key (city(2));
alter table sakila.city_1 add key (city(3));
alter table sakila.city_1 add key (city(4));
alter table sakila.city_1 add key (city(5));


alter table sakila.city_1 add key (city(6));
alter table sakila.city_1 add key (city(7));
alter table sakila.city_1 add key (city(8));
Copy the code

Then take a look at the index

The card column on the right represents the cardinality estimated by mysql. The first city is indexed by the first letter, and the corresponding value is 26

That’s pretty accurate, right? Because there are only 26 letters.

And then the third and fourth accuracy is ok, and then the index accuracy is not very good.

Why is that? As mentioned earlier, this is an estimate by mysql itself, which is definitely not accurate enough, so you may sometimes wonder why mysql doesn’t use the index you think is ideal.

To solve this problem, you can use force index to force the index.

You can also use the Analyze table to re-analyze index information

In fact, based on the index cardinality, you can judge the index performance.

Count is so slow. What can we do?

The count function counts the number of non-null data in the result set.

The execution process is as follows:

First, find the result set

Then check whether they are null one by one. If they are not, +1

Take a look at the customer table below to see the fields and indexes

Let’s take a look at the execution

This execution process is very slow, no index is used, it is a full table scan

And look at this

This execution obviously uses the index, but it still needs to check whether the last_name is null each time

If the primary key is null, the primary key is null.

Mysql does not optimize the primary key (count). The primary key cannot be null, but it still checks whether the primary key is null

 select count(customer_id) from customer;
Copy the code

Does anyone think count(1) is faster?

select count(1) from customer;
Copy the code

Count 1 is theoretically faster, but it is faster because there is no row parsing in the query process, so it is faster, but the process of iterating for NULL is still not omitted

Select count(*) mysql is optimized for this

In MyISAM, count * returns the number of rows in the table

If Innodb does not record the number of rows in the table, mysql returns the number of rows in the index

Therefore, in general, no brain count(*) is ok

What can I do if I order by too slow?

select * from film where film_id>80 order by title
Copy the code

Mysql executes the above statement as follows:

Query results based on where criteria

Put the results into sort_buffer

Sort the intermediate result set by the Order field

** Return to the table to generate the full result set ** if necessary

Or how does the previous SQL statement optimize it?

Film ID is already the main build. The query is already indexed.

Optimization Direction 1:

If the result is small, we put it directly into sort Buffer, which is an in-memory buffer

If the result is large, the result set will be stored on hard disk. So that’s obviously going to involve IO operations on hard disk and this is going to be slower.

So we can adjust sort_buffer_size to make it bigger so that it’s less likely to be put on disk. But this will increase the memory footprint. So you have to think about it all.

Optimizing memory footprint and optimizing sort query time is often not the same thing

Return table in query process:

If you have 100 columns in your table, mysql will use max_LENGTH_FOR_sort_data as the result of a query where

The size of this value determines how big the middle table is, and if the middle table only has 30 fields, then obviously you have to go back to the table after sorting

All the data.

Length_for_sort_data = max_length_FOR_sort_data = max_length_FOR_sort_data; Because the result set is one

Full table, this is incorrect because the result set is so large that the table is likely to be stored on hard disk instead of memory

So how do you optimize the above query? The key word is index coverage

Index overwriting is a thing that can skip generating intermediate result sets and output query results directly

So how do you achieve index coverage conditions?

The ORDER field must have an index (or the left side of the union index)

2/ Other related fields (query criteria, result set) must also be in the index above

Such as:

select film_id,title from film order by title
Copy the code

This statement is called index overwrite

Another example:

 select title,film_id from film where title like 'm%' order by title;
Copy the code

This is also index coverage, because the filter field, the sort field, and the output field are all indexed title

What if the RAND function is too slow?

select title,description from film order by rand() limit 1;
Copy the code

First, explain this statement

Select title and DES fields from film table and sort these data randomly, and take the first data among them

Take a look at the execution of the above statement:

Create a temporary table with fields RAND Title and DES
Take a row from the table, call the rand() function, and put the result into a temporary table

3. For temporary tables, place the RAND field and row position or primary key of the temporary table into the sort_buffer (this is the second temporary table).

Sort sort sort buffer, retrieve the first row position, and then go back to the temporary table

This execution process looks slow because it involves two temporary tables, all of which are full length

Second, it only needs a random result, but undergoes unnecessary sorting (although mysql optimizes automatically)

Finally, the rand() function is called many times

How to optimize the above query

Total (count *)
Select a random number r in the total range
Execute the following SQL

select title,description from film limit r,1

Using the above scheme can greatly improve query speed.

Order by rand() limit 1. Such random queries are inefficient

An index pushdown

create table `inventory_3`(
    `inventory_id` mediumint unsigned not null auto_increment,
    `film_id` smallint unsigned not null ,
    `store_id` tinyint unsigned not null ,
    `last_update` timestamp not null default current_timestamp on update current_timestamp ,
    primary key (`inventory_id`),
    key `idx_store_id_film_id` (`store_id`,`film_id`)
) ENGINE =InnoDB auto_increment=101 default charset =utf8;

insert into inventory_3 select * from inventory
Copy the code

Then look at the following query statement

Select * from 'inventory_3' where store_id in(1,2) and film_id=3Copy the code

And some people think, well, there’s a federated index here, so this must be a quick query that ends up being indexed,

But it’s not.

Why?

Look at the structure of the union index

Store_id is the leftmost, film_id is to the right of sid, and the query result for both indexes is the primary key

When you use this federated index

If the store ID is equal to the film ID, the two columns are sorted

But if you’re in

Here’s how he did it:

Film id = 1; film id = 2; film id = 1;

His query process is as follows:

Using index condition means index is pushed down

Let’s look at normal index queries:

In fact, before mysql 5.6, the return to the table was very frequent, and after that it was actually optimized, so the difference between the above two rows is not that big

Take a look at the 5.5 mysql rows gap

The other is

select film_id from inventory_3 where film_id=3

This SQL statement, as you’ve seen before, will not have the effect of index lookup, because store_id is the left side of the index

But in mysql 8.0, this supports loose index scanning, which greatly reduces query time and is much faster than previous versions of mysql

Why is my index not working?

explain select * from film where film_id+1=100
Copy the code

For this query, for example, the optimizer will drop the index because it does a function on the index field

Because we index film_id, not film_id +1

Some people will say, your SQL statement is too weird, no one would write this in real life

Look at the lease form below

We would like to inquire about the lease information for May

select * from rental where month(rental_date)=5
Copy the code

Note that rental_date is indexed

Take a look at the implementation:

Look at the rows, there is no index effect

You can’t use indexes when you use the month function

How do I write this SQL statement? I had to use the index

This can be optimized with Bewtten

select * from rental where rental_date between '2021-5-1' and '2021-6-1'
Copy the code

Of course, you can concatenate the keyword or after, to find last May also.

select * from t1 where f1=6
Copy the code

Comparing a string to a number converts a string to a number

Because the f1 field is of type VARCHAR, the above SQL is equal to

select * from t1 where cast(f1 as signed int)=6
Copy the code

So we try to avoid that.

select * from t1 where f1='6'
Copy the code

Take a look at

The construction sentences are as follows:

create table `t1`(
    `f1` varchar(32) not null ,
    `f2` int not null ,
    key `idx_f1` (`f1`),
    key `idx_f2` (`f2`)
)engine =InnoDb  default charset = utf8;

create table `t2`(
    `f1` varchar(32) not null ,
    `f2` int not null ,
    key `idx_f1` (`f1`),
    key `idx_f2` (`f2`)
)engine =InnoDb  default charset = utf8mb4;
Copy the code

Then try using the following SQL:

select t2.* from t1,t2 where t1.f1=t2.f1 and t1.f2=6
Copy the code

T1.f2 =6 This query uses indexes

But t1.f1=t2.f1 does not

Be careful because utF8 and UTF8MB4 have implicit character encoding conversions in the query, which can cause index invalidation.

Convert (utF8MB4 is recommended)

select t2.* from t1,t2 where convert(t1.f1 using utf8mb4)=t2.f1 and t1.f2=6
Copy the code

Paging query efficiency is low, how to do?

Looking at the SQL statement above, it is actually executing the query on the left and then the limit statement in the box on the right

You can sense that this throws away a lot of data and is inefficient.

So how do you optimize?

Take a look at the index:

The film id is the primary index, the title is the primary index, and the DES is not the primary index

I can do it

Set a joint index to the title and DES fields

Is there any other way to optimize the SQL statement above?

Select f from film f inner join(select film_id from film order by title limit 900,10) m on f.film_id= m.film_idCopy the code

The primary key that gets the data you want first

You can concatenate the original table with the results above.