A slick trick that made my SQL execute 10,000,000 times faster!

Welcome to follow our wechat official account: Shishan100


Author: Wind has no trace – Tang

Scenario

The database is MySQL 5.6. Here is a brief description of the scenario.

Course table:

create table Course(
    c_id int PRIMARY KEY,
    name varchar(10)
);

100 rows

Student table:

create table Student(
    s_id int PRIMARY KEY,
    name varchar(10)
);

70,000 rows

Student score table SC:

CREATE table SC(
    sc_id int PRIMARY KEY,
    s_id int,
    c_id int,
    score int
);

700,000 rows

Goal: find the students who scored 100 in the Chinese exam.

The query:

select s.* from Student s where s.s_id in (select s_id from SC sc where sc.c_id = 0 and sc.score = 100);

Execution time: 30248.271s

Ouch, why is it so slow? First, check the query plan:

EXPLAIN select s.* from Student s where s.s_id in (select s_id from SC sc where sc.c_id = 0 and sc.score = 100);

The plan shows full table scans (type = ALL), because the c_id and score columns used in the subquery's WHERE condition have no index. So create indexes on the c_id and score columns of SC:

CREATE index sc_c_id_index on SC(c_id);
CREATE index sc_score_index on SC(score);

Running the same query again takes 1.054 seconds.

That is about 30,000 times faster and cuts the query time dramatically. Indexes clearly improve query efficiency enormously and are well worth building.

Indexes are easy to forget, and when the data volume is small you don't notice their absence at all. This optimization really drives the point home.

But 1 second is still too long. Can we optimize further? Take a closer look at the execution plan, and at the statement as rewritten by the MySQL optimizer:

SELECT
    `YSB`.`s`.`s_id` AS `s_id`,
    `YSB`.`s`.`name` AS `name`
FROM
    `YSB`.`Student` `s`
WHERE
    < in_optimizer > (
        `YSB`.`s`.`s_id` ,< EXISTS > (
            SELECT
            FROM
                `YSB`.`SC` `sc`
            WHERE
                (
                    (`YSB`.`sc`.`c_id` = 0)
                    AND (`YSB`.`sc`.`score` = 100)
                    AND (
                        < CACHE > (`YSB`.`s`.`s_id`) = `YSB`.`sc`.`s_id`
                    )
                )
        )
    )

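How do you view the optimized statement? In the mysql command-line client, run EXPLAIN EXTENDED on the query and then SHOW WARNINGS; the warning message contains the statement as rewritten by the optimizer.

The plan also shows type = ALL, i.e. a full table scan.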

My assumption was that this SQL would execute the subquery first:

select s_id from SC sc where sc.c_id = 0 and sc.score = 100;

Time: 0.001 s

The results are as follows:

Then execute:

select s.* from Student s where s.s_id in (7,29,5000);

Time: 0.001 s

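That is very fast. But MySQL does not execute the inner query first. As the rewritten statement above shows, it turns the IN into an EXISTS clause, and the plan shows DEPENDENT SUBQUERY: MySQL runs the outer query first and then evaluates the inner query for every outer row, that is, once for each of the 70,000 rows of Student.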
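To make the cost of a dependent subquery concrete, here is a simplified Python sketch, not MySQL's actual code, that counts how many rows two strategies examine: re-checking SC for every outer Student row, as the EXISTS rewrite does, versus running the inner query once and probing its result. The toy data and function names are hypothetical, and no indexes or caching are modeled, so only the relative counts matter.

```python
# Hypothetical toy data: 1,000 students with 3 SC rows each (far smaller than the article's tables).
students = list(range(1000))                     # Student.s_id values
sc = []                                          # SC rows as (sc_id, s_id, c_id, score)
for s_id in students:
    for c_id in range(3):
        sc.append((len(sc), s_id, c_id, (s_id * 7 + c_id) % 101))


def dependent_subquery(students, sc):
    """IN rewritten as a correlated EXISTS: re-check SC for every outer Student row."""
    result, rows_examined = [], 0
    for s_id in students:
        for _sc_id, sc_s_id, c_id, score in sc:
            rows_examined += 1
            if c_id == 0 and score == 100 and sc_s_id == s_id:
                result.append(s_id)
                break                            # EXISTS can stop at the first match
    return result, rows_examined


def subquery_once(students, sc):
    """Run the inner query once, keep its s_id values in a set, then probe the set."""
    rows_examined, matches = 0, set()
    for _sc_id, sc_s_id, c_id, score in sc:
        rows_examined += 1
        if c_id == 0 and score == 100:
            matches.add(sc_s_id)
    result = []
    for s_id in students:
        rows_examined += 1
        if s_id in matches:
            result.append(s_id)
    return result, rows_examined


dep_result, dep_rows = dependent_subquery(students, sc)
once_result, once_rows = subquery_once(students, sc)
print(len(dep_result), dep_rows, once_rows)
```

Both strategies return the same students, but on this toy data the per-row strategy examines hundreds of times more rows, and the gap grows with the size of the outer table.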

What about rewriting it as a join?

SELECT s.* from Student s
INNER JOIN SC sc on sc.s_id = s.s_id
where sc.c_id=0 and sc.score=100;
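One caveat before comparing timings: the IN form and the join form return the same rows only if each student has at most one SC row with c_id = 0 and score = 100. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (IN and INNER JOIN behave the same way here) with hypothetical rows in which student 1 has two matching SC rows; the plain join returns that student twice unless DISTINCT is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (s_id INT PRIMARY KEY, name VARCHAR(10));
    CREATE TABLE SC (sc_id INT PRIMARY KEY, s_id INT, c_id INT, score INT);
    INSERT INTO Student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cai');
    -- Hypothetical data: student 1 has two SC rows with c_id = 0 and score = 100
    INSERT INTO SC VALUES (1, 1, 0, 100), (2, 1, 0, 100), (3, 2, 0, 90), (4, 3, 0, 100);
""")

in_rows = conn.execute("""
    SELECT s.* FROM Student s
    WHERE s.s_id IN (SELECT s_id FROM SC sc WHERE sc.c_id = 0 AND sc.score = 100)
    ORDER BY s.s_id""").fetchall()

join_rows = conn.execute("""
    SELECT s.* FROM Student s INNER JOIN SC sc ON sc.s_id = s.s_id
    WHERE sc.c_id = 0 AND sc.score = 100
    ORDER BY s.s_id""").fetchall()

distinct_rows = conn.execute("""
    SELECT DISTINCT s.* FROM Student s INNER JOIN SC sc ON sc.s_id = s.s_id
    WHERE sc.c_id = 0 AND sc.score = 100
    ORDER BY s.s_id""").fetchall()

print(in_rows)        # [(1, 'Ann'), (3, 'Cai')]
print(join_rows)      # [(1, 'Ann'), (1, 'Ann'), (3, 'Cai')]
print(distinct_rows)  # [(1, 'Ann'), (3, 'Cai')]
```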

To analyze the join on its own, temporarily drop the indexes sc_c_id_index and sc_score_index.

The execution time is 0.057 seconds.

Efficiency has improved. Look at the execution plan:

The plan shows the two tables being joined, so I guessed that SC might need an index on s_id:

CREATE index sc_s_id_index on SC(s_id);

show index from SC;

Then run the join query again.

Time: 1.076 s. Surprisingly, it got slower. Why? Look at the execution plan:

The optimized query statement is as follows:

SELECT
    `YSB`.`s`.`s_id` AS `s_id`,
    `YSB`.`s`.`name` AS `name`
FROM
    `YSB`.`Student` `s`
JOIN `YSB`.`SC` `sc`
WHERE
    (
        (
            `YSB`.`sc`.`s_id` = `YSB`.`s`.`s_id`
        )
        AND (`YSB`.`sc`.`score` = 100)
        AND (`YSB`.`sc`.`c_id` = 0)
    )

It appears that the join is performed first and the WHERE filter is applied afterwards.

Back to the previous execution plan:

There, the WHERE filter was applied first and the join afterwards. So the execution plan is not fixed.

In the logical order of SQL evaluation, the join comes before WHERE filtering. In our case, however, joining first would feed all 700,000 SC rows into the join, so applying the WHERE filter first is the smarter plan.
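The difference between the two orders can be sketched as a nested-loop join in Python, with Student modeled as a dict keyed by its primary key. This is a simplified illustration on hypothetical toy data, not how MySQL is implemented; it counts how many Student lookups each order performs and checks that both return the same rows.

```python
# Hypothetical toy data, scaled far down from the article's tables.
student = {s_id: f"student{s_id}" for s_id in range(1000)}   # Student keyed by s_id
sc = [(sc_id, sc_id % 1000, sc_id % 7, sc_id % 101)           # (sc_id, s_id, c_id, score)
      for sc_id in range(7000)]


def join_then_filter(student, sc):
    """Join every SC row to Student first, then apply the WHERE condition."""
    lookups, out = 0, []
    for _sc_id, s_id, c_id, score in sc:
        lookups += 1                        # one primary-key probe into Student per SC row
        name = student.get(s_id)
        if name is not None and c_id == 0 and score == 100:
            out.append((s_id, name))
    return out, lookups


def filter_then_join(student, sc):
    """Apply the WHERE condition to SC first, then join only the surviving rows."""
    # Without indexes on c_id and score, this filter still reads every SC row.
    survivors = [row for row in sc if row[2] == 0 and row[3] == 100]
    lookups, out = 0, []
    for _sc_id, s_id, _c_id, _score in survivors:
        lookups += 1
        name = student.get(s_id)
        if name is not None:
            out.append((s_id, name))
    return out, lookups


rows_a, lookups_a = join_then_filter(student, sc)
rows_b, lookups_b = filter_then_join(student, sc)
print(len(rows_a), lookups_a, lookups_b)
```

Here the join-first order probes Student 7,000 times, while the filter-first order probes it only once per surviving SC row.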

Now, to rule out MySQL's own query optimization, I wrote the optimized SQL by hand:

SELECT
    s.*
FROM
    (
        SELECT
            *
        FROM
            SC sc
        WHERE
            sc.c_id = 0
        AND sc.score = 100
    ) t
INNER JOIN Student s ON t.s_id = s.s_id

That is, filter the SC table first and then join it with Student. The execution time is 0.054 seconds.

This is about the same as the 0.057 seconds the join took before the s_id index was created.

Extracting from SC first and then joining is much more efficient. The remaining problem is that extracting from SC still scans the whole table, so the relevant indexes are clearly needed:

CREATE index sc_c_id_index on SC(c_id);
CREATE index sc_score_index on SC(score);

Then execute the query:

SELECT
    s.*
FROM
    (
        SELECT
            *
        FROM
            SC sc
        WHERE
            sc.c_id = 0
        AND sc.score = 100
    ) t
INNER JOIN Student s ON t.s_id = s.s_id

Execution time: 0.001 s. This time is quite good, about 50 times faster.

Execution Plan:

We can see that an index is used to extract the rows from SC, and again for the join.
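To see this switch from a table scan to an index search in a plan you can run anywhere, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN and its wording varies by SQLite version, but it shows the same change on the SC filter; the plan() helper is just an illustration.

```python
import sqlite3


def plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text (wording varies by version)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SC (sc_id INT PRIMARY KEY, s_id INT, c_id INT, score INT)")
QUERY = "SELECT * FROM SC sc WHERE sc.c_id = 0 AND sc.score = 100"

before = plan(conn, QUERY)   # no index on c_id or score yet: full scan of SC
conn.execute("CREATE INDEX sc_c_id_index ON SC(c_id)")
conn.execute("CREATE INDEX sc_score_index ON SC(score)")
after = plan(conn, QUERY)    # SQLite can now search one of the two indexes

print("before:", before)
print("after: ", after)
```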

Now run the original join SQL again:

SELECT s.* from Student s
INNER JOIN SC sc on sc.s_id = s.s_id
where sc.c_id=0 and sc.score=100;

Execution time: 0.001s

Execution Plan:

Here MySQL has optimized the query itself: it applies the WHERE filter first and then performs the join, and both steps use indexes.

==========

Recently we re-imported some production data, and testing showed that the SQL optimized a few days earlier had become slow again.

What changed: the SC table grew to 3 million rows, and student scores are more widely spread.

A quick recap:

show index from SC

Run the SQL:

SELECT s.* from Student s
INNER JOIN SC sc on sc.s_id = s.s_id
where sc.c_id=81 and sc.score=84;

Execution time: 0.061 s, which is a bit slow.

Execution Plan:

The plan uses an index merge with intersect: the two indexes are searched at the same time and their results are intersected. Next, look at the selectivity of the score and c_id fields.

Taken individually, neither field is very selective: in the SC table, c_id = 81 matches 70,001 rows and score = 84 matches 39,425 rows.

However, c_id = 81 and score = 84 together match only 897 rows. The two fields are highly selective in combination, so a composite index on them should make the query more efficient.
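The selectivity argument is easy to check with the row counts quoted above. This short Python calculation treats "300W" as exactly 3,000,000 rows (an approximation) and compares how much of the table each condition leaves.

```python
TOTAL_ROWS = 3_000_000  # "300W", treated as exact for this estimate
rows_c_id = 70_001      # rows with c_id = 81
rows_score = 39_425     # rows with score = 84
rows_both = 897         # rows with c_id = 81 and score = 84

for label, rows in [("c_id = 81", rows_c_id), ("score = 84", rows_score), ("both", rows_both)]:
    print(f"{label:>10}: {rows:>6} rows, {rows / TOTAL_ROWS:.4%} of the table")

# How many times narrower the combined condition is than each single condition
print(round(rows_c_id / rows_both), round(rows_score / rows_both))
```

The combined condition leaves roughly 78 times fewer rows than c_id = 81 alone and 44 times fewer than score = 84 alone.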

From another angle, this table already holds 3 million rows and will keep growing, so the indexes take up a lot of storage. As the data grows, the indexes can no longer all be kept in memory and must be read from disk, and the more indexes there are, the higher the disk-read overhead.

Therefore it is worth creating a multi-column composite index based on the actual business needs. Let's try it:

alter table SC drop index sc_c_id_index;
alter table SC drop index sc_score_index;
create index sc_c_id_score_index on SC(c_id,score);

Running the query again now takes 0.007 s, which is acceptable.

Execution Plan:

The optimization of this statement is over for now
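The difference between the two index layouts can be approximated with sqlite3 as a stand-in. SQLite has no index merge, so with two single-column indexes it simply picks one, but the sketch still shows the key point: with separate indexes only one equality condition narrows the index search, while the composite index sc_c_id_score_index narrows on both c_id and score. The plan wording varies by SQLite version.

```python
import sqlite3

SCHEMA = "CREATE TABLE SC (sc_id INT PRIMARY KEY, s_id INT, c_id INT, score INT)"
QUERY = "SELECT * FROM SC WHERE c_id = 81 AND score = 84"


def plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text (wording varies by version)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Layout 1: two single-column indexes, as before the change
single = sqlite3.connect(":memory:")
single.execute(SCHEMA)
single.execute("CREATE INDEX sc_c_id_index ON SC(c_id)")
single.execute("CREATE INDEX sc_score_index ON SC(score)")

# Layout 2: one composite index on (c_id, score)
composite = sqlite3.connect(":memory:")
composite.execute(SCHEMA)
composite.execute("CREATE INDEX sc_c_id_score_index ON SC(c_id, score)")

single_plan = plan(single, QUERY)        # one index, one column in the search key
composite_plan = plan(composite, QUERY)  # both c_id and score in the search key
print(single_plan)
print(composite_plan)
```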

Conclusion:

In MySQL 5.6, nested subqueries like this one are really inefficient
They can be rewritten as join queries
When joining tables, filter each table with its WHERE conditions first, then join
Create appropriate indexes, including multi-column composite indexes where needed
Analyze the execution plan: MySQL rewrites SQL during optimization, so the plan is what shows how the query actually runs

Index optimization

The sections above covered optimizing subqueries and building indexes; when several fields needed indexing, a separate single-column index was created for each field.

It later turned out that a composite index is actually more efficient, especially with large data volumes where each single column has low selectivity.

Single-column indexes

The query is:

select * from user_test_copy where sex = 2 and type = 2 and age = 10;

Indexes:

CREATE index user_test_index_sex on user_test_copy(sex);
CREATE index user_test_index_type on user_test_copy(type);
CREATE index user_test_index_age on user_test_copy(age);

The sex, type and age fields are indexed separately; with 3 million rows, the query takes 0.415 s.

Execution plan:

The plan shows type = index_merge.

This is MySQL's optimization for multiple single-column indexes: it reads the matching entries from each index and intersects the result sets (index merge intersection).
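Conceptually, index merge intersection reads the matching row-id list from each single-column index and intersects them, while a composite index goes straight to the matching entries. Here is a simplified Python sketch of that idea on hypothetical data, with dicts standing in for B-tree indexes and list positions standing in for primary keys; it is not MySQL's implementation, and the query values (sex = 1, type = 2, age = 3) are made up for the toy data.

```python
# Hypothetical rows standing in for user_test; list positions play the role of primary keys.
rows = [{"sex": i % 2, "type": i % 3, "age": i % 7} for i in range(4200)]


def build_index(rows, cols):
    """Map each key to a sorted list of row ids, like the entries of a secondary index."""
    index = {}
    for row_id, row in enumerate(rows):            # enumerate keeps each list sorted
        index.setdefault(tuple(row[c] for c in cols), []).append(row_id)
    return index


def intersect_sorted(a, b):
    """Two-pointer merge intersection of two sorted row-id lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Index merge: look up each single-column index, then intersect the row-id lists.
by_sex = build_index(rows, ("sex",))[(1,)]
by_type = build_index(rows, ("type",))[(2,)]
by_age = build_index(rows, ("age",))[(3,)]
merged = intersect_sorted(intersect_sorted(by_sex, by_type), by_age)
entries_read_merge = len(by_sex) + len(by_type) + len(by_age)

# Composite index on (sex, type, age): one lookup returns exactly the matching rows.
composite = build_index(rows, ("sex", "type", "age"))[(1, 2, 3)]
entries_read_composite = len(composite)

print(len(merged), entries_read_merge, entries_read_composite)
```

On this data the merge reads 4,100 index entries to find 100 rows, while the composite lookup reads only those 100.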

Multi-column indexes

We can instead create a multi-column index on these three columns; for testing, the table was copied:

create index user_test_index_sex_type_age on user_test(sex,type,age);

Query:

select * from user_test where sex = 2 and type = 2 and age = 10;

Execution time: 0.032 s, more than 10 times faster. The more selective the multi-column index, the bigger the improvement.

Execution Plan:

Leftmost prefix

Multi-column indexes follow the leftmost-prefix rule. Run the following statements:

select * from user_test where sex = 2;
select * from user_test where sex = 2 and type = 2;
select * from user_test where sex = 2 and age = 10;

All of them can use the index, because the first field of the index, sex, appears in the WHERE condition. The third statement can only seek on the sex part, because type is skipped and age therefore cannot narrow the index range.
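The leftmost-prefix behavior can be observed with sqlite3 as well. This sketch assumes a minimal user_test table (the id and user_name columns and all column types are hypothetical). SQLite's planner differs from MySQL's, but for a plain B-tree composite index the prefix rule is the same; without ANALYZE statistics SQLite does not attempt its skip-scan optimization, so the query that omits sex falls back to a scan.

```python
import sqlite3

# Minimal stand-in for user_test; the id and user_name columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_test (
    id INTEGER PRIMARY KEY, user_name TEXT, sex INT, type INT, age INT)""")
conn.execute("CREATE INDEX user_test_index_sex_type_age ON user_test(sex, type, age)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text (wording varies by version)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


queries = [
    "SELECT * FROM user_test WHERE sex = 2",
    "SELECT * FROM user_test WHERE sex = 2 AND type = 2",
    "SELECT * FROM user_test WHERE sex = 2 AND age = 10",   # type skipped: seek on sex only
    "SELECT * FROM user_test WHERE type = 2 AND age = 10",  # leading column sex missing
]
plans = [plan(q) for q in queries]
for q, p in zip(queries, plans):
    print(q, "->", p)
```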

Covering indexes

If every column a query needs is in the index, MySQL does not have to fetch the other columns from the table rows on disk; it returns the data directly from the index. For example:

select sex,type,age from user_test where sex = 2 and type = 2 and age = 10;

Execution time: 0.003s, much faster than fetching all fields
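A covering index is visible directly in a query plan. In MySQL's EXPLAIN it appears as "Using index" in the Extra column; the sqlite3 sketch below uses SQLite as a stand-in, where the plan says COVERING INDEX. The table layout (id and user_name columns) is a hypothetical minimal version of user_test.

```python
import sqlite3

# Hypothetical minimal user_test; only the composite index exists.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_test (
    id INTEGER PRIMARY KEY, user_name TEXT, sex INT, type INT, age INT)""")
conn.execute("CREATE INDEX user_test_index_sex_type_age ON user_test(sex, type, age)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text (wording varies by version)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


where = " FROM user_test WHERE sex = 2 AND type = 2 AND age = 10"
covered = plan("SELECT sex, type, age" + where)   # every needed column is in the index
not_covered = plan("SELECT *" + where)            # user_name must be read from the table row
print(covered)
print(not_covered)
```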

Sorting

select * from user_test where sex = 2 and type = 2 ORDER BY user_name;

Time: 0.139 s

Indexing the sort field can improve sorting efficiency:

create index user_name_index on user_test(user_name);
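Whether a standalone index on user_name helps this particular query depends on the optimizer, because the query must still filter on sex and type. A composite index whose leading columns match the equality conditions and whose next column is the sort column lets rows be read already in order. The sqlite3 sketch below (SQLite as a stand-in; both index names are hypothetical) shows the separate sort step, reported as a temporary B-tree, disappearing with such an index.

```python
import sqlite3

QUERY = "SELECT * FROM user_test WHERE sex = 2 AND type = 2 ORDER BY user_name"


def plan_with_index(index_sql):
    """Build a fresh in-memory user_test with a single index and return the query plan text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE user_test (
        id INTEGER PRIMARY KEY, user_name TEXT, sex INT, type INT, age INT)""")
    conn.execute(index_sql)
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


# Index serves the filter only: matching rows must be sorted afterwards.
filter_only = plan_with_index(
    "CREATE INDEX idx_sex_type_age ON user_test(sex, type, age)")
# Index serves the filter and returns rows already ordered by user_name.
filter_and_sort = plan_with_index(
    "CREATE INDEX idx_sex_type_user_name ON user_test(sex, type, user_name)")
print(filter_only)
print(filter_and_sort)
```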

Finally, here are some SQL tuning notes that I may dig into later:

Define columns with numeric types where possible and keep them as short as possible, e.g. primary and foreign keys, type fields, and so on
Create single-column indexes
Create multi-column composite indexes as needed

When a single-column filter still leaves a lot of rows, the index is inefficient, i.e. the column has low selectivity
If you index multiple columns together, their combined selectivity is much higher, giving a significant efficiency gain

Create covering indexes based on business scenarios and query only the fields the business needs; if those fields are covered by the index, query efficiency improves greatly
Create indexes on the columns used to join tables; this greatly improves join efficiency
Create indexes on WHERE condition fields
Create indexes on sort fields
Create indexes on grouping fields
Do not apply operations or functions to columns in WHERE conditions, or the index cannot be used
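The last rule is easy to see in a plan. In the sqlite3 sketch below (SQLite as a stand-in for MySQL; the table layout and the index user_test_index_age are hypothetical), wrapping age in an expression forces a full scan, while the equivalent condition on the bare column can use the index.

```python
import sqlite3

# Hypothetical minimal user_test with a single-column index on age.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE user_test (
    id INTEGER PRIMARY KEY, user_name TEXT, sex INT, type INT, age INT)""")
conn.execute("CREATE INDEX user_test_index_age ON user_test(age)")


def plan(sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text (wording varies by version)."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


wrapped = plan("SELECT * FROM user_test WHERE age + 1 = 11")  # expression on the column
rewritten = plan("SELECT * FROM user_test WHERE age = 10")    # same condition, bare column
print(wrapped)
print(rewritten)
```

For integer ages the two conditions select the same rows; only the second lets the index narrow the search.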

END

Source: www.cnblogs.com/tangyanbo/p…

The copyright of this article belongs to the author

