This is the 29th day of my participation in the Gwen Challenge in November. Check out the details: The last Gwen Challenge in 2021.”

❤️ About the author: Hello everyone, I am Xiao Xu Zhu. Java field quality creator 🏆, CSDN blog expert certification 🏆, Huawei Cloud enjoy expert certification 🏆

❤️ technology live, the appreciation

❤️ like 👍 collect ⭐ look again, form a habit


Project background

There are three mega data tables

Ex_subject_point 9,316 data points

Ex_question_junior 2,159,519 pieces of data have 45 fields

Ex_question_r_knowledge 3,156,155 pieces of data

Test database: mysql (5.7)

 

1. To optimize the query and avoid full table scan, first consider creating indexes on where and ORDER by columns.

Case Study:

SELECT ex_question_junior.QUESTION_ID 
FROM ex_question_junior 
WHERE ex_question_junior.GRADE_ID=1
Copy the code

Execution time: 17.609s (multiple executions, hovering around 17s)

Optimized: After the index is added to the GRADE_ID field

Execution time: 11.377s (multiple executions, hovering around 11s)

Note: What fields do we usually build indexes on?

This is a very complex topic that requires a thorough analysis of business and data. Both primary and foreign keys must have indexes. Other fields that need to be indexed must meet the following conditions:

A, the field appears in the query condition, and the query condition can use the index;

B. The execution frequency is high, more than thousands of times a day;

C. The set of records that can be filtered by field conditions is very small, so what is the appropriate proportion of data filtering?

This has no fixed value and needs to be evaluated according to the amount of table data. The following is an empirical formula for quick evaluation:

Small table (table with less than 10000 rows) : filter ratio <10%;

Large table :(number of records returned by filtering)<(total number of records in the table * length of single record)/10000/16

Single record Length ≈ Sum of average field length + Number of fields *2

The following is an empirical classification of whether a b-tree index is needed:

2. Avoid null values for fields in the WHERE clause, which will cause the engine to abandon the index and perform a full table scan

select id from t where num is null

It is best NOT to leave NULL to the database and populate the database with NOT NULL whenever possible.

Remarks, descriptions, comments, etc., can be set to NULL. Otherwise, it is best not to use NULL.

If a vARCHar is a variable length field, NULL takes up no space. If a vARCHar is a variable length field, NULL takes up no space. If a vARCHar is a variable length field, NULL takes up no space.

Select * from num where num is null; select * from num where num is null;

select id from t where num = 0

Case Study:

In mysql database, a full table scan will not be performed without an index.

SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior 
WHERE IS_USE is NULL
Copy the code

 

The execution time is 11.729 seconds

 

SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior 
WHERE IS_USE =0
Copy the code

The execution time is 12.253 seconds

The time is almost the same.

Avoid using it in where clauses! = or <> otherwise the engine will abandon the index for a full table scan.

Case Study:

Mysql > select ‘where’ from ‘where’; = or <> operators, the engine does not abandon the use of indexes.

EXPLAIN
SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior 
WHERE ex_question_junior.GRADE_ID ! =15
Copy the code

The execution time is 17.579 seconds

The execution time is 16.966 seconds

4. Avoid using OR in the WHERE clause to join conditions. If a field has an index and a field has no index, the engine will abandon the index and perform a full table scan instead

Case Study:

GRADE_ID is indexed, QUESTION_TYPE is not

The execution time is 11.661s

Optimization scheme:

Use union all to separate indexed and non-indexed fields. The index field takes effect

The execution time is 11.811 seconds

However, the non-index field is still query speed will be very slow, so the query conditions, can add index index as far as possible

5. Use in and not in with caution. Otherwise, a full table scan will occur

Case analysis

Note: Using the IN and not IN operators for index fields in the WHERE clause of the mysql database, the engine does not abandon the use of indexes.

Note: Using the IN and not IN operators for non-index fields in the WHERE clause of the mysql database will result in a full table scan.

Case Study 2:

Use the difference between between and in

SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE ex_question_junior.QUESTION_TYPE  IN(1.2.3.4)
Copy the code

The execution time is 1.082 seconds

SELECT ex_question_junior.QUESTION_ID
FROM ex_question_junior
WHERE ex_question_junior.QUESTION_TYPE between 1 and 4
Copy the code

 

The execution time is 0.924s

 

The Times are about the same

Case Study 3:

Use exists and in to distinguish: conclusion

Use exists and in to distinguish: conclusion

1. In () applies to the case where table B is larger than table A (A<B)

select * from A

where id in(select id from B)

2. Exists () applies to the case where table B has A smaller data than table A (A>B)

select * from A

where exists(

select 1 from B where B.id = A.id

)

3. If the data in table A is the same size as that in table B, the efficiency of in and EXISTS is similar. You can use either of them. grammar

select * from A

where id in(select id from B)

The ex_QUESTION_R_KNOWLEDGE table has a large amount of data, while the ex_SUBJect_POINT table has a small amount of data

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Ex_question_r_knowledge (A) has A large amount of data, while ex_Subject_Point (B) has A small amount of data (A>B)

Use the exists for

SELECT *
FROM ex_question_r_knowledge
WHERE ex_question_r_knowledge.SUBJECT_POINT_ID IN
(
    SELECT ex_subject_point.SUBJECT_POINT_ID
    FROM ex_subject_point
    WHERE ex_subject_point.SUBJECT_ID=7
)
Copy the code

 

SELECT *
FROM ex_question_r_knowledge
WHERE  exists
(
    SELECT 1
    FROM ex_subject_point
    WHERE ex_subject_point.SUBJECT_ID=7
    AND ex_subject_point.SUBJECT_POINT_ID = ex_question_r_knowledge.SUBJECT_POINT_ID
)
Copy the code

 

The execution time is 13.537 seconds

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Ex_subject_point table has A small amount of data (A), ex_QUESTION_R_knowledge (B) has A large amount of data (A<B)

Use the in

SELECT *
 FROM ex_subject_point 
WHERE
 ex_subject_point.SUBJECT_POINT_ID IN( SELECT 
ex_question_r_knowledge.SUBJECT_POINT_ID FROM 
ex_question_r_knowledge WHERE 
ex_question_r_knowledge.GRADE_TYPE=2 )
Copy the code

 

 

SELECT * FROM ex_subject_point WHERE
 ex_subject_point.SUBJECT_POINT_ID IN( SELECT 
ex_question_r_knowledge.SUBJECT_POINT_ID FROM 
ex_question_r_knowledge WHERE 
ex_question_r_knowledge.GRADE_TYPE=2 )
Copy the code

The execution time is 1.554 seconds

SELECT *
    FROM ex_subject_point
    WHERE  exists(
    SELECT ex_question_r_knowledge.SUBJECT_POINT_ID
    FROM ex_question_r_knowledge
    WHERE ex_question_r_knowledge.GRADE_TYPE=2
    AND ex_question_r_knowledge.SUBJECT_POINT_ID= ex_subject_point.SUBJECT_POINT_ID
)
Copy the code

The execution time is 11.978 seconds

6, like fuzzy full match will also result in full table scan

Case analysis

EXPLAIN
 SELECT * 
FROM ex_subject_point 
WHERE ex_subject_point.path like"% % / 11 /"Copy the code

 

For efficiency, consider full-text retrieval. Lucene. Or another NOSQL database that can provide full-text indexing, such as TT Server or MongoDB

It’s going to be updated in a few more sections.

 

Last night, I had a whim, like fuzzy full match, will lead to a full table scan, then after fuzzy match and before fuzzy match will also be full table scan?

I turned on the computer today and took a test. The results are as follows:

Like fuzzy matching does not result in full table scan

Before matching like ambiguity, full table scan is performed

The principle of MY SQL is such, LIKE fuzzy full matching will lead to index failure, full table scan; Matching before LIKE fuzzy will also cause index failure and perform full table scan. But LIKE blur matches, index will have effect.