MySql series: simple summary | of query optimization in August more challenges

This is the sixth day of my participation in the August More text Challenge. For details, see:August is more challenging

📖 preface

🙉 to enhance their vision and contact more friends

Sometimes the circle is very important, like the school period we will find that there are such a class of students 👩🎓 do not listen to lectures, but is good at learning. What if he were to go home and not be in class? Similar circles; Libraries, Internet cafes, bike groups, tech groups, etc., can all help you grow by bringing skills shared by others with similar interests or by creating an atmosphere together.

Why do you want to optimize MySql?

How can I put this, as anyone who has worked on a medium or larger project knows, a MySql execution performance is a matter of stability and performance of your system.

🎈 The four characteristics of the database, the isolation level of the database

Say a platitude topic first, also be the thing that interview likes to ask most, hope the share of the blogger is helpful to you now!

Four features of 🚀 database:

Atomicity: Atomicity means that all operations contained in a transaction either succeed or fail to roll back.

Consistency: A transaction must be in a consistent state before and after execution.

Isolation: Isolation means that when multiple users concurrently access the database, for example, when they operate on the same table, the transactions opened by the database for each user cannot be disturbed by the operations of other transactions. Multiple concurrent transactions must be isolated from each other.

Durability: Durability means that once a transaction is committed, changes to the data in the database are permanent.

🐱🏍 Isolation level of the database:

Serializable: Avoids dirty reads, non-repeatable reads, and fantasy reads.

Repeatable Read: Avoids dirty read and non-repeatable read.

Read COMMITTED: Avoid dirty reads.

Read Uncommitted: The lowest level, which is not guaranteed in any case.

👏 Database design paradigm:

What is a paradigm: In short, the design of a database has a lot to do with how well the data is stored and what the developer does with it. So the establishment of scientific, standard database is to meet some specifications to optimize data storage. In a relational database, these specifications are called patterns.

What are the three paradigms:

First normal form: the relational schema R is said to satisfy the first normal form when none of its properties can be decomposed into more basic units of data, abbreviated to 1NF. Satisfying the first normal form is the minimum requirement for the normalization of the relational schema; otherwise, many basic operations cannot be implemented in such a relational schema.

Second normal form: if the relational mode R satisfies the first normal form and all non-primary attributes of R are completely dependent on each candidate key attribute of R, R is said to satisfy the second normal form, abbreviated to 2NF.

Third normal form: Let R be a relational pattern satisfying the conditions of first normal form and X be any set of attributes of R. If X is nontransitively dependent on any of the candidate keywords of R, R is said to satisfy the third normal form, abbreviated to 3NF.

Note: A relationship is essentially a two-dimensional table where each row is a tuple and each column is an attribute

First normal form:

Each column attribute is a non-divisible attribute value, ensuring the atomicity of each column

The attributes of the two columns are similar or similar. Try to merge the columns with the same attributes to ensure that no redundant data is generated.

Second normal form:

Each row of data can only be related to one of its columns, that is, a row of data can only do one thing. Whenever there is duplication of data in a data column, the table should be split.

One piece of data to do one thing without complicated relational logic. At the same time, the update and maintenance of table data are easier to operate.

Third normal form:

Data cannot have a transitive relationship, that is, no property is directly related to the primary key rather than indirectly related to it. Attributes such as a–>b–> C containing such a relationship are not consistent with the third normal form.

Such as:

Student list (Student ID, Name, age, gender, College, Address, Phone number)

Such a table structure, the relationship exists. Student Number –> University –> (University address, university telephone)

For such a table structure, we should break it down as follows.

(Student ID, Name, Age, gender, Institution) — (Institution, address, Telephone number)

The three paradigms are just the basic ideas of designing databases in general, which can build databases with small redundancy and reasonable structure. If there are special cases, we should treat them as special cases. The most important thing in database design is to look at requirements and performance, requirements > Performance > table structure. Therefore, we should not blindly pursue the paradigm to establish a database, everyone should remember oh!

✨ Optimization suggestions for MySql

Note :(try to reduce the occurrence of SELECT * below please ignore, use a specific list of fields instead of *, do not return any fields not used, using fields can greatly save data transfer, and database memory usage oh.)

🤳 Highlights:

EXPLAIN:

To do MySQL optimization first, we need to use EXPLAIN to look at the SQL execution plan.

Type column, connection type. A good SQL statement should be at least range level. The all level is not allowed.

Key column, the name of the index used. If no index is selected, the value is NULL. You can use mandatory indexing.

Key_len column, index length.

Rows column, number of rows scanned. This value is an estimate.

Extra column, detailed. Note the common less friendly values, such as Using filesort, Using TEMPORARY.

The SELECT statement must specify the field name:

SELECT* adds a lot of unnecessary consumption (CPU, IO, memory, network bandwidth);

Increased the possibility of using overridden indexes; When the table structure changes, the forebreak also needs to be updated.

So you need to directly follow SELECT with the field name.

➤ Common SQL practices

The index cannot be used for a negative query

SELECT * FROM order WHERE order_status! =0 AND order_status ! =1 Not in/not existsCopy the code

SELECT * FROM order WHERE order_status in (1,2,3,4,5)

SQL statements should not contain too many values IN:

MySQL optimizes IN by storing all constants IN an array, and the array is sorted. But if the value is more, the consumption is also relatively large. SELECT id FROM t WHER num IN(1,2,3) SELECT id FROM t WHER num IN(1,2,3)

If the sort field does not use an index, sort as little as possible

Try to use union all instead of union

The main difference between union and Union ALL is that the former needs to combine the result sets before performing unique filtering operations, which involves sorting, increasing a large number of CPU operations, increasing resource consumption and delay. The prerequisite for union ALL is that there is no duplicate data between the two result sets.

Leading fuzzy queries cannot use indexes (this will invalidate indexes and result in full table scans)

SELECT * from order WHERE order_desc LIKE ‘%XX’

Non-leading fuzzy queries can :(full table scan without invalidating indexes)

SELECT * FROM order WHERE order_desc LIKE ‘XX%’

Do not use indexes for fields with low data differentiation

SELECT * FROM user WHERE user_sex=1 (SELECT * FROM user WHERE user_sex=1)

You can use indexes as long as you can filter 80% of the data. For order status, an index should not be used if the status value is small, and an index should be created if the status value is large enough to filter a large amount of data.

An index cannot be hit when calculating on a property

SELECT * FROM order WHERE YEAR(date) < = ‘2019’

Date (date, date, date, date, date, date, date)

SELECT * FROM order WHERE date < = CURDATE() SELECT * FROM order WHERE date < = ‘2019-05-08’

😜 is not a well-known SQL practice

Segmented query :(you can use segmented query when the number of rows scanned is in the millions)

In some user selection pages, some users may select a large time range, resulting in slow query. The reason is that there are too many rows to scan. At this time, we can use the program to perform segmented query, loop traversal, and combine the results for display.

Avoid null values for fields in the WHERE clause

A null determination causes the engine to abandon the index and perform a full table scan.

Avoid implicit conversions

Type conversion occurs when the type of column in the WHERE clause is inconsistent with the type of the parameter passed in. It is recommended to determine the parameter type in WHERE first

Avoid expression operations on fields in the WHERE clause. For example:

SELECT * FROM user WHERE age*2=36; This causes the engine to abandon the use of indexes,

SELECT * FROM user WHERE age=36/2;

Note the range query statement

For a federated index, a range query, such as between, >, and <, invalidates subsequent index fields.

If most services are single queries, Hash indexes, such as the user center, provide better performance

SELECT * FROM user WHERE user_id=?
SELECT * FROM user WHERE login_name=?
Copy the code

The reason:

The time of b-tree index is LOG (n). The time of Hash index is O(1).

Columns that are null are allowed, and the query has potential pits

SELECT * FROM user WHERE user_name! SELECT * FROM user WHERE user_name! SELECT * FROM user WHERE user_name! = ‘Chenyongjia’ If name is allowed to be NULL and the index does not store null values, the result set will not contain these records.

Therefore, use the NOT NULL constraint and the default value.

The left-most prefix of a compound index is not a value. The ORDER of WHERE in SQL statements must be the same as that of the compound index

User center creates composite index (login_name, passwd)

SELECT * FROM user WHERE login_name=? AND user_passwd=?
SELECT * FROM user WHERE user_passwd=? AND login_name=?
Copy the code

Can hit the index

SELECT * FROM user WHERE login_name=?
Copy the code

It also matches the index and satisfies the leftmost prefix of the composite index

SELECT * FROM user WHERE user_passwd=?
Copy the code

The index cannot be matched because the leftmost prefix of the composite index is not met

Use enums instead of strings

The ENUM saves TINYINT. Do not use “China”, “Shanghai”, “technology” in the ENUM. The string space is large and the efficiency is low.

💕 A niche but very useful SQL practice

Look tired! Don’t slack off and follow the blogger.

LIMIT 1 is efficient if it is clear that only one result will be returned, in order to make the type column in EXPLAIN const

SELECT * FROM user WHERE login_name=?
Copy the code

It can be optimized as:

SELECT * FROM user WHERE login_name=? LIMIT 1
Copy the code

The reason:

You know there’s only one result, but the database doesn’t know that, so it’s explicitly told to stop the cursor actively

In addition to saving data CPU, there are unexpected query cache optimizations that can be achieved by moving computing to the business tier rather than the database tier

SELECT * FROM order WHERE date < = CURDATE()

This is not a good SQL practice and should be optimized for:

CurDate = date (‘ Y – m – d ‘); curDate = date(‘Y-m-d’); CurDate = date (‘ Y – m – d ‘); res = mysql_query( ‘SELECT * FROM order WHERE date < = $curDate’);

The reason:

The database CPU is freed

The query cache can be used only when the incoming SQL is the same

Casts will scan the full table

SELECT * FROM user WHERE user_phone=15638589820

Did you think it would hit the phone index? Big mistake. How do I change this statement?

Note: the above SQL statements as far as possible all caps do not ask me why ha ha!

MySQL cannot use indexes

(2) MySQL cannot use indexes when Join condition field types are inconsistent in Join statements

In the case of compound indexes, if the query condition does not include the leftmost part of the index column, that is, the leftmost prefix rule is not met, the index is not used

If mysql estimates that using an index scan is slower than using a full table scan, then do not use indexes. (Scan data more than 30%, will go to the full table)

The like query starts with %

If the column type is a string, you will not use the index. If the column type is a string, you will not use the constant value in the where condition. Otherwise, MySQL will not use the constant value even if the column has an index

If the column in the condition before or has an index and the column in the condition behind does not have an index, the indexes involved will not be used

🎉 summary:

The optimization of MySql needs to be used in the actual development process to summarize.
I hope you can stand on the shoulders of predecessors and walk higher and farther!
If you like me, remember to follow and like me!

MySql series: simple summary | of query optimization in August more challenges

📖 preface

🎈 The four characteristics of the database, the isolation level of the database

Four features of 🚀 database:

🐱🏍 Isolation level of the database:

👏 Database design paradigm:

First normal form:

Second normal form:

Third normal form:

Such as:

✨ Optimization suggestions for MySql

🤳 Highlights:

➤ Common SQL practices

😜 is not a well-known SQL practice

💕 A niche but very useful SQL practice

MySQL cannot use indexes

🎉 summary:

Related Posts

Deep learning some experience and Suggestions of | “todo” v. Not To do

Quickly create project templates based on existing projects

Want to see Dubbo source code? I suggest you read this one first