Writing good SQL is essential

The project that the blogger is responsible for mainly uses MySQL database of Ali Cloud. Recently, slow SQL alarms frequently appear, and the longest execution time is as high as 5 minutes. After exporting the logs, the main reasons are no index hit and no pagination. In fact, this is a very elementary mistake, I can not help but chill, team members need to improve the technical level of ah. The transformation of these SQL process, summed up some experience to share with you, if there are mistakes welcome criticism.

MySQL performance

Maximum data volume

Regardless of the amount of data and concurrency, performance is rogue. MySQL does not limit the maximum number of records in a table, it depends on the operating system’s file size limit.

Performance is determined by comprehensive factors. Regardless of service complexity, hardware configuration, MySQL configuration, data table design, and index optimization are in the order of influence. The value of 5 million is for reference only, not an iron rule.

Bloggers have operated on a single table with more than 400 million rows of data, and it takes 0.6 seconds to query the latest 20 records in pages. Select field_1,field_2 from table where id < #{prePageMinId} order by ID desc LIMIT 20 PrePageMinId is the minimum ID for the previous data record.

While the query speed was fine at the time, it will surely become overwhelmed as the data continues to grow. Dividing database and table is a big job with a long period and high risk. It should be optimized on the current structure as much as possible, such as upgrading hardware, migrating historical data, etc., and then dividing it when there is no way out. Those who are interested in sub-tables can read the basic idea of sub-tables.

Maximum concurrency

The number of concurrent requests is how many requests the database can process at any one time, as determined by max_connections and max_user_connections. Max_connections specifies the maximum number of connections to a MySQL instance. The upper limit is 16384. Max_user_connections specifies the maximum number of connections to a database user.

MySQL provides buffers for each connection, which means more memory consumption. If the connection number is set too high for the hardware, too low for the hardware. Generally, the ratio of the two is more than 10%, and the calculation method is as follows:

Max_used_connections/max_connections *100% = 3/100 *100% ≈ 3%Copy the code

To view the maximum number of connections and the maximum number of response connections:

show variables like '%max_connections%'; show variables like '%max_user_connections%';Copy the code

Modify the maximum number of connections in the configuration file my.cnf

[mysqld]max_connections = 100max_used_connections = 20
Copy the code

The query takes 0.5 seconds

You are advised to limit the query time to less than 0.5 seconds, which is an empirical value derived from the three-second rule in user experience. If the user does not respond to an operation within 3 seconds, the user will be bored or even quit. Response time = client UI rendering time + network request time + application processing time + query database time, 0.5 seconds is 1/6 of the processing time left for the database.

Implementation of the principle of

Compared to NoSQL databases, MySQL is a delicate creature. It is like the female students in PE class, a little dispute and classmates will be uncomfortable (difficult to expand), run two steps will be out of breath (small capacity and low), often sick to ask for leave (SQL constraints too much). Everyone is a bit distributed these days, and application scaling is much easier than database scaling, so the principle is less database work, more application work.

Make the most of indexes, but don’t abuse them. Indexes also consume disk and CPU.
It is not recommended to use database functions to format data for application processing.
The use of foreign key constraints is not recommended; data accuracy is guaranteed by the application.
Unique indexes are not recommended in the scenario where many data is written but few data is read.
With appropriate redundancy of fields, try creating intermediate tables, calculating intermediate results with applications, and trading space for time.
Extremely time-consuming transactions are not allowed to be executed and the application is split into smaller transactions.
Anticipate load and data growth trends for important data sheets (such as order sheets) and optimize them in advance.

Data table design

The data type

Choice of data type: simpler or less space.

If the length is sufficient, use tinyint, Smallint, medium_int instead of int.
If the length of the string is specified, use the char type.
If the vARCHar meets this requirement, the text type is not used.
For high precision, use decimal or BIGINT, such as multiplying two decimal places by 100 and saving.
Use timestamp instead of datetime whenever possible.

Timestamp takes up less space than datetime and is stored in THE AUTOMATIC time zone conversion format UTC.

Avoid empty value

MySQL still occupies space when a field is NULL, which makes indexes and index statistics more complicated. Updates from NULL values to non-NULL values cannot be performed in situ, which is prone to index splitting and affects performance. If possible, replace the NULL value with a meaningful value to avoid the SQL statement containing is not NULL.

Text type optimization

Because the text field stores a large amount of data, the table size can increase very early, affecting the query performance of other fields. It is recommended to extract it and place it in a sub-table and associate it with the business primary key.

The index optimization

The index classification

Plain index: The most basic index.
Composite index: An index created on multiple fields to speed up the retrieval of composite query conditions.
Unique index: Similar to a normal index, but the value of the index column must be unique, allowing empty values.
Combination unique index: The combination of column values must be unique.
Primary key index: a special unique index that uniquely identifies a record in a data table. Empty values are not allowed and are usually constrained by the primary key.
Full text index: Used to query massive text. InnoDB and MyISAM support full text index after MySQL5.6. Due to poor query accuracy and scalability, more enterprises choose Elasticsearch.

The index optimization

Paging queries are important, MYSQL does not use indexes if the volume of the query exceeds 30%.
The number of indexes in a single table does not exceed five, and the number of index fields does not exceed five.
The string can be indexed by prefix, and the prefix length is limited to 5-8 characters.
The uniqueness of the field is too low, so it is meaningless to add indexes, such as whether to delete, gender.

Fair use coverage index, as follows:

select login_name, nick_name from member where login_name = ?
Copy the code

Nick_name and login_name create a combined index faster than login_name.

SQL optimization

batch

When the blogger was young, he saw a fish pond with a small opening to release water, and there were all kinds of floating objects on the surface. Duckweed and leaves always make it through the outlet, while branches block other objects and sometimes get stuck and need to be cleaned manually. MySQL is the fish pond, maximum concurrency and network bandwidth is the outlet, user SQL is floating.

Queries without paging parameters, or update and DELETE operations that affect a large amount of data, are branches that we need to break up and batch, for example:

Service Description: Update all expired coupons to be unavailable.

The SQL statement:

update status=0 FROM `coupon` WHERE expire_date <= #{currentDate} and status=1;
Copy the code

If a large number of coupons need to be updated to the unavailable state, executing this SQL may block other SQL, batch processing pseudocode as follows:

int pageNo = 1; int PAGE_SIZE = 100; while(true) { List<Integer> batchIdList = queryList('select id FROM `coupon` WHERE expire_date <= #{currentDate} and status = 1 limit #{(pageNo-1) * PAGE_SIZE},#{PAGE_SIZE}'); if (CollectionUtils.isEmpty(batchIdList)) { return; } update('update status = 0 FROM `coupon` where status = 1 and id in #{batchIdList}') pageNo ++; }Copy the code

The <> operator optimizes

In general, the <> operator cannot use the index, as in the following example, to query an order less than $100:

select id from orders where amount ! = 100;Copy the code

If the number of orders for 100 is very small and the data distribution is very uneven, it is possible to use indexes. In view of this uncertainty, union is adopted to aggregate the search results, and the rewriting method is as follows:

(select id from orders where amount > 100) union all(select id from orders where amount < 100 and amount > 0)
Copy the code

The OR optimization

Or cannot use composite indexes under Innodb engine, for example:

Select id, product_name from orders where mobile_no = '13421800407' or user_id = 100;Copy the code

OR could not match the combined index mobile_no + user_id.

(select id, product_name from orders where mobile_no = '13421800407') product_name from orders where user_id = 100);Copy the code

When both id and product_name fields are indexed, the query is most efficient.

IN optimization

IN is suitable for the main table big small table, EXIST is suitable for the main table small table big table. Due to the continuous upgrading of the query optimizer, there are many scenarios where the performance is almost the same.

Try changing to join query as follows:

select id from orders where user_id in (select id from user where level = 'VIP');
Copy the code

JOIN is adopted as follows:

select o.id from orders o left join user u on o.user_id = u.id where u.level = 'VIP';
Copy the code

I don’t do column operations

The index is usually invalidated in the query condition column, as shown below:

Query order of the day

Select id from order WHERE date_format(create_time, '%Y-%m-%d') = '2019-07-01'; select id from order WHERE date_format(create_time, '%Y-%m-%d') = '2019-07-01';Copy the code

The date_format function causes this query to be unable to use the index.

select id from order where create_time between '2019-07-01 00:00:00' and '2019-07-01 23:59:59';
Copy the code

Avoid to Select all

If you do not query all columns in the table, avoid using SELECT *, which does a full table scan and does not make efficient use of the index.

Like to optimize

Like is used for fuzzy queries, for example (field is indexed) :

SELECT column FROM table WHERE field like '%keyword%';
Copy the code

The query did not match the index.

SELECT column FROM table WHERE field like 'keyword%';
Copy the code

Removing the previous % query will hit the index, but does the product manager have to blur the match before and after? Fulltext can try, but Elasticsearch is the ultimate weapon.

The Join optimization

Join is implemented by the Nested Loop join algorithm, which drives the result set of the table as the basic data, uses the junction data as the filtering condition to query the data in the next table, and then merges the results. If there are multiple joins, the previous result set is used as circular data and the data is queried again in the latter table.

The driven and driven tables should increase the query conditions as much as possible, satisfy the condition of ON and use less Where, and use small result sets to drive large result sets. If an index is added to the join field of the driven table, set an adequate join Buffer Size if the index cannot be established. Do not join more than three tables and try to add redundant fields.

Limit optimization

Limit is used for paging query. The performance deteriorates if you scroll backwards. The solution is as follows:

Select * from Orders ORDER by id desc LIMIT 1,10Copy the code

Takes 0.4 seconds

Select * from Orders ORDER by id desc LIMIT 1,10Copy the code

Takes 5.2 seconds

Select ID first to narrow the query scope, written as follows:

Select * from orders where id > (select id from orders by id desc limit 0,10) order by id desc limit 0,10Copy the code

Takes 0.5 seconds

If only the primary key ID is displayed, the following formula is used:

select id from orders where id between 1000000 and 1000010 order by id desc
Copy the code

Takes 0.3 seconds

What if the above solution is still slow? If you are interested in JDBC, you can use a cursor to implement paging queries

Other databases

As a back-end developer, it is important to be proficient in MySQL or SQL Server as the storage core, and also to take an active interest in NoSQL databases, which are mature enough and widely adopted to solve performance bottlenecks in specific scenarios.

reference

www.jianshu.com/p/6864abb4d…

MySQL performance

Maximum data volume

Regardless of the amount of data and concurrency, performance is rogue. MySQL does not limit the maximum number of records in a table, it depends on the operating system’s file size limit.

Maximum concurrency

The query takes 0.5 seconds

Implementation of the principle of

Data table design

The data type

Avoid empty value

Text type optimization

The index optimization

The index classification

The index optimization

SQL optimization

batch

The <> operator optimizes

The OR optimization

IN optimization

I don’t do column operations

Avoid to Select all

Like to optimize

The Join optimization

Limit optimization

Other databases

reference

Related Posts

Knowledge integration – a primer on MySQL is enough

Docker deploys the SpringBoot project

Introduction to NIO small Java Demo