The project the blogger is responsible for mainly uses Aliyun RDS for MySQL. Recently there have been frequent slow-SQL alarms, with the longest execution time reaching 5 minutes. After exporting the logs, the main causes turned out to be queries that miss the indexes and queries without pagination.

These are very elementary mistakes, and frankly they left me a little disheartened; the team's technical level needs to improve. In the process of reworking these SQL statements I summed up some experience to share here; criticism and corrections are welcome if anything is wrong.

MySQL performance

Maximum data volume

Talking about performance without specifying the data volume and concurrency is meaningless. MySQL does not limit the maximum number of records in a table; that limit depends on the operating system's file size limit.

According to The Alibaba Java Development Manual, splitting the database and tables is only recommended when a single table exceeds 5 million rows or 2 GB of capacity.

Performance is determined by a combination of factors. With the business complexity held equal, the order of influence is: hardware configuration, MySQL configuration, data table design, and index optimization. The 5-million figure is for reference only, not an iron rule. The blogger has operated on a single table with more than 400 million rows, where paging through the latest 20 records took 0.6 seconds:

    SELECT field_1, field_2 FROM table WHERE id < #{prePageMinId} ORDER BY id DESC LIMIT 20;

Here prePageMinId is the minimum id of the previous page of records. The query speed was fine at the time, but as the data keeps growing it will eventually become overwhelmed. Splitting databases and tables is a big job with a long cycle and high risk, so optimize on the current structure as much as possible first, for example by upgrading hardware and migrating historical data, and only split when there is no other way out. Readers interested in splitting tables can read up on the basic ideas of table sharding.

Maximum concurrency

The maximum concurrency is how many requests the database can serve at the same time, determined by max_connections and max_user_connections. max_connections specifies the maximum number of connections to the MySQL instance as a whole; max_user_connections specifies the maximum number of connections per database user. MySQL allocates buffers for every connection, so a higher connection count means more memory consumption. Set the value too high and the hardware cannot cope; set it too low and the hardware is not fully utilized. Generally, the ratio of peak used connections to the configured maximum should be above 10%, calculated as follows:

To view the configured maximum number of connections and the peak number of connections the server has actually served:
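
A minimal sketch using MySQL's own commands; the 10% rule of thumb above is computed from these two values:

    SHOW VARIABLES LIKE 'max_connections';            -- server-wide connection limit
    SHOW VARIABLES LIKE 'max_user_connections';       -- per-user connection limit
    SHOW GLOBAL STATUS LIKE 'Max_used_connections';   -- peak connections actually served
    -- rule of thumb: Max_used_connections / max_connections * 100% should be above 10%,
    -- otherwise max_connections is probably set higher than the load ever needs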

To change the maximum number of connections, modify the configuration file my.cnf:
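
A hedged example of the relevant my.cnf entries; the numbers are placeholders, not recommendations, and both variables can also be changed at runtime with SET GLOBAL:

    [mysqld]
    max_connections = 100
    max_user_connections = 20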

Keep queries under 0.5 seconds

It is advisable to keep every query under 0.5 seconds. This is an empirical value derived from the three-second rule in user experience: if an operation gets no response within 3 seconds, the user becomes impatient or even leaves. Response time = client UI rendering time + network request time + application processing time + database query time, and 0.5 seconds leaves the database roughly 1/6 of that budget.

Implementation principles

Compared with NoSQL databases, MySQL is a delicate creature. It is like the girls in PE class: a small dispute leaves her classmates uncomfortable (hard to scale out), two steps of running leave her out of breath (limited capacity and concurrency), and she is often off sick (too many SQL constraints). Everything is a bit distributed these days, and application scaling is much easier than database scaling, so the principle is: less work for the database, more work for the application.

  • Make the most of indexes, but do not abuse them; indexes also consume disk and CPU.
  • Do not use database functions to format data; leave that to the application.
  • Foreign key constraints are not recommended; guarantee data accuracy in the application.
  • Unique indexes are not recommended in scenarios where data is written frequently but read rarely.
  • Add appropriately redundant fields, try building intermediate tables and computing intermediate results in the application, trading space for time.
  • Do not execute extremely time-consuming transactions; split them into smaller transactions in the application.
  • Anticipate the load and data growth trend of important data tables (such as the order table) and optimize them in advance.

Data table design

Data types

When choosing data types, prefer the simpler type and the one that occupies less space.

  • If the range is sufficient, use TINYINT, SMALLINT, or MEDIUMINT instead of INT.
  • If the string length is fixed, use CHAR.
  • Use VARCHAR instead of TEXT whenever VARCHAR is long enough.
  • For high-precision values, use DECIMAL, or store scaled integers in BIGINT (for example, multiply an amount with two decimal places by 100 before saving).

Use TIMESTAMP instead of DATETIME whenever possible.

TIMESTAMP occupies less space than DATETIME and is stored in UTC with automatic time-zone conversion.
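
A hypothetical table sketch pulling these type choices together; the names and sizes are illustrative only:

    CREATE TABLE t_order (
        id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        status        TINYINT NOT NULL DEFAULT 0,        -- small range: tinyint instead of int
        country_code  CHAR(2) NOT NULL DEFAULT '',       -- fixed length: char
        remark        VARCHAR(500) NOT NULL DEFAULT '',  -- varchar is enough, no text needed
        amount_cents  BIGINT NOT NULL DEFAULT 0,         -- amount * 100 stored as an integer
        created_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- timestamp instead of datetime
        PRIMARY KEY (id)
    ) ENGINE = InnoDB;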

Avoid NULL values

A NULL field still occupies space in MySQL and makes indexes and index statistics more complicated. Updating a NULL value to a non-NULL value cannot be done in place, which tends to cause index page splits and hurts performance. Where possible, replace NULL with a meaningful default so that SQL statements do not need IS NOT NULL conditions.
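
A small sketch of the idea, assuming a hypothetical member.nickname column that currently allows NULL:

    -- backfill existing NULLs, then forbid NULL with a meaningful default
    UPDATE member SET nickname = '' WHERE nickname IS NULL;
    ALTER TABLE member MODIFY nickname VARCHAR(50) NOT NULL DEFAULT '';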

Text type optimization

Because a text field stores a large amount of data, the table size grows very quickly and the query performance of the other columns suffers. It is recommended to extract the text column into a separate child table associated with the main table by the business primary key.
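
A sketch of the split, assuming a hypothetical article table whose content column is large:

    -- main table keeps the frequently queried columns
    CREATE TABLE article (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        title      VARCHAR(200) NOT NULL DEFAULT '',
        created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id)
    ) ENGINE = InnoDB;

    -- child table holds the large text, associated by the business primary key
    CREATE TABLE article_content (
        article_id BIGINT UNSIGNED NOT NULL,
        content    TEXT NOT NULL,
        PRIMARY KEY (article_id)
    ) ENGINE = InnoDB;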

Index optimization

Index classification

  1. Plain index: the most basic kind of index.
  2. Composite index: an index built on multiple columns to speed up queries with compound conditions.
  3. Unique index: like a plain index, but the values of the indexed column must be unique; NULL values are allowed.
  4. Composite unique index: the combination of the column values must be unique.
  5. Primary key index: a special unique index that uniquely identifies a record in the table; NULL values are not allowed. It is usually created together with the primary key constraint.
  6. Full-text index: used for searching large amounts of text. Both InnoDB and MyISAM support full-text indexes since MySQL 5.6. Because of limited accuracy and scalability, more enterprises choose Elasticsearch. Examples of each index type are sketched after this list.
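
Hedged examples of how each type might be declared; table, column, and index names are hypothetical, reusing the earlier example tables where possible:

    ALTER TABLE member ADD INDEX idx_city (city);                          -- plain index
    ALTER TABLE member ADD INDEX idx_city_age (city, age);                 -- composite index
    ALTER TABLE member ADD UNIQUE INDEX uk_login_name (login_name);        -- unique index
    ALTER TABLE member ADD UNIQUE INDEX uk_mobile_type (mobile_no, type);  -- composite unique index
    -- the primary key index is created with the table definition: PRIMARY KEY (id)
    ALTER TABLE article_content ADD FULLTEXT INDEX ft_content (content);   -- full-text index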

Index optimization guidelines

  1. Paging queries matter: MySQL will not use an index if the query needs to read more than roughly 30% of the table.
  2. Keep the number of indexes on a single table to at most five, and the number of columns per index to at most five.
  3. Strings can use prefix indexes; limit the prefix length to 5-8 characters.
  4. There is no point indexing columns with very low selectivity, such as an is-deleted flag or gender.
  5. Make good use of covering indexes, for example:

    SELECT login_name, nickname FROM member WHERE login_name = ?;

Creating a composite index on (login_name, nickname) is faster than a single-column index on login_name alone, because the query can be answered entirely from the index.
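
A sketch of the corresponding index definition; the index name is hypothetical:

    ALTER TABLE member ADD INDEX idx_login_name_nickname (login_name, nickname);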

SQL optimization

Batch processing

When the blogger was a child, he saw a fish pond with a small outlet for releasing water and all kinds of things floating on the surface. Duckweed and leaves always made it through the outlet, while branches blocked other objects and sometimes got stuck, needing to be cleared by hand. MySQL is the fish pond, the maximum concurrency and network bandwidth are the outlet, and user SQL statements are the floating objects. Queries without paging parameters, or UPDATE and DELETE statements that touch a large amount of data, are the branches: they need to be broken up and processed in batches. For example, to mark expired coupons as unavailable:

    UPDATE coupon SET status = 0 WHERE expire_date <= #{currentDate} AND status = 1;

If a large number of coupons have to be updated, executing this single statement may block other SQL. Batched pseudocode is as follows:
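
A minimal sketch, assuming the application drives the loop and a batch size of 1000:

    -- executed repeatedly by the application until it reports 0 affected rows
    UPDATE coupon
    SET status = 0
    WHERE expire_date <= #{currentDate}
      AND status = 1
    LIMIT 1000;
    -- pausing briefly between rounds keeps lock contention and replication pressure low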

Optimizing the <> operator

In general, the <> (or !=) operator cannot use an index, as in the following example:

    SELECT id FROM orders WHERE amount != 100;

If very few orders have an amount of 100 and the data distribution is very uneven, the index may still be used. Given this uncertainty, UNION can be used to aggregate the results of two range queries; the rewrite is as follows:
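
A sketch of the rewrite using UNION ALL, since the two ranges cannot overlap:

    -- two range conditions replace != so each branch can use an index on amount
    (SELECT id FROM orders WHERE amount > 100)
    UNION ALL
    (SELECT id FROM orders WHERE amount < 100);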

OR optimization

Under the InnoDB engine, OR cannot use a composite index. For example:
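
A hypothetical example, assuming a composite index on (mobile_no, user_id):

    SELECT id, product_name
    FROM orders
    WHERE mobile_no = '13421800407' OR user_id = 100;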

The OR condition cannot hit the composite index on mobile_no + user_id; UNION can be used instead.
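
A hedged sketch of the UNION rewrite, using the same hypothetical values:

    (SELECT id, product_name FROM orders WHERE mobile_no = '13421800407')
    UNION
    (SELECT id, product_name FROM orders WHERE user_id = 100);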

The rewritten query is most efficient when the columns filtered in each branch (mobile_no and user_id) are both indexed.

IN optimization

  1. IN is suitable when the main table is large and the subquery table is small; EXISTS is suitable when the main table is small and the subquery table is large. With the ongoing improvements to the query optimizer, the performance is about the same in many scenarios.
  2. Try rewriting IN subqueries as JOIN queries, as shown below:

    SELECT id FROM orders WHERE user_id IN (SELECT id FROM user WHERE level = 'VIP');

Rewritten with JOIN, it looks like this:
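
A sketch of the JOIN form:

    SELECT o.id
    FROM orders o
    JOIN user u ON o.user_id = u.id
    WHERE u.level = 'VIP';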

Avoid operations on columns

Performing an operation on a column in the query condition usually invalidates the index, as in the following query for today's orders:
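
A sketch of the kind of query meant here, assuming an indexed create_time column and a date literal standing in for "today":

    SELECT id FROM orders
    WHERE date_format(create_time, '%Y-%m-%d') = '2024-06-01';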

The date_format function applied to the column prevents this query from using the index.
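
A hedged rewrite that leaves the indexed column untouched; the literals again stand in for the current day:

    SELECT id FROM orders
    WHERE create_time >= '2024-06-01 00:00:00'
      AND create_time <  '2024-06-02 00:00:00';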

Avoid SELECT *

If you do not need every column in the table, avoid SELECT *: it forces MySQL to read entire rows, cannot take advantage of covering indexes, and wastes I/O and bandwidth.

LIKE optimization

LIKE is used for fuzzy matching, for example (the field is indexed):
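
A hypothetical example, assuming an indexed order_no column:

    SELECT id FROM orders WHERE order_no LIKE '%7890%';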

This query cannot use the index.

Removing the leading % allows the query to hit the index. But what if the product manager insists on matching both before and after the keyword? A FULLTEXT index is worth trying, but Elasticsearch is the ultimate weapon.
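
For comparison, the prefix-only form that can use the index:

    SELECT id FROM orders WHERE order_no LIKE '7890%';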

JOIN optimization

JOIN is executed with the Nested Loop Join algorithm: the result set of the driving table serves as the base data, each of its rows is used as a filter condition to look up matching rows in the next table, and the results are then merged. When there are multiple JOINs, the previous result set becomes the loop data for querying the next table.

  1. Both the driving and the driven table should have as many filter conditions as possible; satisfy the conditions in ON rather than relying on WHERE, and use the small result set to drive the large one, as sketched after this list.
  2. Add an index on the driven table's join column; if an index cannot be created, allocate a sufficiently large join_buffer_size.
  3. Do not JOIN more than three tables; prefer adding redundant fields instead.
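
A hedged sketch of these points, reusing the hypothetical user and orders tables from earlier: the small pre-filtered result set drives the join, and the driven table's join column is indexed.

    -- index the driven table's join column
    ALTER TABLE orders ADD INDEX idx_user_id (user_id);

    -- drive the join with the small, pre-filtered result set
    SELECT o.id, o.amount
    FROM (SELECT id FROM user WHERE level = 'VIP') u
    JOIN orders o ON o.user_id = u.id;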

LIMIT optimization

LIMIT is used for paging queries, and performance deteriorates as the page offset grows. The solutions are roughly as follows:

First select only the IDs to narrow the scan range, then fetch the rows, written as follows:
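
A hedged sketch, assuming the hypothetical orders table, a deep page at offset 1,000,000, and a page size of 20:

    -- slow: MySQL must read and discard the first 1,000,000 rows
    SELECT * FROM orders ORDER BY id DESC LIMIT 1000000, 20;

    -- faster: locate the 20 ids through the primary key index first, then fetch only those rows
    SELECT o.*
    FROM orders o
    JOIN (SELECT id FROM orders ORDER BY id DESC LIMIT 1000000, 20) t ON o.id = t.id;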

If only the primary key ID needs to be displayed, the following form can be used:
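
A sketch of that form, where 1000000 stands in for the last id seen on the previous page:

    SELECT id FROM orders WHERE id > 1000000 ORDER BY id ASC LIMIT 20;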

What if the above is still too slow? If you are interested in JDBC, you can use a cursor to implement paging queries.

Other databases

As a back-end developer, you must be proficient in MySQL or SQL Server as the storage core, and you should also take an active interest in NoSQL databases; they are mature enough, widely adopted, and can solve performance bottlenecks in specific scenarios.