The background,

Recently, the company added new functions to the project, but it took a long time to query the list of some functions after the launch. The reason is that the new function uses the interface of the old function, and the SQL query statements of the old interface are associated with 5 or 6 tables and the writing is not standardized enough, resulting in index failure and full table scan when MySQL executes SQL statements. The colleague who was originally in charge of optimization went home on leave, so the problem of optimizing query data fell in the hands of the author. The author has successfully solved the problem after consulting the online SQL optimization information. In this paper, we record and summarize the MySQL query optimization related skills from the global perspective.

Second, optimization ideas

Slow data query does not mean there is a problem with SQL statement writing. First, we need to find the source of the problem before we can “treat it”. The author uses a flow chart to show the idea of MySQL optimization:

Without further words, it is clear from the figure that there are many reasons for slow data query, such as cache failure, MySQL server crash due to high concurrent access during this period; SQL statement writing problems; MySQL server parameter error; Hardware configuration limitations MySQL service performance issues.

Check the status of the MySQL server

If the number of concurrent requests is not high and the query speed is slow, you can skip this step and directly perform SQL statement tuning.

Execute command:

show status

Results are not posted here because too many results are returned. In the returned results, we focus on the values of ‘Queries’,’ Threads_connected ‘, and ‘Threads_running’, that is, the number of Queries, threads connected, and threads running.

We can monitor the running status of the MySQL server by executing the following script

#! /bin/bash

while true

do

Mysqladmin uroot – p “password” ext | awk ‘/ Queries / 4 = ${q} / Threads_connected/c = ${4} / Threads_running / {r = $4} END {printf (” % d % d %d”,q,c,r)}’ >> status.txt

sleep 1

done

Run the script for 24 hours to obtain the contents in status. TXT and use awK to calculate the number of MySQL service requests per second

awk ‘{q=$1-last; last=$1}{printf(“%d %d %d”,q,$2,$3)}’ status.txt

Copy the calculated content into Excel to generate charts and observe the periodicity of the data.

If the observed data changes periodically, as explained in the figure above, the cache invalidation policy needs to be modified.

Such as:

One of the values in the interval [3,6,9] is obtained by random number as the cache expiration time, which disperses the cache expiration time and saves part of memory consumption.

During peak access times, some requests are diverted to the unexpired cache and some access to the MySQL database, which reduces the strain on the MySQL server.

4. Obtain the SQL statement to be optimized

4.1 Method 1: View running threads

Execute command:

show processlist

Return result:

mysql> show processlist;

+—-+——+———–+——+———+——+———-+——————+

| Id | User | Host | db | Command | Time | State | Info |

+—-+——+———–+——+———+——+———-+——————+

| 9 | root | localhost | test | Query | 0 | starting | show processlist |

+—-+——+———–+——+———+——+———-+——————+

1 row in set (0.00 sec)

From the result, we can know what command /SQL statement the thread executed and when. In practice, N records are returned in the query result.

Where, the returned value of State is the key for us to judge the performance. If its value appears as follows, the SQL statement recorded in this row needs to be optimized:

Converting HEAP to MyISAM # When the result of the query is too large, put the result to disk, serious

Create TMP table # Create temporary table. Critical

Copying to TMP tables on disk # Critical

Locked # is locked by another query, serious

Loggin slow query # Record slow queries

Sorting result

The State field has many values, but for more information, see the link at the end of this article.

4.2 Method 2: Enable Slow Log Query

Add two parameters below the [mysqld] line in the my.cnf configuration file:

slow_query_log = 1

slow_query_log_file=/var/lib/mysql/slow-query.log

long_query_time = 2

log_queries_not_using_indexes = 1

Slow_query_log = 1 indicates that slow query is enabled.

Slow_query_log_file specifies the location where slow query logs are stored.

Long_query_time =2 Indicates that the log is generated after 2 seconds.

Log_queries_not_using_indexes = 1 Records SQL statements that do not use indexes.

Note: The slow_query_log_file path cannot be written arbitrarily, otherwise the MySQL server may not have permission to write the log files to the specified directory. You are advised to copy the preceding path.

After modifying the saved file, restart the MySQL service. The slow-query.log file is created in the /var/lib/mysql.directory. Connect to the MySQL server. Run the following command to view the configuration.

show variables like ‘slow_query%’;

show variables like ‘long_query_time’;

Testing slow query logs:

mysql> select sleep(2);

+———-+

| sleep(2) |

+———-+

| | 0

+———-+

1 row in set (2.00 sec)

Open the slow query log file

[root@localhost mysql]# vim /var/lib/mysql/slow-query.log

/usr/sbin/mysqld, Version: 5.7.19-log (MySQL Community Server (GPL)). started with:

Tcp port: 0 Unix socket: /var/lib/mysql/mysql.sock

Time Id Command Argument

# Time: 2017-10-05 T04: it happened. 408964 z

# User@Host: root[root] @ localhost [] Id: 3

# Query_time: 2.001395 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 0

use test;

SET timestamp=1507178351;

select sleep(2);

We can see that the SQL statement that just executed for 2 seconds is recorded.

Slow query logs record SQL information about slow query, but the logs are dense and difficult to query. Therefore, we need tools to filter out the SQL.

MySQL provides the mysqlDumpslow tool to analyze logs. You can use mysqlDumpslow –help to see how commands are used.

Common parameters are as follows:

-s: indicates the sorting mode, followed by the following parameters

C: Access times

L: Lock time

R: Returns a record

T: Query time

Al: Average lock time

Ar: Average return record book

At: indicates the average query time

-t: Returns the previous number of entries

-g: Searches for a regular expression, which is case-insensitive

Case study:

Gets the 10 SQL that return the most recordsets

mysqldumpslow -s r -t 10 /var/lib/mysql/slow-query.log

Get the 10 most frequently accessed SQL

mysqldumpslow -s c -t 10 /var/lib/mysql/slow-query.log

Gets the first 10 queries by time that contain left joins

mysqldumpslow -s t -t 10 -g “left join” /var/lib/mysql/slow-query.log

To recommend a wechat: KX33389, plus she can receive free distributed, micro services, source code analysis, performance optimization, high concurrency and high availability technology information

Analyze SQL statements

5.1 Method 1: Explain

Filter out problematic SQL, we can use the explain provided by MySQL to check the SQL execution plan (associated tables, table query order, index usage, etc.).

Usage:

explain select * from category;

Return result:

mysql> explain select * from category;

+—-+————-+———-+————+——+—————+——+———+——+——+———-+——-+

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |

+—-+————-+———-+————+——+—————+——+———+——+——+———-+——-+

| | 1 SIMPLE | category | NULL | | NULL ALL | NULL | NULL | NULL | 1 | | NULL | 100.00

+—-+————-+———-+————+——+—————+——+———+——+——+———-+——-+

1 row in set, 1 warning (0.00 sec)

Field Description:

1. Id: select query serial number. The ids are the same and the execution sequence is from top to bottom. Different ids. A larger ID has a higher priority and is executed earlier

2. Select_type: specifies the operation type for querying data. The value is as follows:

Simple: simple query, without subquery or union

Primary: contains complex subqueries marked with this value for the outermost query

Subquery: Contains subqueries in select or WHERE, marked with this value

Derived: Subqueries contained in the FROM list are labeled with this value, and MySQL executes these subqueries recursively, putting the results in temporary tables

Union: If the second select appears after the union, it is marked with this value. If union is contained in a subquery from, the outer select is marked as derived

Union result: Select the result from the union table

3. Table: Displays which table the row is about

4. Partitions: match partitions

5. Type: Indicates the join type of the table. Its values are in the following order from highest to lowest:

System: A table has only one row, which is equivalent to a system table

Const: Found by index at once, matching only one row of data

Eq_ref: Unique index scan, for each index key, only one record in the table matches it. Often used for primary key or unique index scans

Ref: a non-unique index scan that returns all rows matching a single value. Indexed columns for the =, <, or > operators

Range: Retrieves only rows in a given range, using an index to select rows. Generally, between, >, < are used

Index: Traverses only the index tree

ALL: indicates full table scan, with the worst performance

Note: The first five cases are ideal index usage cases. It is usually optimized at least to the range level and preferably to the REF level

Possible_keys: Specifies which index MySQL uses to find row records in this table. If the value is NULL, it indicates that no index is used. You can create an index to improve performance

7. Key: displays the actual index used by MySQL. If NULL, no index query is used

8. Key_len: Indicates the number of bytes used in the index, which is used to calculate the length of the index used in the query. Without sacrificing accuracy, the shorter the length, the better it is to display the maximum length of the index field, not the actual length used

9. Ref: Displays which table fields are associated with the index fields of the table

10. Rows: Approximate the number of rows to find or read based on table statistics and selection, the smaller the better

11. Filtered: percentage of the number of rows returned as a result to the number of read rows. The greater the value, the better

12. Extra: Contains important additional information that is not suitable for display in other columns. Common values are as follows:

Using filesort: indicates that MySQL uses an external index sort for data, instead of reading data in the order of the indexes in the table. When this value is present, the SQL should be optimized

Using temporary: A temporary table is used to hold intermediate results. MySQL uses temporary tables when sorting query results. Common in sort order by and group by queries. When this value is present, the SQL should be optimized

Using index: Indicates that the corresponding SELECT operation uses an overwrite index to avoid accessing rows of the table

Using WHERE: The WHERE clause is used to restrict which line

Using join buffer: Uses the join buffer

Distinct: Stops searching for more rows in the current row combination after the first match is found

Note: The SQL statement must be optimized for the first two values.

5.2 Profiling 2

You can use the profiling command to learn more about the resources consumed by SQL statements (the overhead per execution step).

5.2.1 Checking whether a Profile is enabled

select @@profiling;

Return result:

mysql> select @@profiling;

+————-+

| @@profiling |

+————-+

| | 0

+————-+

1 row in set, 1 warning (0.00 sec)

0 indicates the disabled state, and 1 indicates the enabled state

5.2.2 enable profile

set profiling = 1;

Return result:

mysql> set profiling = 1;

Query OK, 0 rows affected (0.00 SEC)

mysql> select @@profiling;

+————-+

| @@profiling |

+————-+

| 1 |

+————-+

1 row in set, 1 warning (0.00 sec)

After the connection is closed, the Profiling state is automatically set to closed.

5.2.3 Viewing the SQL List

show profiles;

Return result:

mysql> show profiles;

+———-+————+——————————+

| Query_ID | Duration | Query |

+———-+————+——————————+

0.00062925 | | 1 | select @ @ profiling |

| 2 | 0.00094150 | show tables |

0.00119125 | | 3 | show databases |

| | | | 0.00029750 4 SELECT the DATABASE ()

| | | 0.00025975 5 show the databases |

| 6 | 0.00023050 | show tables |

| 7 | 0.00042000 | show tables |

| 8 | 0.00260675 | desc role |

| | | the select name, 0.00074900 9 is_key from role |

+———-+————+——————————+

9 rows in set, 1 warning (0.00 sec)

Before running this command, you need to execute other SQL statements.

5.2.4 Querying execution information about the specified ID

show profile for query Query_ID;

Return result:

mysql> show profile for query 9;

+———————-+———-+

| Status | Duration |

+———————-+———-+

0.000207 | | starting |

0.000010 | | checking permissions |

| Opening tables | 0.000042 |

0.000050 | | init |

0.000012 | | System lock |

0.000003 | | optimizing |

0.000011 | | statistics |

0.000011 | | preparing |

0.000002 | | executing |

0.000362 | | Sending data |

0.000006 | | | end

0.000006 | | | query end

| closing tables | 0.000006 |

0.000011 | | freeing the items |

| cleaning up | | 0.000013

+———————-+———-+

15 rows in set, 1 warning (0.00 sec)

Each row is a process of state changes and how long they last. The Status column is consistent with the State of the Show ProcessList column. Therefore, the points that need to be optimized are the same as those described above.

The value of the Status field can also be found in the link at the end.

5.2.5 Obtaining CPU and Block I/O Information

show profile block io,cpu for query Query_ID;

show profile cpu,block io,memory,swaps,context switches,source for query Query_ID;

show profile all for query Query_ID;

Sixth, optimize means

It focuses on query optimization, index usage and table structure design.

6.1 Query Optimization

Avoid SELECT * and query the corresponding field for any data you need.

Small tables drive large tables, that is, small data sets drive large data sets. For example, take tables A and B as an example. The two tables are associated by the ID field.

When the data set of table B is smaller than that of table A, in optimization exist is used. If in is used, the execution sequence of the two tables is to check table B first and then table A

select * from A where id in (select id from B)

When the data set of table A is smaller than that of table B, exist is used to optimize in. If exists is used, the execution sequence of the two tables is to check table A and then table B

select * from A where exists (select 1 from B where B.id = A.id)

In some cases, you can use joins instead of subqueries because MySQL does not create temporary tables in memory with joins.

Add redundant fields to reduce table association.

Reasonable use of indexes (described below). For example, create indexes for sorted and grouped fields and avoid filesort.

6.2 Index Usage

6.2.1 Suitable for Index scenarios

The primary key automatically creates a unique index

Fields frequently used as query criteria

Fields associated with other tables in a query

The sorted fields in the query

Statistics or grouping fields in the query

6.2.2 Indexes are Not Suitable for Scenarios

Fields that are frequently updated

Fields not used in the WHERE condition

Too few table records

A watch that is often added, deleted, or modified

The values of the fields are not very variable or repetitive

6.2.3 Principles for Creating and Using Indexes

Single-table query: Create an index on the column for which the query criteria are used

Multi-table query: left JOIN, add index to right table associated field; When right JOIN, the index is added to the left table associated field

Do not perform any operations on the indexed columns (calculations, functions, casts)

Do not use it in index columns! =, <> is not equal

Index columns should not be empty, and do not use is NULL or IS not NULL

The index field is a string and the value of the query condition is quoted to avoid automatic conversion of the underlying type

Violating the preceding rules may result in index failure. For details, run the Explain command

6.2.4 Index Failure

In addition to violating the principles of index creation and use, index invalidation can occur when:

For fuzzy queries, start with %

If OR is used, for example, field 1 (non-index) or field 2 (index) invalidates the index.

When a composite index is used, the first index column is not used.

Index (a,b,c), using fields A,b,c as composite indexes:

Where a = 1 and b = 2 where a = 1 and b = 2 and c = 3 where a = 1 and b = 2 and c = 3 Where b = 2 or C = 3 No where a = 1 and C = 3 Where a = 1 and C = 3 Where a = 1 and b like ‘XXX %’ and c = 3 where a = 1 and B like’ XXX %’ and C = 3

6.3 Database table structure design

6.3.1 Selecting an appropriate data type

Use the data type that can hold the smallest amount of data

Use simple data types. Int is easier to handle in mysql than vARCHar

Use tinyInt, SmallInt, mediumInt as integer types instead of int

Define fields using not NULL whenever possible, since NULL takes up 4 bytes of space

Use text as little as possible, and consider tables when you must

Use timestamp instead of datetime whenever possible

Do not have too many fields in a single table, recommend less than 20

6.3.2 Table splitting

When the data in the database is very large and the query optimization scheme cannot solve the problem of slow query speed, we can consider splitting the table to make the amount of data in each table smaller, so as to improve the query efficiency.

1. Vertical split: Separate multiple columns in a table into different tables. For example, if some fields in the user table are frequently accessed, put these fields in one table and some less commonly used fields in another table. When inserting data, use transactions to ensure data consistency between the two tables.

2. Horizontal split: Split by row. For example, in the user table, the user ID is used, the remainder of the user ID is 10, and the user data is evenly distributed to the 10 user tables ranging from 0 to 9. Data is also queried according to this rule during lookups.

6.3.3 Read/write Separation

The general rule for databases is “read more than write less”. In other words, database stress is mostly caused by a large number of data reading operations. We can use the database cluster scheme, using a library as the master library, responsible for writing data; The other libraries are slave libraries that read data. This relieves database access pressure.

Seven, server parameters tuning

7.1 Memory Related

Sort_buffer_size Specifies the memory size of the sort buffer

Join_buffer_size Specifies the size of the connection buffer

Read_buffer_size Buffer size allocated during full table scan

7.2 the IO related

Innodb_log_file_size Transaction log size

Innodb_log_files_in_group Number of transaction logs

Innodb_log_buffer_size Transaction log buffer size

Innodb_flush_log_at_trx_commit Transaction log refresh policy. The value is as follows:

0: Writes logs to the cache and flush logs to the disk every second

1: Writes log to cache and flush log to disk on each transaction commit

2: For each transaction commit, log data is written to cache and flush log is flushed to disk every second

7.3 Security Related

Expire_logs_days Days for automatically clearing binlogs

Max_allowed_packet controls the size of packets that MySQL can receive

Skip_name_resolve disables DNS lookup

Read_only Disables the write permission of non-super users

Skip_slave_start you automatically restore using slave

7.4 other # # #

Max_connections Controls the maximum number of connections allowed

Tmp_table_size Temporary table size

Max_heap_table_size Maximum memory table size

The author does not use these parameters to tune the MySQL server. For details and performance effects, please refer to the materials at the end of this article or separate baidu.

Hardware selection and parameter optimization

Hardware performance directly determines the performance bottleneck of MySQL database, and directly determines the running data and efficiency of MySQL database.

As software development programmers, we mainly focus on software optimization. The following hardware optimization can be used as an understanding

8.1 Memory Related

The I/O speed of the memory is much faster than that of the hard disk. Therefore, the buffer capacity of the system can be increased so that the data stays in the memory for a longer time, reducing the I/O speed of the disk

8.2 Disk I/O Related

Use SSD or PCle SSD devices to improve IOPS by hundreds or even tens of thousands of times

An array card is equipped with a CACHE and BBU module, which greatly improves IOPS

Use RAID-10 instead of RAID-5 whenever possible

8.3 Configuration related to the CPU

In the BIOS Settings of the server, adjust the following Settings:

Select the Performance Per Watt Optimized (DAPC) mode to maximize CPU Performance

Disable options such as C1E and C States to improve CPU efficiency

Memory Frequency Select Maximum Performance