This is the fourth day of my participation in the August More Text Challenge. For details, see: August More Text Challenge.

Background: One day, some users reported that the company’s app had become very slow. After opening the app and trying it myself, I found that the other functional interfaces responded normally, but the message query interface was very slow, and most of the time it returned no results at all. After finding the corresponding SQL and filling in its parameters, I executed it directly against the database; it took a long time to return (the related tables also hold a lot of data, so the slowdown is likely caused by data accumulating in them). This SQL clearly needed to be optimized, so I decided to run a slow query analysis and then optimize all the problematic SQL statements in one pass.

Enter the helper: pt-query-digest

pt-query-digest is part of the Percona Toolkit and one of its most frequently used tools. It can analyze slow query logs, binary logs, and general logs.

This article uses pt-query-digest to analyze slow SQL and then optimize it.

First, install Percona-Toolkit

Download: www.percona.com/doc/percona…

Install it into the /usr/local/percona-toolkit directory by running the following commands:

```
wget percona.com/get/percona-toolkit.tar.gz
tar zxf percona-toolkit.tar.gz
cd percona-toolkit-3.3.1/
perl Makefile.PL PREFIX=/usr/local/percona-toolkit
make && make install
```
1. Enable the slow query log function and set the slow query time
```
mysql> show variables like '%slow_query_log%';
+---------------------------+-------------------------------+
| Variable_name             | Value                         |
+---------------------------+-------------------------------+
| slow_query_log            | ON                            |
| slow_query_log_file       | /var/lib/mysql/mysql-slow.log |
+---------------------------+-------------------------------+
```
```
mysql> set global long_query_time=1;
```
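Note that settings changed with `set global` are lost when the server restarts. To make them permanent, they can also be written into the MySQL configuration file — a minimal sketch, assuming the default log path shown above:

```ini
# /etc/my.cnf — persist the slow-log settings across restarts
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/lib/mysql/mysql-slow.log
long_query_time     = 1   ; log queries taking longer than 1 second
```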

After slow query logging is enabled, let the program run for a while — three days or a week, depending on the business — so that the log accumulates enough entries to be worth analyzing.
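While waiting, you can check whether slow queries are actually being recorded by polling MySQL's built-in counter (a standard status variable; the number shown here is only illustrative):

```
mysql> show global status like 'Slow_queries';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Slow_queries  | 1024  |
+---------------+-------+
```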

2. Run commands to analyze the slow query logs

Reading mysql-slow.log directly is disorganized and hard to analyze. This is where the Percona Toolkit installed earlier comes in: use the following command to analyze the log and generate a structured SQL report, slow_report.log:

```
pt-query-digest slow.log > slow_report.log
```
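pt-query-digest also accepts options to narrow the analysis; for example (flags as documented in the Percona Toolkit manual — verify against your installed version):

```shell
# Analyze only entries from the last 24 hours
pt-query-digest --since=24h slow.log > slow_report_24h.log

# Report only the 10 most expensive queries by total response time
pt-query-digest --limit=10 slow.log > slow_report_top10.log
```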

Then we can open the generated file and examine its contents in detail.

Second, analyze the slow query logs

The file contains three sections: overall statistics, SQL statistics, and individual SQL statistics

1. Overall statistics
```
# Current date: Mon May 10 11:47:39 2021
# Hostname: xxxx.xxx-xxx.com
# Files: /var/lib/mysql/mysql-slow.log
# Overall: 87.23k total, 73 unique, 0.05 QPS, 0.37x concurrency __________
# Time range: ...
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time        676605s      2s    905s      8s     23s     12s      4s
# Lock time           242s       0      1s     3ms     4ms    17ms   839us
# Rows sent          7.90M       0  15.53k  107.02  400.73  616.06    8.91
# Rows examine       7.13G       0   8.94M  85.77k  46.68k 412.27k   2.38k
# Query size       161.99M       6  24.91k   1.90k   1.96k  682.67   1.96k
```
2. SQL statistics summary
```
# Profile
# Rank Query ID                           Response time    Calls R/Call  V/M   Item
# ==== ================================== ================ ===== ======= ===== =====
#    1 0x8EBD7078F62A82A7C578540C76F46BC4 602766.9262 8...  75091  8.0272 13.94 SELECT xxxx
#    2 0x40A63F5C50A2324033DB9FCAA2719C4E  18044.3571  2.7%  4131  4.3680  3.07 SELECT xxxx
#    3 0xFB8F32AE0EFAA83C665B91B6E5862D2F  16215.4058  2.4%  2335  6.9445  6.22 SELECT xxxx
#    4 0x2CF3802FA98AFCE8DA5C85F6E8424DCE  12951.3375  1.9%  2390  5.4190  6.56 SELECT xxxx
#    5 0x56A24EC2EC1FFDB2F49A123C34D5E0BD   8612.3662  1.3%   479 17.9799 31... SELECT xxxx
#    6 0x6D73ABA4D5097101273AA5ADB2259759   8328.1423  1.2%   858  9.7065 12.72 SELECT xxxx
#    7 0x75A04B6CA2CBDE5EB7A27A7FC15FFCC1   3864.3549  0.6%   615  6.2835  5.72 SELECT xxxx
#    8 0x886F3B1A59BD9900A6688314B0A3E4E0   3050.7563  0.5%   614  4.9687  2.93 SELECT xxxx
#    9 0xE6AA1C4FE828263924B7C26F5160BD60    680.7256  0.1%   171  3.9809  1.06 SELECT xxxx
#   10 .............
```
  • Query ID: the query’s fingerprint ID
  • Response time: total response time, and its percentage of all queries analyzed
  • Calls: the number of times the query was executed
  • R/Call: the average response time per execution
3. Statistics of a single SQL
```
# Query 1: 0.04 QPS, 0.33x concurrency, ID 0x8EBD7078F62A82A7C578540C76F46BC4 at byte 66396962
# This item is included in the report because it matches --limit.
# Scores: V/M = 13.94
# Time range: 2021-04-16T03:15:31 to 2021-05-07T08:38:15
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         86   75091
# Exec time     89 602767s      2s    281s      8s     23s     11s      4s
# Lock time     64    156s   352us   730ms     2ms     4ms    10ms   839us
# Rows sent      6 554.55k       0      31    7.56   16.81    5.75    5.75
# Rows examine   4 294.12M     110  77.33k   4.01k  10.29k   5.80k   2.38k
# Query size    89 145.64M   1.98k   1.99k   1.99k   1.96k    0.00   1.96k
# Tables
# EXPLAIN /*!50100 PARTITIONS*/
SELECT
    hg.group_id,
    MAX(ham.app_message_id) latest_message,
    COALESCE(hgrf.last_read_message_id, 0) last_read_message_id,
    SUM(CASE WHEN app_message_id > COALESCE(last_read_message_id, 0)
             AND ham.receiver_type = 'USER' THEN 1 ELSE 0 END) unread_message_count
FROM h_group hg
INNER JOIN h_message hm ON hm.group_id = hg.group_id
INNER JOIN h_app_message ham ON ham.message_id = hm.message_id AND ham.user_id = 2084
LEFT JOIN h_group_read_flag hgrf ON hg.group_id = hgrf.group_id
    AND hgrf.user_id = ham.user_id AND hgrf.user_type = 0
WHERE ham.deleted = 0 AND hm.send_flag = 1
GROUP BY hg.group_id, hgrf.last_read_message_id
```

From the logs above, we can see which SQL statements execute slowly and which execute most frequently. We can then analyze the problematic SQL and optimize it according to the business requirements. For example:

For SQL that executes slowly:

  • Use EXPLAIN to inspect the query execution plan and verify that indexes are being used; refer to the explain command for details;
  • If there are too many associated tables, can you use redundant fields to reduce unnecessary table associations?
  • If the query conditions are complex and the data volume is large, consider splitting the query into several queries at the business layer (several simple queries are sometimes no slower than one big one);
  • Analyze services, whether to query redundant fields or data;
  • If the amount of data in a single table is too large, consider data archiving, read and write separation, and divide the database into tables;
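As a sketch of the "split the query" idea applied to the unread-count SQL from the report (table and column names follow the schema described later in this article; this is an illustration of the approach, not the final production SQL):

```sql
-- Step 1: fetch the user's groups and last-read positions (both small tables, fast)
SELECT hg.group_id, COALESCE(hgrf.last_read_message_id, 0) AS last_read_message_id
FROM h_group hg
LEFT JOIN h_group_read_flag hgrf
       ON hgrf.group_id = hg.group_id AND hgrf.user_id = 2084 AND hgrf.user_type = 0;

-- Step 2 (run per group from step 1): count unread rows, joining only two tables
-- instead of four; the two ? placeholders come from step 1's result set
SELECT COUNT(*) AS unread_message_count
FROM h_app_message ham
INNER JOIN h_message hm ON hm.message_id = ham.message_id AND hm.group_id = ?
WHERE ham.user_id = 2084
  AND ham.app_message_id > ?          -- last_read_message_id from step 1
  AND ham.receiver_type = 'USER'
  AND ham.deleted = 0
  AND hm.send_flag = 1;
```

Whether this is actually faster depends on the number of groups per user and the indexes available, so it should be verified with EXPLAIN and timings before adoption.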

For SQL with high execution frequency:

  • Cache the data where possible to reduce the number of database queries.

Third, the actual case analysis

For example, the SQL above (abridged) queries a user’s unread messages.

SQL analysis: executed 75,091 times for a total of 602,767s; the average execution takes 8s, the maximum 281s, and even the minimum is 2s.

A. Optimize SQL to improve SQL efficiency

After checking, the indexes were already in normal use, yet the query still took around 2s at best, and for a user with a large number of unread messages it took much longer — so index tuning alone was not enough.

B. Split the SQL according to the business logic to reduce data volume and table joins

Scenario description and analysis:

h_app_message stores the association between users and messages (row count = number of messages × number of users), which is why it grows so fast.

There are 4 tables in total:

h_app_message (user–message association table; main fields: app_message_id, message_id, user_id) — about 10 million rows

h_message (message table; main fields: message_id, group_id) — nearly 2 million rows

h_group (channel table; main field: group_id) — each tweet belongs to a channel; the table is small

h_group_read_flag (per-user latest-read table; main fields: last_read_message_id, group_id, user_id) — stores the latest message each user has read in each channel (last_read_message_id = app_message_id); the table is small

Each time a user opens the app, these four tables are queried to get the user’s unread counts and latest messages.

Analysis of the joined tables against the business requirements:

1: h_group is only used for the join. By adding a redundant group_id field to h_app_message, the join to the h_group table can be removed.

2: In h_message, send_flag is 0 for a retracted tweet and 1 otherwise. If the related rows in h_app_message are deleted whenever a tweet is retracted, this condition is no longer needed and the join to h_message can be removed as well.

3: Finally, the main h_app_message table is still very large, so its row count is reduced by archiving older data (rows beyond a certain time span).
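Putting the three changes together: once h_app_message carries a redundant group_id column and retracted tweets are physically deleted from it, the unread-count query could shrink to something like the following (a sketch under exactly those assumptions, to be validated with EXPLAIN against the real schema):

```sql
-- Two tables instead of four: h_message and h_group joins are gone
SELECT ham.group_id,
       MAX(ham.app_message_id) AS latest_message,
       COALESCE(hgrf.last_read_message_id, 0) AS last_read_message_id,
       SUM(CASE WHEN ham.app_message_id > COALESCE(hgrf.last_read_message_id, 0)
                 AND ham.receiver_type = 'USER' THEN 1 ELSE 0 END) AS unread_message_count
FROM h_app_message ham                -- assumes the new redundant group_id column
LEFT JOIN h_group_read_flag hgrf
       ON hgrf.group_id = ham.group_id
      AND hgrf.user_id = ham.user_id
      AND hgrf.user_type = 0
WHERE ham.user_id = 2084
  AND ham.deleted = 0                 -- send_flag filter dropped: retracted rows are deleted
GROUP BY ham.group_id, hgrf.last_read_message_id;
```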

I also considered sharding the table to reduce the per-table data volume and improve query efficiency, but given the number of users and the application scenario, as well as the extra operational and development complexity sharding would bring, I gave that idea up.

If you have experience in this area, you are welcome to share it.