Men are slow, SQL is fast: Remember a slow SQL optimization

The problem

This is an online problem. The execution time of the SQL queried from the log platform is 11.146s, which can be considered as a slow query. The beautified SQL is as follows:

Select * from table where index = index;

As you can see, there are two main combined indexes: STATUS, TO_ACCOUNT_ID and STATUS, from_account_ID

Problem analysis

We first look at the execution plan with explain:

Let’s take a look at what explain means.

Id: Nothing is just an ID, usually one row if there are no subqueries.

Select_type: divides into simple queries and complex queries roughly. Complex queries are divided into simple subqueries, derived tables (subqueries in from), and unions. In general, we see that simple is more, which means no subqueries or unions, and that complex queries are marked as primary.

Table: table name

Type: represents the association type that determines how Mysql looks for row data. This is usually the key information point when we look at queries. For example, ALL is a full table scan; Index means to use the index. Range represents a limited scan index, which is better than scanning all indexes directly; Ref is also an index lookup that returns rows that match a particular value. There are other types of this. For example, eq_ref returns only one record that matches, and const is optimized to be converted to constant.

Possible_keys: Shows an index that can be used, but not necessarily.

Key: indicates the index in use.

Key_len: Number of bytes used by the index.

Ref: represents the column or constant value used by the index lookup in the key column above.

Rows: The number of rows read to find the data that matches the condition.

Rows * FILTERED: indicates the percentage of rows in the table that match the query criteria. Rows * FILTERED can approximate the number of associated rows, and the fields added after Mysql5.1.

Extra: Extra information, such as using index to use an overwritten index, using WHERE to filter after the storage engine, using Temporary to use temporary tables, and using filesort to externally sort results.

Based on the above experience, we see that the index and the number of rows scanned are actually fine, but we find that using filesort is used in the execution plan.

ORDER BY amount DESC; create_time ASC; ORDER BY amount DESC; This reduces the query performance of SQL.

Let’s also understand how order BY works, which will help us better optimize SQL.

In Mysql, if the data volume is small, it will be sorted in memory. If the data volume is large, it will be sorted in disk. This process is called filesort.

First, the data is found according to the index, and then the data is put into the sort buffer
If the actual size of the data to be sorted does not exceed the buffer size, a memory sort, such as quicksort, is used, and the data that meets the criteria is retrieved and returned
If the buffer size is exceeded, an external sort is required. The algorithm usually uses a multi-way merge sort, which first divides the data into blocks, then sorts each block, saves the sorted results to disk, and finally merges the sorted results

In addition to knowing the sorting process, sorting uses the defined maximum length of the field rather than the actual stored length, so it takes more space.

In addition, in the version before 5.6, if multiple tables are involved in associative query, and the sorted fields are from different tables, the associated results are saved to the temporary table. This is why we often see using TEMPORARY. If you use limit using filesort, the limit will occur after the order, which may result in a very large amount of data being ordered.

Overall, the size of the buffer, the data length of the sort field, and the number of queried data items all affect query performance.

In this paper, we analyze the whole sorting process, and guide the optimization idea is to try not to use filesort, especially in the sorting of large amount of data, so the optimization method is to try to make the query data is already sorted, that is, reasonable use of union index and overwrite index.

To optimize the direction

Optimization 1: Adjust the index structure

Optimization 2: Code structure optimization

In addition, we found a code that performs operations in the for loop and then updates the state of the DB table, which will result in 1500 DB updates. We can consider batch processing of DB updates to reduce the number of DB writes. For example, perform a DB update for 100 records. This greatly reduces the number of writes to DB.

So this will reduce the number of writes from 3,000 to 30 per method call, but the batch size can be adjusted.

We’ll just focus on SQL tuning here, leaving the code aside for now.

Performance results

The amount of data in the test environment is 300,000

The query duration before optimization is more than 1.5 seconds
The optimized query is about 0.4s

Query performance is improved by 3 to 4 times.

According to the query from the production database, the data volume is about 3KW+, and the data conforming to the where condition is about 3 million

The query time ranges from 11s to 14s before optimization
After optimization, the query is about 0.8s

Performance is improved by more than 10 times.

Although this optimization is relatively simple, we still need to have a solid foundation to choose the most reasonable way to optimize.

Men are slow, SQL is fast: Remember a slow SQL optimization

The problem

Problem analysis

To optimize the direction

Performance results

Related Posts

How to reverse unidirectional and bidirectional lists (Space complexity O(1))

Do you really know how to input/output C files

Interviewer: Can you tell us why database indexes use B trees and B+ trees?