
This article is aimed at small and medium-sized applications and websites, focusing on optimizing the SQL statements written in day-to-day program development. The "big data" and "high concurrency" discussed here are big and high only relative to small and medium-sized applications; professional database operations and maintenance are out of scope. The practices below are strategies I have used in real development work for relatively large data volumes and relatively high concurrency. Some of them have not gone through rigorous comparative testing or principled analysis, so criticism and corrections of any mistakes or omissions are welcome.


Reduce the examined row set and avoid full table scans.

The examined row set is at the heart of SQL optimization. It is not the number of records the query returns, but the number of rows scanned in order to produce them. Analyze the SQL with EXPLAIN (or DESC); the value in the rows column is the examined row count (you can also read it from Rows_examined in the slow query log).
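As a quick illustration (the `orders` table and `customer_id` column here are hypothetical):

```sql
-- The `rows` column of the EXPLAIN output estimates how many rows MySQL
-- must examine for this query -- not how many rows it will return.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```

If `rows` is close to the table size and `type` is `ALL`, the query is doing a full table scan and is a candidate for the optimizations below.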

Here are some of the SQL optimization strategies I often use:

Eliminate unnecessary queries and searches. In real projects, many query conditions are optional; trimming redundant features at the source wherever possible is the simplest and crudest solution.

Use indexes and composite indexes wisely. Building indexes is the most effective technique in SQL optimization. Fields commonly used for lookups, deletes, updates, and sorting are good candidates. Note, however, that MySQL generally uses only one index per table in a query (index-merge cases aside). When a query has multiple conditions, you can create a composite index that spans several fields. Remember that a composite index is usable only when the query conditions cover a leftmost prefix of the indexed fields.
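A minimal sketch of the leftmost-prefix rule, with hypothetical table and index names:

```sql
-- Composite index on (status, created_at); only leftmost prefixes apply.
CREATE INDEX idx_status_created ON orders (status, created_at);

-- Can use the index: filters on status, then ranges on created_at.
SELECT id FROM orders
WHERE status = 'paid' AND created_at >= '2023-01-01';

-- Generally cannot use the index: created_at alone is not a leftmost prefix.
SELECT id FROM orders
WHERE created_at >= '2023-01-01';
```

The order of conditions inside the WHERE clause itself does not matter; what matters is which indexed fields the conditions cover.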

Be careful with conditions such as NOT IN that may prevent index use. Indexes often cannot be used with NOT IN, !=, LIKE '%xx%', or IS NULL; when such conditions are unavoidable, place them after conditions that do use an index effectively. When designing table structures, I recommend using INT instead of VARCHAR wherever possible: != on an int column can often be replaced by range conditions (greater than / less than) or an IN list over the known values, and ints also conveniently support requirements like sorting by type. As for readability, a well-maintained database design document is the wiser choice. It is also recommended to declare every field NOT NULL with a sensible default wherever possible, to avoid IS NULL checks in WHERE clauses.
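A sketch of both ideas together, with a hypothetical `tickets` table:

```sql
-- NOT NULL with a default avoids IS NULL checks; a small int status code
-- replaces a varchar label.
CREATE TABLE tickets (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  status TINYINT NOT NULL DEFAULT 0,  -- 0 = open, 1 = closed, 2 = archived
  PRIMARY KEY (id),
  KEY idx_status (status)
);

-- Instead of `WHERE status != 1` (which may skip the index), enumerate the
-- wanted values so the index can be used:
SELECT id FROM tickets WHERE status IN (0, 2);
```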

Do not apply functions, arithmetic, or other expressions to the column on the left side of the comparison in a WHERE clause, or the optimizer cannot use the index. Use MySQL functions as little as possible: values like NOW() can be computed and assigned in application code, and some function calls can be replaced indirectly by adding suitable redundant fields.
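For example, assuming a hypothetical indexed `created_at` column:

```sql
-- Avoid: wrapping the column in a function defeats the index.
SELECT id FROM orders WHERE DATE(created_at) = '2023-05-01';

-- Prefer: rewrite as a range on the bare column so the index applies.
SELECT id FROM orders
WHERE created_at >= '2023-05-01' AND created_at < '2023-05-02';
```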

Using OR in a WHERE condition can also defeat indexes. It can be replaced by UNION ALL, or by UNION (which additionally filters duplicate rows and is therefore slower), or the application can simply run the two queries separately and merge the results, so that each query uses its index efficiently.
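A sketch of the rewrite, assuming hypothetical indexes on `customer_id` and `coupon_id`:

```sql
-- Each branch can use its own index. Note that a row matching both
-- conditions appears twice under UNION ALL; use UNION (slower) or
-- de-duplicate in the application if that matters.
SELECT id FROM orders WHERE customer_id = 42
UNION ALL
SELECT id FROM orders WHERE coupon_id = 7;
```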

Do not use SELECT *. This is less about speeding up the query itself than about reducing the amount of data returned and improving transmission speed; selecting only indexed columns can also let a covering index satisfy the query without reading the full rows.

Avoid implicit type conversions, which occur when the type of a field in the WHERE clause does not match the type of the parameter passed in.
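A classic MySQL case, assuming a hypothetical indexed `phone` column declared as VARCHAR:

```sql
-- Avoid: comparing a string column to a numeric literal forces a cast on
-- the column, so the index is skipped.
SELECT id FROM users WHERE phone = 13800000000;

-- Prefer: quote the value so the types match and the index is used.
SELECT id FROM users WHERE phone = '13800000000';
```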

Optimize paging queries. For deep pages, for example LIMIT 10000,10, MySQL must scan and discard the first 10000 rows before returning 10, so the examined row set is 10010 rows and the query is slow. The fix is to first select only the primary keys matching the search and sort conditions (both indexed) with ORDER BY ... LIMIT 10000,10, and then fetch the full rows by primary key.
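A sketch of this deferred-join pattern, with hypothetical table and column names:

```sql
-- The subquery pages through the index cheaply, touching only ids;
-- the outer join then fetches just the 10 full rows.
SELECT o.*
FROM orders AS o
JOIN (
  SELECT id FROM orders
  WHERE status = 'paid'   -- indexed search condition
  ORDER BY id             -- indexed sort
  LIMIT 10000, 10
) AS page ON page.id = o.id;
```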

Statistics-related queries. Their examined row sets are often huge, and some such SQL statements are inherently hard to optimize. Avoid running statistics queries during peak business hours, or run them only against a read replica. Some statistics can instead be kept in redundant data structures: it is often best to accumulate the data in memory or a cache (such as Redis) first, and write it to the database according to some policy.
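The redundant-structure idea can be sketched as follows (the `shop_stats` table is hypothetical): instead of running COUNT(*) over a large table on every request, keep a counter row up to date as writes happen.

```sql
-- Precomputed counter, maintained alongside normal writes.
CREATE TABLE shop_stats (
  shop_id INT UNSIGNED NOT NULL PRIMARY KEY,
  order_count INT UNSIGNED NOT NULL DEFAULT 0
);

-- On each new order, bump the counter instead of re-counting:
UPDATE shop_stats SET order_count = order_count + 1 WHERE shop_id = 42;
```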

Avoid join queries entirely, and achieve load balancing through database and table sharding.

As the amount of data grows, join operations often cause the examined row set to balloon in a way that cannot be solved at the level of SQL tuning.

In such cases, sharding databases and tables is the best way to relieve database performance pressure. (The concrete sharding scheme usually depends on the actual business scenario and is skipped here.) This section focuses on how to implement, or smoothly transition toward, a sharded distributed database architecture.

The core point is that associations between tables must be removed first: no foreign keys and no join queries. To guarantee that no joins are needed, design a moderate amount of redundant fields into the table structure.

For tables such as operation logs and payment records, fields that record user information at the time of the event are, in my view, not really redundancy, because user information changes over time: a log-like table genuinely needs to record what the user's information was at the moment of the operation, and it must not be updated when the user later changes their profile.

In practice, when a redundant field is used to avoid a join, the redundant copy must be updated synchronously whenever the source table is updated. If the application layer does not encapsulate table operations properly, this becomes a thorny problem and is inconvenient to maintain.
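As a sketch, with hypothetical tables where `orders.user_name` is a redundant copy of `users.name`, the two writes must happen together (in practice inside the application layer's data-access wrapper, within one transaction):

```sql
-- Update the source row and its redundant copies in one transaction.
START TRANSACTION;
UPDATE users  SET name = 'new name'      WHERE id = 42;
UPDATE orders SET user_name = 'new name' WHERE user_id = 42;
COMMIT;
```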

Of course, mainstream application frameworks today generally handle tables through an ORM, so this is not much of a problem. On the other hand, avoiding joins can actually improve development efficiency. For example, when resolving a user name from a user ID, a no-join rule guarantees that every business module obtains the user name the same way, through the same wrapper method; that makes it very convenient to add a caching layer or extra business logic uniformly at the application layer.

It also means that if you later want to split the user table out on its own, the application-layer changes are simple and smooth.

Use InnoDB.

I won't say much more about InnoDB versus MyISAM. MyISAM's table-level locking is a fatal issue, and since MySQL already uses InnoDB as the default storage engine, I recommend using InnoDB directly in most cases; other engines are not discussed in detail here.

Use caching.

Cache commonly used data in the application wherever possible; mainstream application frameworks make this quick to set up. MySQL's own query cache can also help read-heavy workloads (note, however, that it was deprecated and then removed in MySQL 8.0).

Server-specific optimization

You can customize MySQL service configuration and distributed architecture based on actual application scenarios and service requirements. It is not the focus of this article.


I hope the above content can help you.