If an interviewer asks you: What dimensions would you use to optimize MySQL performance? What would you say?

The so-called performance optimization is generally for MySQL query optimization. Since we are optimizing the query, we naturally need to know what steps the query operation is going through, and then think about where we can optimize it.

I wrote earlier how does an SQL query execute? For those who are interested, I have used one of the diagrams to show the basic steps that a query operation goes through.

Here are some strategies for MySQL optimization from five perspectives.

1. Optimize the connection configuration

Handling the connection is the first step in the intimacy between MySQL client and MySQL server.

Since the connection is a two-way affair, we naturally optimize it from both the server and client sides.

1.1 Server Configuration

All the server needs to do is accept as many connections as possible. Perhaps you have encountered error 1040: Too many connections? It is the service side of the mind is not broad enough to cause the pattern is too small!

We can solve the problem of insufficient connections in two ways:

  1. Increase the number of available connections and modify environment variablesmax_connectionsBy default, the maximum number of connections on the server is151a
mysql> show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+
1 row in set (0.01 sec)
Copy the code
  1. Release inactive connections in a timely manner. The default client timeout is 28800 seconds (8 hours). We can lower this value
mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.01 sec)
Copy the code

MySQL has a lot of configuration parameters, and most of the parameters provide default values, the default value is carefully designed by the author of MySQL, can meet the requirements of most cases, do not recommend to change without knowing the meaning of parameters.

1.2 Client Optimization

The best thing the client can do is to minimize the number of connections to the server. Instead of creating a new connection every time a SQL statement is executed, the server and the client are straining their resources.

The solution is to reuse connections using connection pooling.

Common database connection pools are DBCP, C3P0, Ali’s Druid, and Hikari. The former two are rarely used, but the latter two are now in their prime.

For example, the default connection pool size for Druid is 8. For Hikari, the default connection pool size is 10. Why is that?

For each connection, the server creates a separate thread to handle it. The more connections there are, the more threads the server creates. However, when the number of threads exceeds the number of cpus, the CPU must allocate time slices for context switching of threads, and frequent context switching will cause great performance overhead.

Hikari provides a recommended formula for the size of a PostgreSQL database connection pool: CPU cores x 2+1. Assuming the server has a CPU core of 4, set the connection pool to 9. This formula is applicable to other databases to some extent, so you can play around with it during your interview.

2. Architecture optimization

2.1 Using Cache

It is inevitable that there will be some slow queries in the system. These queries are either large data volume or complex queries (associated tables or complex calculations), which occupy the connection for a long time.

If this kind of data is not very effective (it does not change from moment to moment, for example, daily report), we can put this kind of data into the cache system and obtain data directly from the cache system during the cache life of the data, which can reduce the pressure on the database and improve the query efficiency.

2.2 Read/Write Separation (Cluster, Primary/Secondary Replication)

At the beginning of the project, the database usually runs on a server, and all the user’s read and write requests will directly affect the database server. After all, the amount of concurrency undertaken by a single server is limited.

To solve this problem, we can use multiple database servers at the same time, and set one of them as the team leader, called the master node, and the other nodes as team members, called the slaves. Users write data only to the master node, and read requests are distributed among slave nodes. This scheme is called read-write separation. Give the group leader plus members a name, called the cluster.

Note: Many developers are unhappy with the term master-slave as being offensive (because they think it is associated with racism, black slaves, etc.) and have started a rename campaign.

Because of this, MySQL will gradually stop using terms such as master and slave, and use source and replica instead. You can understand it when you encounter it.

The problem with clustering is how to maintain data consistency between multiple nodes. After all, write requests are only sent to the master node. Only the data on the master node is the latest data. How can I synchronize write operations on the master node to all slave nodes?

Master slave replication is here! How do I execute an SQL update statement? I just moved over to binlog.

Binlog is the core component to implement the primary/secondary replication function of MySQL. The master node records all write operations to the binlog. The SLAVE node has a dedicated I/O thread to read the binlog of the master node and synchronize the write operations to the current slave node.

This cluster architecture to release the pressure of the primary database server has very good effect, but with more and more business data, if a particular form of the data volume increase sharply, single table query performance will drop sharply, and the separation cannot solve the problem is, speaking, reading and writing, after all, is exactly the same as the data of all nodes storing, single table query performance is poor, Naturally, all nodes perform poorly.

At this time, we can disperse the data of a single node to multiple nodes for storage, which is the sub-database sub-table.

2.3 Database and table

The meaning of the node in the sub-table of the sub-database is relatively broad. If the database is taken as the node, it is the sub-database. If a single table is used as a node, it is a sub-table.

We all know that sub – library sub – table is divided into vertical sub – table, vertical sub – table, horizontal sub – library and horizontal sub – table, but each time can not remember these concepts, I will give you a detailed say, to help you understand.

2.3.1 Vertical branch library

On the basis of the single database vertically cut several cuts, according to the business logic split into different databases, this is the vertical branch database.

2.3.2 Vertical table

Vertical table is on the basis of the single table vertical cut one knife (or several), will be a table of multiple words short split into several small tables, the need to judge according to the specific business operation, usually the frequently used field (hot fields) into a table, often do not use or not use immediate field (cold field) into a table, improve the query speed.

Take the above example: normally information details of goods are long, and view the list of goods tend not to need to immediately show goods details (typically click on details button will display), but the goods will be more important information (price, etc.) display, according to the business logic, we will make the original goods table vertical table.

2.3.3 Level table

The data of a single table is saved to multiple tables according to certain rules (called sharding rules in the industry), and the data table is cut horizontally (or several cuts).

2.3.4 Level classification

Horizontal sub-database is a horizontal cut to a single database, often accompanied by a horizontal sub-table.

2.3.5 summary

Horizontal points, mainly to solve the bottleneck of storage; Vertical separation is mainly to reduce concurrent pressure.

2.4 Message queue peak shaving

Typically, requests from users go directly to the database, which can be overwhelmed if there are too many users online at any one time (see weibo status when celebrities cheat or announce their relationships).

In this case, you can reduce the strain on the database by using message queues, which store requests, no matter how many user requests there are at the same time, and then methodically consume requests from the message queues.

3. Optimizer — SQL analysis and optimization

After all the joins, caching, and so on are done, SQL queries move into the domain of the parser and optimizer. If anything goes wrong at this stage, it is only the SQL statement.

As long as you don’t have a syntax problem, the parser will be fine. In addition, the optimizer does some automatic optimizations to prevent your SQL from running inefficiently, but if it sucks, the optimizer can’t save you and will only see your SQL queries degenerate into slow queries.

3.1 the slow query

A slow query is a query that executes slowly. Only when we know what slow queries are in MySQL can we optimize them accordingly.

The slow log query function is disabled by default because it costs performance. Run the following command to view the current slow log query status

mysql> show variables like 'slow_query%';
+---------------------+--------------------------------------+
| Variable_name       | Value                                |
+---------------------+--------------------------------------+
| slow_query_log      | OFF                                  |
| slow_query_log_file | /var/lib/mysql/9e74f9251f6c-slow.log |
+---------------------+--------------------------------------+
2 rows in set (0.00 sec)
Copy the code

Slow_query_log indicates whether slow query logs are enabled, and slow_query_log_file indicates the location where slow query logs are saved.

In addition to the above two variables, we also need to determine what the “slow” metric is, that is, how long it takes to execute a slow query. The default is 10S, and if changed to 0, it records all SQL.

mysql> show variables like '%long_query%';
+-----------------+-----------+
| Variable_name   | Value     |
+-----------------+-----------+| long_query_time | | + 10.000000-----------------+-----------+
1 row in set (0.00 sec)
Copy the code

3.1.1 Enabling Slow Logs

There are two ways to enable slow logs

  1. Modifying a Configuration Filemy.cnf

This modification method still takes effect after the system restarts

Slow_query_log =ON # long_query_time=2 slow_query_log_file=/var/lib/ mysql.logCopy the code
  1. Dynamic Parameter Modification (invalid after restart)
mysql> set @@global.slow_query_log=1;
Query OK, 0 rows affected (0.06 sec)

mysql> set @@global.long_query_time=2;
Query OK, 0 rows affected (0.00 sec)
Copy the code

3.1.2 Slow Log Analysis

MySQL not only stores slow log files, but also provides a slow log query utility called mysqlDumpslow. To demonstrate this utility, let’s construct a slow query:

mysql> SELECT sleep(5);
Copy the code

Then we query the 1 slow query with the most time:

[root@iZ2zejfuakcnnq2pgqyzowZ ~]# mysqldumpslow -s t -t 1 -g 'select' /var/lib/mysql/9e74f9251f6c-slow.log

Reading mysql slow query logfrom /var/lib/mysql/9e74f9251f6c-slow.log Count: 1 Time= 0s (10s) Lock= 0s (0s) Rows= 0 (1), root[root]@localhost SELECT sleep(N)Copy the code

Among them,

  • Count: Indicates the number of times the SQL is executed
  • Time: indicates the execution Time. The brackets indicate the cumulative Time
  • Locks: Indicates the duration of a lock, with the cumulative duration in parentheses
  • Rows: Indicates the number of records returned, with the cumulative number in parentheses

For more information on how to use mysqlDumpSlow, check the official documentation or run mysqlDumpSlow –help for help.

3.2 Viewing running Threads

You can run show Full processList to check the status and running time of all threads running in MySQL.

Among them,

  • Id: A unique identifier for a thread that can be killed with Id
  • User: the User who starts this thread. Ordinary accounts can only view their own threads
  • Host: indicates the IP address and port that initiates the connection
  • Db: database for thread operations
  • Command: indicates the Command of a thread
  • Time: operation duration (unit: second)
  • State: indicates the status of the thread
  • Info: the first 100 characters of an SQL statement

3.3 Checking the Server Running Status

Run the SHOW STATUS command to check the running STATUS of the MySQL server. There are two scopes: session and global. Generally, use the like+ wildcard character to filter.

-- Check the number of select counts
mysql> SHOW GLOBAL STATUS LIKE 'com_select';
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| Com_select    | 168241 |
+---------------+--------+
1 row in set (0.05 sec)
Copy the code

3.4 Viewing the Running Information of a Storage Engine

The SHOW ENGINE command is used to display the current running information of the storage ENGINE, including the information about table locks and row locks held by transactions. Lock wait for transactions; Thread semaphore wait; File IO request; Buffer pool statistics and other data.

Such as:

SHOW ENGINE INNODB STATUS;
Copy the code

Innodb storage engine innoDB storage engine innoDB storage engine innoDB storage engine innoDB storage engine innoDB storage engine innoDB storage engine

3.5 EXPLAIN the execution plan

By querying logs slowly, we can know which SQL statements are executing slowly, and why? Where is the slowness?

MySQL provides a query command to EXPLAIN the execution plan, through which we can check the execution plan of the SQL statement. The so-called execution plan is: the optimizer will optimize our own SQL statement (e.g. outer join to inner join query, subquery optimization to join query…). , the optimizer makes a cost estimate of which indexes to use for the execution of this SQL, and ultimately decides which index to use (or chooses not to use indexes, but instead to use a full table scan), what strategy the optimizer is implementing for a single table, and so on.

EXPLAIN can also be analyzed for UPDATE, DELETE, and INSERT statements after MySQL5.6.3, but is generally used for SELECT queries.

This article mainly introduces the optimization strategy of MySQL from a macro perspective. Therefore, the details of the EXPLAIN are not explained here and will be separate.

3.6 SQL and Index Optimization

3.6.1 SQL optimization

SQL optimization is when there is no problem with the SQL syntax itself, but there is a better way to write it for the same purpose. Such as:

  • Use small tables to drive large tables; Rewrite the subquery with join; Or change to the union
  • In the connection query, the fan out (record number) of the driven table should be reduced as far as possible, the cost of accessing the driven table should be as low as possible, and the index should be established on the join column of the driven table to reduce the access cost. The join column of the driven table should be the primary key or the unique secondary index column of the table, so that the cost of the driven table can be reduced
  • Large offset limit, filter first then sort

To give a simple example of the last item, the following two statements do the same thing, but the second one is much more efficient than the first one (the storage engine uses InnoDB).

-- 1. Query with large offset
mysql> SELECT * FROM user_innodb LIMIT 9000000.10;
Empty set (8.18 sec)

-- 2. Filter ID first (because ID uses index), then limit
mysql> SELECT * FROM user_innodb WHERE id > 9000000 LIMIT 10;
Empty set (0.02 sec)
Copy the code

3.6.2 Index Optimization

Creating appropriate indexes for slow queries is a very common and effective method, but whether indexes are used efficiently is another matter.

I previously wrote about the things you must know to use MySQL indexes well, for those who are interested.

4. Storage engines and table structures

4.1 Selecting a Storage Engine

InnoDB is the default storage engine for MySQL, but when it comes to improving database performance, the choice of storage engine becomes a key factor.

You are advised to select different storage engines based on different services. For example:

  • You are recommended to query and insert service tables with multiple operationsMyISAM;
  • Temporary table usageMemory;
  • Select services with a large number of concurrent services and updatesInnoDB;
  • I don’t know what to choose.

4.2 Optimizing Fields

The ultimate rule for field optimization is to use the smallest data type that can store data correctly.

4.2.1 Integer Type

MySQL provides 6 integer types, respectively

  • tinyint
  • smallint
  • mediumint
  • int
  • integer
  • bigint

Different storage types have different maximum storage ranges and occupy different storage space.

For example, it is recommended to use Tinyint instead of Bigint to determine whether or not to delete an identifier.

4.2.2 Character Types

Did you just set all the string fields to VARCHAR? Varchar (1024) = varchar(1024)

If you are not sure about the length of the field, you should definitely choose VARCHAR, but VARCHAR requires extra space to record the current length of the field; So if the length of the field is fixed, use a char instead. This will save you a lot of memory.

Holdings is not empty

Try to set non-empty fields to NOT NULL and provide default values, or use special values instead of NULL.

Because null-type storage and optimizations can have poor performance, the reasons are not covered here.

4.2.4 Do not use foreign keys, triggers and view functions

This is also the principle mentioned in the “Alibaba Development Manual”. There are three reasons:

  1. Reduced readability, checking the code at the same time you have to see the database code;
  2. Give the calculation to the program, the database only do the job of storage, and do this job well;
  3. Verification of data integrity should be done by the developer, not by foreign keys. Once foreign keys are used, you will find it very difficult to delete junk data during testing.

4.2.5 Picture, audio and video storage

Do not store large files directly, but store access addresses for large files.

4.2.6 Large field splitting and data redundancy

Large field split is actually said before the vertical table, the not commonly used field or large amount of data field split out, avoid too many columns and large amount of data, especially used to write SELECT *, column number and large amount of data caused by the problem will be seriously magnified!

Field redundancy is not consistent with the database design paradigm in principle, but it is very good for fast retrieval. For example, customer ids can be stored in the contract table along with redundant customer names, so that queries do not need to obtain user names based on customer ids. Therefore, a certain degree of redundancy for business logic is also a good optimization technique.

5. Service optimization

Strictly speaking, business optimization is not a means of MySQL tuning, but business optimization can very effectively reduce database access pressure, a typical example of this aspect is Taobao, here are a few simple examples to provide you with ideas:

  1. In recent years, the pre-sale front of Double 11 is getting longer and longer. It starts more than half a month in advance, and all kinds of deposit red envelope modes are emerging in an endless stream. This way is called pre-sale diversion. In this way, customers’ service requests can be diverted without having to wait until the dawn of Singles’ Day to place collective orders.
  2. In the early hours of Singles’ Day, you may want to check for orders outside the day, but the query fails. Even the rations of the chickens in Alipay have been delayed. This is a demotion strategy that concentrates computing resources from unimportant services to ensure the current core business.
  3. Double tenth a pay treasure is highly recommended to use flower bai when pay, rather than the bank card payment, although part of the consideration is to improve the viscosity of software, but on the other hand, using balance treasure actually use ali internal server, access speed, and the use of bank CARDS, you need to call bank interface, the operation is much slower in comparison.

This is the end of the summary of MySQL optimization, there are a lot of details left out, which makes me feel that this article is not perfect. But some knowledge points are too many to write down all at once, and then write well.

I am cicada mu wind, public number “cicada mu wind”, a serious writing of the technical people, see next period!