Welcome toTencent Cloud + community, get more Tencent mass technology practice dry goods oh ~

This article is published in cloud + community column by Tencent Cloud database team

Author introduction: Jian Huaibing, senior engineer of Tencent Cloud database, responsible for Tencent cloud CDB kernel and infrastructure construction; He has worked for Thomson Reuters and YY, and is the author of PTimeDB. He has a patent for invention. Have been engaged in MySQL kernel development for 8 years and have rich optimization experience; He has rich experience in distributed storage and other fields.

MYSQL database is applicable to a wide range of scenarios. Compared with Oracle and DB2, MYSQL database has higher cost performance. It can be used in Web sites, log systems, data warehouses and other scenarios. However, there are some problems such as poor transactional support (the default engine is InnoDB transactional from MySQL 5.5), multiple branches, and bottlenecks in read and write efficiency.

So how to make good use of MYSQL becomes very important. On the one hand, it is necessary to find out the system read and write bottleneck through MYSQL optimization and improve database performance. On the other hand, it is necessary to reasonably involve data structure and adjust parameters to improve user operation response. At the same time, as much as possible to save system resources, so that the system can provide a greater load of services. This article will introduce Tencent cloud team is how to carry out kernel level optimization of Mysql ideas and experience.

The early CDB was mainly based on the open source Oracle MySQL branch and focused on optimizing OSS systems for operations and operations. In Tencent Cloud, due to the increasing number of users, more and more high requirements for CDB for MySQL are put forward. Tencent CLOUD CDB team carries out in-depth customization and optimization of CDB for MySQL branch according to the needs of users and the technical trend of the development of the industry. The optimization focuses on kernel performance, kernel function and peripheral OSS system. The specific methods are as follows:

Optimization of kernel performance

As DB on Tencent cloud basically requires cross-campus Dr, optimization of CDB for MySQL focuses on solving performance problems in the real deployment environment under the premise of cross-campus network topology. After analysis and investigation, we summed up the idea of optimization as “eliminate redundant I/O, shorten I/O paths and avoid big lock competition”. Here are some examples of kernel performance:

1. Optimize the replication between active and standby DB servers

Problem analysis

As shown in the figure above, in the replication architecture of native MySQL, the Master side continuously sends Binlog events to the Slave I/O thread through the Dump thread. The Slave I/O thread has two main actions after receiving Binlog events:

  • Write to the Relay Log. This process compets with the Slave SQL thread for the lock protecting the Relay Log.
  • Update replication metadata (containing information such as the location of the Master).

An optimization method

After analysis, our optimization strategy is as follows:

  • Slave I/O threads and Slave SQL threads are typical single-write single-read producer-consumer models that can be designed without locking. The Slave I/O thread updates the Relay Log length atomically each time it writes data, and the Slave SQL thread reads the Relay Log with the length as the boundary. This resolves the competitive Relay Log lock to no lock;
  • Since the Global Transaction Identifier (GTID) in the Binlog event corresponds to the DB Transaction one by one, the data in the Relay Log already contains the required replication metadata, so we can not write the Master info file. Eliminates redundant file I/O;
  • In DB, the update granularity is transaction, because on Relay Log FILE I/O, we can greatly improve disk I/O by merging discrete small I/ OS into large I/O with transaction granularity.

The optimization effect

Optimized as shown in the figure above: Lock contention (FUTEX) of 35.79% on the left has been completely eliminated; Under the same pressure measurement, 56.15% of file I/O overhead was optimized to 19.16%, and Slave I/O threads were optimized to be the expected I/O intensive threads.

2. Optimization between the main library transaction thread and the Dump thread

Problem analysis

As shown in the figure above, in native MySQL, multiple transaction submission threads TrxN and multiple Dump threads compete for the protection lock of Binlog file resources. Multiple transaction submission threads write to the Binlog file, and multiple Dump threads read data from the Binlog file and send it to their Slave. All threads are executed sequentially!

An optimization method

After analysis, our optimization strategy is as follows:

  • After each writing thread updates the length of the current Binlog file, multiple Dump threads read the Binlog file length as the boundary, and multiple Dump threads execute concurrently. In this way, Dump threads in the replication topology send faster!

The effect

After testing, the optimized kernel not only improves the performance of transaction submission threads, but also improves the performance of master/slave replication when there are many Dump threads.

Ii. Optimization of the interaction process between master and standby libraries

Problem analysis

As shown in the figure above, data sending and ACK response between the primary and secondary libraries in native MySQL are simply executed in serial. The next event is not allowed to be sent until the ACK response of the previous event arrives. This behavior performs very poorly across campuses (RTT 2-3ms) and does not take advantage of bandwidth.

An optimization method

After analysis, our optimization strategy is as follows:

  • Send and ACK responses are received independently in different threads. Since both send and receive are transmitted based on TCP streams, timing is guaranteed. In this way, the sending thread can continue sending before receiving the ACK, and the receiving thread wakes up the waiting thread to perform the corresponding task after receiving the ACK.

The effect

According to the actual use case test, the TPS after optimization is improved to about 15%.

Three. Kernel function optimization

1. Reserve the connection quota of o&M accounts

On Tencent Cloud, there are exceptions or bugs in users’ APP from time to time, thus reaching the maximum connection limit of DB. This is because CDB OSS accounts cannot be logged in for emergency operation and maintenance operations. In view of this situation, we have created a configurable connection number quota in the MySQL kernel. Even in the above scenarios, o&M accounts can still connect to DB for emergency O&M operations. Greatly reduces the risk of DB anarchy in exceptional cases. This account has only database o&M management rights and cannot obtain user data, ensuring data security.

2. Strong active/standby synchronization

For some applications with high data consistency requirements, CDB is deeply optimized on the basis of MySQL’s native semi-synchronization to ensure that a transaction must be replicated to at least one standby database before committing on the primary database. Ensure consistency of data when the master library is down.

4. Optimization of peripheral system

In addition to some of the MySQL kernel side optimizations mentioned above, we have also made several optimizations for peripheral OSS platforms. For example, the asynchronous MySQL ping protocol is used to monitor a large number of instances, the HA/ service discovery and automatic capacity expansion of the original system are reinforced by distributed technology, and many optimizations are made in data security/failover and fast recovery.

Related to recommend

Tencent cloud database CDB for MySQL product related documents

MySQL database design summary

High availability analysis of MySQL database

Question and answer

How do I record PostgreSQL queries?

reading

Getting started with PostgreSQL

The PostgreSQL configuration is optimized

The PostgreSQL active/standby environment is set up

Machine learning in action! Quick introduction to online advertising business and CTR knowledge

This article has been authorized by the author to Tencent Cloud + community, more original text pleaseClick on the

Search concern public number “cloud plus community”, the first time to obtain technical dry goods, after concern reply 1024 send you a technical course gift package!

Massive technical practice experience, all in the cloud plus community!