Background: On May 23-24, Tencent's "Cloud + Future" Summit, themed "Huan Qi", was held in Guangzhou. Leaders of government agencies at all levels in Guangdong Province, academic experts from China and abroad, and major industry and technology companies met on site to discuss innovation and development in cloud computing and the digital industry.

Zhang Qinglin, a MySQL kernel research and development expert at Tencent, gave a technical talk titled "TXSQL: The Database Nuclear Warhead of the Cloud Computing Era" in the "Developer Special" session of the summit. The talk covered TXSQL in five parts.


Part 1: The concept of Cloud and TXSQL

As a leading cloud computing company in China, Tencent Cloud has established more than 500 data centers around the world, which are used by tens of thousands of developers.


CDB is the product through which Tencent Cloud offers TXSQL. It provides a complete solution that covers moving to the cloud, data migration, backup, recovery, upgrades, and other operations. Compared with users building their own MySQL service the traditional way, it lowers investment and is easier to use.


Tencent Cloud's CDB service is used across many industries, including banking, securities, logistics, and traditional enterprises. After purchasing a CDB instance, users are given the corresponding host, port, user name, password, and other information, and can connect directly to the backend database with it.


CDB services are currently divided into on-site disk and network disk offerings. TXSQL is the MySQL branch maintained independently by the database kernel team of the infrastructure department, and it is delivered to customers through the CDB service. In other words, TXSQL is the kernel and sits at the level of the underlying data service.


Part 2: Why did we create TXSQL

Let’s look at the need to build TXSQL in detail:


From 2016 to 2017, storage scale grew more than fourfold. As the cloud computing market continued to mature, industry coverage also expanded at an unprecedented pace, and the total number of instances has passed 100,000.

When users build their own database service on their own servers, they may hit only one or two problems a year. With hundreds of thousands of servers running at the same time, however, the probability of hitting a problem on any given day is very high, so we faced three challenges:

First, stability. When a customer uses CDB, the results of database requests are returned to the upper-layer application. If the database has a problem, the availability of that application basically cannot be guaranteed, so as scale keeps growing we must be able to locate and solve customer problems quickly.

Second, business needs. As industry coverage expands to finance and other industries with strict data requirements, customers have diverse demands for data security and other functions, so we need the development capability to meet them.

Third, performance. During e-commerce promotions or in-game events, the database is under heavy pressure. We need to improve single-machine performance so that users get the best service for the least money.


To sum up, there are three challenges: stability, new business requirements, and performance improvement.

For these reasons, Tencent Cloud needs its own kernel team to locate customer problems quickly, help customers solve them, and improve single-machine performance as fast as possible.
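The scale effect described earlier in this part (a self-hosted server may hit one or two problems a year, while a large fleet sees problems every day) can be made concrete with a little arithmetic. The sketch below is only illustrative: the rate of two problems per instance per year, the fleet size of 100,000, and the assumption that instances fail independently are all simplifications chosen for the example, not figures measured by the TXSQL team.

```python
# Illustrative arithmetic only. The failure rate and the independence
# assumption are hypothetical simplifications.
DAYS_PER_YEAR = 365


def expected_incidents_per_day(instances: int, problems_per_year: float) -> float:
    """Average number of problems per day across the whole fleet."""
    return instances * problems_per_year / DAYS_PER_YEAR


def prob_at_least_one_per_day(instances: int, problems_per_year: float) -> float:
    """Chance that at least one instance has a problem on a given day."""
    p_single = problems_per_year / DAYS_PER_YEAR
    return 1.0 - (1.0 - p_single) ** instances


if __name__ == "__main__":
    for fleet in (1, 100, 100_000):
        print(
            fleet,
            round(expected_incidents_per_day(fleet, 2), 3),
            round(prob_at_least_one_per_day(fleet, 2), 6),
        )
```

Under these assumptions a single server has well under a 1% chance of a problem on any given day, while a fleet of 100,000 instances should expect hundreds of problems daily, which is why fast diagnosis becomes a necessity rather than a convenience.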

Part 3: How do we design TXSQL


B3M is our code design model. It makes operations more convenient for customers, solves problems better, and helps customers realize new requirements.

Users may ask: as basic software, MySQL has millions of lines of code and a relatively high barrier to entry, so how do you keep a modified version of MySQL stable?

We don’t make random changes to MySQL. Everything we do is rigorously tested.

First, we analyze requirements. They come from several sources: database problems found in operation, such as bugs in the official version; user requirements, where we help users make better use of the database; and performance tuning, through which we steadily improve performance and competitiveness as users run MySQL.

After requirements analysis comes design, and then coding. We strictly control code quality, and every step passes rigorous testing, such as unit tests and code coverage tests. We release a database kernel version roughly every two to three months, and each version goes through stability tests, performance tests, and crash recovery tests to make sure our code does not introduce new bugs.

When releasing a version, we first upgrade individual instances and monitor the instances running the new TXSQL version at second-level granularity. If no problems appear, we release to a small cluster, and finally to the whole network. In this way we ensure the stability and reliability of our MySQL version.
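The staged release described above (individual instances, then a small cluster, then the whole network) can be sketched as a loop that halts at the first stage where monitoring reports a problem. This is a minimal illustration, not Tencent's release tooling: the stage names, instance names, and the `upgrade` and `is_healthy` callbacks are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stages and instance names, for illustration only.
STAGES: List[Tuple[str, List[str]]] = [
    ("single instance", ["inst-1"]),
    ("small cluster", ["inst-2", "inst-3"]),
    ("whole network", ["inst-4", "inst-5", "inst-6"]),
]


def staged_release(
    stages: Sequence[Tuple[str, List[str]]],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Upgrade stage by stage and stop as soon as a stage looks unhealthy.

    Returns the instances that were upgraded and the name of the stage
    where the rollout halted (None if every stage passed).
    """
    upgraded: List[str] = []
    for stage_name, instances in stages:
        for instance in instances:
            upgrade(instance)
            upgraded.append(instance)
        # Stand-in for the monitoring window that follows each stage.
        if not all(is_healthy(instance) for instance in instances):
            return upgraded, stage_name
    return upgraded, None


if __name__ == "__main__":
    done, halted = staged_release(STAGES, lambda i: None, lambda i: i != "inst-3")
    print(done, halted)
```

The point of the design is that a bad build is caught while it affects only a handful of instances, so the wider network never receives it.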

Part 4: What users need to be aware of when using the TXSQL kernel version

What services does the CDB kernel provide to the user?


Our changes are secondary development on top of MySQL and fall into three categories: performance tuning, fixing problems found online, and feature development driven by new business requirements.

The first three performance optimizations target the redo log. The first improves performance by reducing the number of disk syncs. The second uses multiple buffers, so that syncing the redo log does not block other transactions from writing their logs. The third ensures that transactions do not interfere with one another when writing redo log records to the system buffer, which improves concurrency. In addition, for SELECT queries with LIMIT and OFFSET, the computation is pushed down to the engine layer to reduce CPU consumption and improve performance.
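To illustrate why fewer disk syncs help, the sketch below compares syncing the log after every transaction with syncing once per batch of transactions. Batching commits this way is one common technique for reducing sync count; the talk does not say exactly how TXSQL does it, so the `RedoLog` class, its methods, and the group size here are assumptions made purely for the example, and the "sync" is only a counter rather than a real disk flush.

```python
from typing import List


class RedoLog:
    """Toy in-memory log that counts how often it is 'synced to disk'."""

    def __init__(self) -> None:
        self.buffer: List[str] = []
        self.durable: List[str] = []
        self.sync_count = 0

    def append(self, record: str) -> None:
        self.buffer.append(record)

    def sync(self) -> None:
        # One sync makes every buffered record durable at once.
        if self.buffer:
            self.durable.extend(self.buffer)
            self.buffer.clear()
            self.sync_count += 1


def commit_one_by_one(log: RedoLog, records: List[str]) -> None:
    """Baseline: one sync per transaction."""
    for record in records:
        log.append(record)
        log.sync()


def commit_in_groups(log: RedoLog, records: List[str], group_size: int) -> None:
    """Batch several transactions' records and sync once per batch."""
    for start in range(0, len(records), group_size):
        for record in records[start:start + group_size]:
            log.append(record)
        log.sync()


if __name__ == "__main__":
    records = [f"trx-{i}" for i in range(10)]
    a, b = RedoLog(), RedoLog()
    commit_one_by_one(a, records)
    commit_in_groups(b, records, group_size=4)
    print(a.sync_count, b.sync_count)
```

Both logs end up with the same durable records in the same order; only the number of syncs differs, and on real hardware each sync is an expensive operation.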


In terms of functionality, we have implemented features that the official version does not offer, such as encryption, auditing, thread pooling, and parallel replication. The first is the audit function, which is not available in the official community version and exists only in the enterprise version. We built an audit plug-in that delivers the functions users need while preserving performance.



The third is thread handling. In stress tests, as concurrency increases, performance first improves and then declines because of heavy contention for resources in the system. TXSQL addresses this by introducing a thread pool, and along the way fixed the following issues:

Resolved deadlocks caused by global read locks in the thread pool

Fixed the effect of the dump thread on the thread pool

Added a new Information Schema table for inspecting what is going on inside the thread pool
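The core idea of a thread pool, capping how many requests run at once no matter how many arrive, can be shown with Python's standard `concurrent.futures.ThreadPoolExecutor`. This is a generic sketch of the concept and not TXSQL's thread pool: the `run_requests` helper, the request count, and the pool size are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_requests(num_requests: int, pool_size: int) -> Tuple[List[int], int]:
    """Run many requests on a fixed-size pool and record peak concurrency."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def handle(request_id: int) -> int:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.001)  # stand-in for executing a query
        with lock:
            active -= 1
        return request_id

    # Requests beyond pool_size wait in a queue instead of competing
    # for CPU, locks, and other shared resources.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(handle, range(num_requests)))
    return results, peak


if __name__ == "__main__":
    results, peak = run_requests(num_requests=50, pool_size=4)
    print(len(results), peak)
```

Fifty requests are all served, but never more than four execute at the same time, which is how a pool keeps throughput from collapsing as client concurrency grows.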



The fourth is parallel replication. When load on the primary database grows, the secondary database cannot apply changes as fast as the primary produces them. Across versions 5.1, 5.5, 5.6, and 5.7 this problem was never solved well. Version 5.6 does have a parallel algorithm, but it does not fully solve the delay problem. We introduced our own parallel replication, which solves this problem very well. The delay matters because if the primary goes down while the secondary is behind, the secondary must first consume the accumulated relay log. Until it has done so it cannot serve requests, and if it does accept them there will be a double-write problem.

We have encountered all kinds of difficulties, both in moving customers to the cloud and in serving them afterwards.

For example, while helping a game company move to the cloud, we found a performance problem. After analyzing the system, we optimized it, adjusted various parameters, and upgraded the kernel, and finally raised the customer's performance from 70,000 to 170,000.

The second case, from a game customer, is more striking. Their instance had a memory leak: it occupied more and more memory until the machine ran out of memory (OOM). It took us nearly one week to find the root cause, and then one more week to test and roll out the fix through a gray release. With the official version, whose releases are infrequent, the same fix usually takes two to three months.
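One simple way to parallelize log application is to split the relay log by database, so that events for different databases are applied by different workers while events within one database keep their original order. MySQL 5.6's multi-threaded replica works at this per-database granularity. The talk does not describe TXSQL's own algorithm, so the sketch below should be read only as an illustration of the general idea; the event format and function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical event format: (schema, key, value).
Event = Tuple[str, str, int]


def apply_serial(relay_log: List[Event]) -> Dict[str, Dict[str, int]]:
    """Single-threaded apply: the baseline whose result must be preserved."""
    state: Dict[str, Dict[str, int]] = defaultdict(dict)
    for schema, key, value in relay_log:
        state[schema][key] = value
    return dict(state)


def apply_parallel(relay_log: List[Event], workers: int) -> Dict[str, Dict[str, int]]:
    """Apply events for different schemas concurrently, in order per schema."""
    queues: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for schema, key, value in relay_log:
        queues[schema].append((key, value))

    def apply_schema(item: Tuple[str, List[Tuple[str, int]]]):
        schema, events = item
        tables: Dict[str, int] = {}
        for key, value in events:  # original order within one schema
            tables[key] = value
        return schema, tables

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(apply_schema, queues.items()))


if __name__ == "__main__":
    log = [("shop", "a", 1), ("game", "x", 5), ("shop", "a", 2), ("game", "y", 7)]
    print(apply_parallel(log, workers=2) == apply_serial(log))
```

The weakness of this scheme is visible in the code: if all traffic goes to one database, there is only one queue and no parallelism, which is consistent with the observation above that 5.6 did not fully remove replication delay.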


TXSQL itself is only the kernel that performs computation for users. Stability is backed by several safeguards: full-link monitoring, machine-level operating system monitoring, second-level MySQL monitoring, and manual online support.


Part 5: TXSQL future development direction


While continuing to maintain stability, tune performance, and implement features, we will pursue the following directions.

Batch computation: where the engine can do the work, we push the computation down to the engine layer to reduce overhead.
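A concrete case of pushing work down to the engine is the OFFSET handling mentioned earlier in the talk. The sketch below contrasts a server layer that receives every row and discards the first OFFSET rows itself with an engine that skips those rows internally and hands up only what is needed. The `Engine` class and both `select_*` functions are hypothetical stand-ins, not MySQL or TXSQL interfaces.

```python
from itertools import islice
from typing import Iterator, List


class Engine:
    """Toy storage engine that counts rows handed up to the server layer."""

    def __init__(self, rows: List[int]) -> None:
        self.rows = rows
        self.rows_returned = 0

    def scan(self, skip: int = 0) -> Iterator[int]:
        # With skip > 0 the engine steps over rows itself, so they are
        # never passed up to the server layer.
        for row in islice(self.rows, skip, None):
            self.rows_returned += 1
            yield row


def select_server_side(engine: Engine, offset: int, limit: int) -> List[int]:
    """Server layer receives offset + limit rows and throws away the first offset."""
    return list(islice(engine.scan(), offset, offset + limit))


def select_pushed_down(engine: Engine, offset: int, limit: int) -> List[int]:
    """Offset is handled inside the engine; only limit rows cross the boundary."""
    return list(islice(engine.scan(skip=offset), limit))


if __name__ == "__main__":
    data = list(range(10_000))
    e1, e2 = Engine(data), Engine(data)
    r1 = select_server_side(e1, offset=1000, limit=10)
    r2 = select_pushed_down(e2, offset=1000, limit=10)
    print(r1 == r2, e1.rows_returned, e2.rows_returned)
```

Both paths return the same ten rows, but in the pushed-down version far fewer rows cross the engine boundary, which is where a real server would otherwise spend CPU copying and converting rows it is about to discard.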

Execution plan caching is also something we will work on in the near future. In an earlier test, the simplest primary-key query showed a 10% performance improvement.

To address storage cost, we have integrated RocksDB into TXSQL as TXRocks, which will also be released in the near future. It can greatly reduce costs for users while still supporting transactions.
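The idea behind execution plan caching is that queries differing only in their literal values can share one plan, so the planning work is paid once. The sketch below is a minimal illustration under that assumption: the `PlanCache` class, its regex-based normalization, and the fake planner are hypothetical and much simpler than what a real database kernel would use.

```python
import re
from typing import Callable, Dict


class PlanCache:
    """Caches plans keyed by the statement text with literals replaced by '?'."""

    def __init__(self, planner: Callable[[str], str]) -> None:
        self._planner = planner
        self._plans: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(sql: str) -> str:
        sql = re.sub(r"'[^']*'", "?", sql)   # string literals
        sql = re.sub(r"\b\d+\b", "?", sql)   # standalone numeric literals
        return " ".join(sql.split()).lower()

    def get_plan(self, sql: str) -> str:
        key = self.normalize(sql)
        if key in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[key] = self._planner(key)  # expensive step, done once
        return self._plans[key]


if __name__ == "__main__":
    calls = []

    def fake_planner(statement: str) -> str:
        calls.append(statement)
        return f"PLAN({statement})"

    cache = PlanCache(fake_planner)
    cache.get_plan("SELECT * FROM t WHERE id = 42")
    cache.get_plan("SELECT * FROM t WHERE id = 7")
    print(cache.hits, cache.misses, len(calls))
```

Simple primary-key lookups are the case where planning is the largest share of total query time, which fits the talk's observation that they showed a measurable gain.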
