Project background

I first heard about TiDB in a talk given by Tongcheng's chief architect, Mr. Roy. At the time, Tongcheng was moving its development stack and databases wholesale toward open source. Many of our online services handle heavy traffic over very large datasets, and MySQL could not satisfy complex queries at that scale, so to make sharding transparent to developers Tongcheng built its own middleware, DBrouter. Even so, merging the shards back together for real-time aggregate statistics and monitoring over the full dataset remained difficult, and we had found no particularly good solution.

Booming business

Before the 2016 National Day holiday, the ticketing businesses of Tongcheng (the train ticket, air ticket, and other ticketing services in WeChat's "nine-grid" services entry are provided by Tongcheng) were putting ever more pressure on the order database as traffic surged, and demand from related businesses kept growing as well. New queries were constantly being added to the order database; for example, minute-level order-volume monitoring (a per-minute summary of executed orders under various conditions) was added so that anomalies could be located promptly. Such functions kept multiplying while the order database grew to the terabyte level. The company therefore decided to split the ticketing order database, both to reduce the load on a single database and to cope with the order peak expected over the National Day holiday.
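
As a rough illustration of such a query (the table and column names below are hypothetical, not the actual order schema), minute-level order monitoring can be expressed as a simple aggregation:

```sql
-- Count executed orders per minute over the last hour,
-- broken down by status; all identifiers are illustrative.
SELECT
    DATE_FORMAT(create_time, '%Y-%m-%d %H:%i') AS order_minute,
    order_status,
    COUNT(*) AS order_cnt
FROM orders
WHERE create_time >= NOW() - INTERVAL 1 HOUR
GROUP BY order_minute, order_status
ORDER BY order_minute;
```

Queries like this one, run frequently and summed over many conditions, are exactly the workload that strained the single order database.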

The introduction of TiDB

Our evaluation found that the company's home-grown sharding could satisfy most query requirements, but queries with complicated conditions hurt the performance of the entire sharded cluster: a small number of SQL statements that scanned every shard often consumed more than 80% of I/O resources and dragged down all other queries. At that point our chief architect proposed trying TiDB. After joint testing by the middleware and DBA teams, we decided to use TiDB as an aggregation database holding the full dataset to serve complex queries, with the sharded cluster continuing to serve simple ones. Because TiDB is highly compatible with the MySQL wire protocol, we could build on PingCAP's data synchronization tool Syncer: our secondary development let us customize target database and table names (after talking with TiDB's engineers, we learned that their latest Wormhole and Syncer now support such customization as well) and added synchronization status monitoring, so that anomalies such as abnormal TPS or replication delay trigger an alert through WeChat. Data is replicated from MySQL to TiDB in real time to keep the two consistent.
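
The monitoring code itself is not shown in this article; the sketch below is a minimal, hypothetical version of such a delay check, assuming a `sync_heartbeat` table that the pipeline refreshes on the MySQL side and a WeChat-style webhook URL for alert delivery. None of these names come from the actual system.

```python
"""Minimal sketch of a replication-delay check with a WeChat alert.

Assumptions (not from the original system): a `sync_heartbeat` table
whose single row is refreshed on the MySQL side every few seconds,
and a WeChat webhook endpoint for alert delivery.
"""
import time

import pymysql
import requests

WEBHOOK_URL = "https://example.com/wechat/webhook"  # hypothetical endpoint
MAX_DELAY_SECONDS = 30

def read_heartbeat(host: str, port: int) -> float:
    """Return the heartbeat timestamp (epoch seconds) stored in a database."""
    conn = pymysql.connect(host=host, port=port, user="monitor",
                           password="***", database="sync_meta")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT UNIX_TIMESTAMP(ts) FROM sync_heartbeat LIMIT 1")
            return float(cur.fetchone()[0])
    finally:
        conn.close()

def check_once() -> None:
    # Delay = heartbeat written to MySQL minus heartbeat replicated to TiDB.
    mysql_ts = read_heartbeat("mysql-master", 3306)
    tidb_ts = read_heartbeat("tidb-cluster", 4000)  # TiDB speaks the MySQL protocol
    delay = mysql_ts - tidb_ts
    if delay > MAX_DELAY_SECONDS:
        requests.post(WEBHOOK_URL, json={
            "msgtype": "text",
            "text": {"content": f"Syncer delay {delay:.0f}s exceeds threshold"},
        }, timeout=5)

if __name__ == "__main__":
    while True:
        check_once()
        time.sleep(10)
```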

Once the solution was settled, we had our load-testing and development colleagues work together overnight on urgent tests. They showed that the sharded cluster + TiDB solution met both our functional and performance requirements, so we quickly adjusted the project architecture, aggregated thousands of MySQL shards into a single TiDB cluster, and got through the 2016 National Day peak smoothly. Traffic was twice the usual level, but nothing unusual happened.

The architecture of the real-time synchronization and query system is as follows:

After the project went live successfully, we deepened our use of TiDB and deployed monitoring with PingCAP's advice and assistance.

At the same time, to keep a closer watch on the databases and catch anomalies immediately, we connected TiDB's alerts to the company's monitoring and self-healing systems. When an anomaly occurs, the monitoring system detects it right away, and the self-healing system handles it according to healing logic defined in advance, restoring application availability as quickly as possible.
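
The healing logic itself is internal to the company; purely as a sketch of the pattern described above, a self-healing hook can be a small dispatcher that maps alert names to pre-defined recovery actions (all names below are hypothetical):

```python
"""Hypothetical self-healing dispatcher: maps alert names to
pre-defined recovery actions; a sketch of the pattern only."""
import subprocess
from typing import Callable, Dict

def restart_tidb_server(instance: str) -> None:
    # Illustrative action; real healing logic would be far more careful.
    subprocess.run(["ssh", instance, "systemctl", "restart", "tidb"], check=False)

def page_dba(instance: str) -> None:
    print(f"No automatic remedy for {instance}; paging the on-call DBA")

# Healing logic "formulated in advance": alert name -> action.
HEALERS: Dict[str, Callable[[str], None]] = {
    "TiDBServerDown": restart_tidb_server,
}

def handle_alert(alert_name: str, instance: str) -> None:
    HEALERS.get(alert_name, page_dba)(instance)

if __name__ == "__main__":
    handle_alert("TiDBServerDown", "tidb-01")
```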

Use on a larger scale

Soon after that launch, we migrated the air ticket business to TiDB with real-time synchronization as well. As of this writing, Tongcheng runs several TiDB clusters on nearly 100 servers, holding tens of terabytes of data in total. The largest cluster has more than 10 data nodes, nearly 10 TB of data, and over 10 billion rows; it supports over 100 million visits per day and monitoring services at the tens-of-millions scale, with an average QPS of 5,000 and peaks above 10,000.

At the same time, because TiDB is so easy to use (highly compatible with the MySQL protocol and standard SQL syntax), we now treat it as an important database deployment option from the very start of new projects. Over more than a year of use we have stayed in close contact with PingCAP's engineers, frequently exchanging notes on technology and usage; we actively test the latest TiDB versions and report problems, and PingCAP responds promptly and fixes bugs quickly.

Looking ahead

More and more developers in the company now come to the DBAs asking about TiDB. Our answer is that it is a database highly compatible with the MySQL protocol and syntax, and very simple to use: you can treat it as if it were MySQL, except that it holds far more data than MySQL can. For DBAs it is a highly available, dynamically scalable database: MySQL on the outside, distributed on the inside. Business developers face essentially no learning curve, DBA maintenance is very similar to MySQL, and the surrounding ecosystem is very good.
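
"Treat it as if it were MySQL" is quite literal: any MySQL client or driver works unchanged. A minimal sketch with Python's pymysql, assuming a TiDB server on its default port 4000 (host, credentials, and table are illustrative):

```python
import pymysql

# Connecting to TiDB exactly as one would to MySQL; TiDB listens on
# port 4000 by default. Host, credentials, and table are illustrative.
conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="app", password="***", database="orders_db")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders WHERE order_status = %s",
                    ("PAID",))
        print("paid orders:", cur.fetchone()[0])
finally:
    conn.close()
```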

It is foreseeable that as current projects continue and new ones are built, the number of TiDB instances and machines will keep growing quickly. The version running online is not the latest, and we are preparing to upgrade to 1.0.5. We expect that by the end of 2018 we will have about 20 TiDB clusters on hundreds of machines, which poses real challenges for development and operations; if we keep building and operating TiDB clusters the way we do today, we will probably have to add staff. While we were looking for a convenient way to manage many TiDB clusters, an article titled "Cloud + TiDB Technical Interpretation" caught our attention. We quickly got in touch with TiDB's engineers and learned that TiDB's latest DBaaS solution uses Kubernetes to automatically manage and schedule multiple TiDB instances, which fits our current strategy of running large numbers of services and databases on Docker. tidb-operator automates the deployment and management of TiDB and its surrounding tools and provides automatic failover, which greatly reduces operations cost and offers rich interfaces for future extension.
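
tidb-operator describes a cluster declaratively through a TidbCluster custom resource that the operator then reconciles. As a rough sketch, assuming the operator and its CRDs are installed, such a resource could be created programmatically like this (names and replica counts are illustrative, and the exact spec fields vary by operator version):

```python
"""Sketch: create a TidbCluster custom resource for tidb-operator
to reconcile. Spec fields are illustrative and version-dependent."""
from kubernetes import client, config

tidb_cluster = {
    "apiVersion": "pingcap.com/v1alpha1",
    "kind": "TidbCluster",
    "metadata": {"name": "ticket-orders", "namespace": "tidb"},
    "spec": {
        # Illustrative sizing: 3 PD, 3 TiKV, and 2 TiDB instances.
        "pd": {"replicas": 3},
        "tikv": {"replicas": 3},
        "tidb": {"replicas": 2},
    },
}

config.load_kube_config()
api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="pingcap.com", version="v1alpha1",
    namespace="tidb", plural="tidbclusters", body=tidb_cluster)
```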

We plan to start working with PingCAP to introduce TiDB DBaaS in 2018.

In addition, through in-depth discussions with PingCAP's engineers, we learned about TiSpark, a sub-project of TiDB. We plan to introduce TiSpark for real-time data analysis and a real-time data warehouse, so that the technology can create greater value for the business.
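
TiSpark lets Spark read TiKV data directly, so SQL analytics run on the same data TiDB serves. A minimal pyspark sketch, assuming a TiSpark release whose extension class and PD-address setting match the ones below (the PD addresses and queried table are illustrative):

```python
"""Sketch: run Spark SQL over TiDB data via TiSpark. The extension
class and config key are from recent TiSpark releases; PD addresses
and the queried table are illustrative."""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tispark-demo")
    # TiSpark plugs into Spark through a SQL extension and reads
    # region data from TiKV using the PD endpoints configured here.
    .config("spark.sql.extensions", "org.apache.spark.sql.TiExtensions")
    .config("spark.tispark.pd.addresses", "pd0:2379,pd1:2379,pd2:2379")
    .getOrCreate()
)

# Aggregate directly over the TiDB table (illustrative schema).
spark.sql("""
    SELECT order_status, COUNT(*) AS cnt
    FROM orders_db.orders
    GROUP BY order_status
""").show()
```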

Author: Qu Kai, senior DBA of Tongcheng Network.