Author: Qu Kai, senior DBA of Tongcheng Yilong

Background

With the rapid development of the Internet, business volume can explode in a short period of time, and the corresponding data volume can quickly grow from hundreds of GB to hundreds of TB. Traditional standalone databases can no longer provide adequate service in terms of scalability or cost-effectiveness. To cope with the performance of business access under such data volumes, the commonly used MySQL split-database, split-table (sharding) schemes become more and more complicated as the number of shards grows, and so does the logic of business access to the database. In addition, tables with multi-dimensional query requirements need extra storage or computing resources to satisfy those queries, which makes the business logic heavier and heavier and is not conducive to rapid product iteration.

TiDB architecture

TiDB, PingCAP's open-source distributed database, provides strong consistency across multiple replicas and can be flexibly scaled out and in according to business requirements, with scaling transparent to the upper-layer business. TiDB consists of three core components: TiDB, TiKV, and PD.

TiDB Server: mainly responsible for SQL parsing and optimization; it acts as the computing and execution layer and handles client access and interaction.

TiKV Server: a distributed key-value storage engine that serves as the storage layer of the entire database. Horizontal scaling of data and high availability through multiple replicas are implemented at this layer.

PD Server: the brain of the distributed database. On the one hand, PD collects and maintains the distribution of data across the TiKV nodes; on the other hand, it acts as the scheduler, applying appropriate scheduling strategies according to data distribution and the load of each storage node to keep the whole system balanced and stable.
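
Because tidb-server speaks the MySQL wire protocol, applications talk to it with an ordinary MySQL driver and no TiDB-specific client code. A minimal connectivity sketch in Go (the address, user, and database are placeholders; 4000 is tidb-server's default port):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // a standard MySQL driver is all that is needed
)

func main() {
	// Placeholder DSN: user root, no password, default tidb-server port 4000.
	db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var version string
	if err := db.QueryRow("SELECT tidb_version()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println(version)
}
```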

Each of the three components above is deployed as a multi-node cluster, so the overall TiDB architecture looks like this.

At the same time, the inherent complexity of a distributed system makes manual deployment, operation, and maintenance costly and error-prone. Traditional automated deployment and O&M tools such as Puppet, Chef, SaltStack, and Ansible lack state management, so they cannot automatically fail over nodes when faults occur and require manual intervention by O&M staff. Some of them also require writing large amounts of DSL, or even mixing in shell scripts, which hurts portability and drives up maintenance costs.

For a complex distributed database like TiDB, we considered running it in containers to achieve the following objectives:

1. Shield the underlying physical resources

2. Improve resource utilization (CPU and memory)

3. Improve O&M efficiency

4. Enable fine-grained management

Based on these requirements, we developed the Thor system to manage and maintain TiDB in a unified manner. Its overall architecture is as follows:

As the architecture diagram shows, this solution is a private cloud architecture for TiDB. The bottom layer is the container cloud, the middle layer is a home-grown container orchestration tool, and the top layer is developed specifically for TiDB's characteristics and the practical problems encountered in its use, so as to achieve unified management of TiDB cluster instances. The following sections describe the functions of each module one by one.

Container scheduling

Kubernetes, the current mainstream container scheduling system, was originally our preferred solution for container scheduling. However, TiDB as a database service needs to store its data on local disks, and Kubernetes did not support Local Storage at the time (newer versions have started to support it). Given TiDB's characteristics and our business requirements, we decided to implement our own container orchestration system to solve the following problems:

  • Supports Local Storage, solving the data storage problem
  • Allocates CPU resources randomly and evenly based on cpuset-cpus (a simplified allocation sketch follows this list)
  • Customization: supports labels so that specific services run on specific hosts, and enforces host resource limits
  • Proactive container discovery and notification, bringing previously unmanaged hosts under unified management
  • Container lifecycle management
  • Container exception repair and notification
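
To make the cpuset-based allocation above concrete, here is a simplified sketch, not Thor's actual code, of picking a random but evenly spread set of free cores for a container and rendering the docker --cpuset-cpus value:

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"strconv"
	"strings"
)

// allocateCPUs picks n cores at random from the host's free cores and returns
// them together with the string to pass to `docker run --cpuset-cpus=...`.
func allocateCPUs(free []int, n int) ([]int, string, error) {
	if n > len(free) {
		return nil, "", fmt.Errorf("need %d cores, only %d free", n, len(free))
	}
	rand.Shuffle(len(free), func(i, j int) { free[i], free[j] = free[j], free[i] })
	picked := append([]int(nil), free[:n]...)
	sort.Ints(picked)

	parts := make([]string, len(picked))
	for i, c := range picked {
		parts[i] = strconv.Itoa(c)
	}
	return picked, strings.Join(parts, ","), nil
}

func main() {
	free := []int{0, 1, 2, 3, 4, 5, 6, 7} // cores not yet pinned to any container
	_, cpuset, err := allocateCPUs(free, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("docker run --cpuset-cpus=" + cpuset + " ...")
}
```

Random selection keeps containers from piling onto the same low-numbered cores, which is the evenly spread behaviour the scheduler needs.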

Thor adopts a modular design, divided into a control module and an agent module. Its overall architecture is as follows:

Description:

  • The control module includes Allocator, Label, Discover, Manage, and Customize. Allocator allocates host resources; Label is mainly used for label customization; Customize handles customization requirements; Discover is mainly responsible for container discovery and anomaly detection; Manage is responsible for overall scheduling and dispatching.
  • The agent module is mainly responsible for checking resources, collecting information, and receiving commands from the control module.
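
As a hypothetical illustration of the agent side (the endpoint, payload, and interval below are assumptions, not Thor's real protocol), an agent could periodically collect basic host facts and report them to the control module over HTTP:

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"os"
	"runtime"
	"time"
)

type hostReport struct {
	Hostname string `json:"hostname"`
	CPUs     int    `json:"cpus"`
	// A real agent would also report memory, disk, and per-container state.
}

func main() {
	const controller = "http://thor-controller:8080/api/report" // placeholder address
	name, _ := os.Hostname()
	for {
		body, _ := json.Marshal(hostReport{Hostname: name, CPUs: runtime.NumCPU()})
		resp, err := http.Post(controller, "application/json", bytes.NewReader(body))
		if err != nil {
			log.Printf("report failed: %v", err)
		} else {
			resp.Body.Close()
		}
		time.Sleep(30 * time.Second) // assumed reporting interval
	}
}
```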

Cluster management

Cluster management is one of the core modules of the whole system. It covers daily TiDB cluster maintenance operations, including TiDB initialization, smooth upgrades, elastic capacity management, monitoring integration, cluster maintenance, and node maintenance. PingCAP offers an Ansible-based automated deployment solution, but it still requires filling in and double-checking a lot of machine settings before a deployment can be completed. With this system, you only need to submit the requirements in the prescribed format to deploy an entire cluster, and deployment time has dropped from about 2 hours to about 2 minutes.
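
The exact submission format is internal to Thor; as a hypothetical illustration only, a cluster request might carry little more than a name, a version, and the desired number of each component, which is all the information such a deployment needs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterRequest is a made-up example of what "submitting the requirements
// in the prescribed format" could look like; the real schema is not public.
type ClusterRequest struct {
	ClusterName string `json:"cluster_name"`
	Version     string `json:"version"`
	PDCount     int    `json:"pd_count"`
	TiKVCount   int    `json:"tikv_count"`
	TiDBCount   int    `json:"tidb_count"`
}

func main() {
	req := ClusterRequest{
		ClusterName: "order-aggregation",
		Version:     "v2.1.0",
		PDCount:     3,
		TiKVCount:   5,
		TiDBCount:   3,
	}
	out, _ := json.MarshalIndent(req, "", "  ")
	fmt.Println(string(out))
}
```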

Database management

Database management is a core part of daily operation and maintenance. This module uses scheduled tasks to perform statistics updates, overload protection, slow query analysis, and SQL warnings.

1. Update statistics

TiDB updates statistics automatically, but only after the proportion of modified rows reaches a fixed threshold. For the merged databases that consolidate our sharded MySQL libraries, table sizes reach billions of rows; relying solely on TiDB's built-in statistics maintenance can lead to slow queries triggered by inaccurate statistics. For this situation we designed and developed automatic statistics updates: besides scheduled updates, exception windows can be configured so that updating statistics does not affect the business.
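
A minimal sketch of what the scheduled part of such a job can look like, assuming statistics are refreshed by running ANALYZE TABLE outside a configured exception window (the DSN, table list, and window are placeholders):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

// inExceptionWindow reports whether updates are currently forbidden,
// here an assumed business peak between 09:00 and 23:00.
func inExceptionWindow(t time.Time) bool {
	h := t.Hour()
	return h >= 9 && h < 23
}

func main() {
	db, err := sql.Open("mysql", "stats:@tcp(tidb-vip:4000)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	tables := []string{"orders.order_detail", "orders.order_item"} // placeholder tables
	for _, tbl := range tables {
		if inExceptionWindow(time.Now()) {
			log.Printf("skip %s: inside exception window", tbl)
			continue
		}
		if _, err := db.Exec("ANALYZE TABLE " + tbl); err != nil {
			log.Printf("analyze %s failed: %v", tbl, err)
		}
	}
}
```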

2. Overload protection

Based on analysis of SQL execution time and memory usage, different overload protection policies can be customized for different clusters, or a unified overload protection policy can be applied. When a policy is triggered, the relevant information is pushed to the people concerned via WeChat.
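
A simplified sketch of a time-based policy, assuming overload protection is enforced by killing queries that exceed a per-cluster runtime threshold (the threshold and DSN are placeholders, and a real system would also push the details to WeChat):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

const maxSeconds = 300 // per-cluster policy threshold (placeholder)

func main() {
	db, err := sql.Open("mysql", "guard:@tcp(tidb-vip:4000)/information_schema")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(
		"SELECT id, info FROM information_schema.processlist WHERE command != 'Sleep' AND time > ?",
		maxSeconds)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id uint64
		var info sql.NullString
		if err := rows.Scan(&id, &info); err != nil {
			log.Fatal(err)
		}
		// TiDB accepts KILL TIDB <connection id> on the tidb-server that owns the connection.
		if _, err := db.Exec(fmt.Sprintf("KILL TIDB %d", id)); err != nil {
			log.Printf("kill %d failed: %v", id, err)
			continue
		}
		log.Printf("killed long-running query %d: %s", id, info.String)
	}
}
```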

3. Slow query analysis and SQL warning

The slow query analysis system is built on ELK, while SQL warning is built on MySQL Sniffer, Flume, Kafka, Spark, and Hadoop and is used for trend analysis and prediction, accumulating data for subsequent automatic capacity management.
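
Whatever the pipeline, trend analysis needs captured statements grouped by pattern rather than by literal text. As an assumed illustration of that normalization step (not the article's actual Spark job), literals can be replaced with placeholders so that identical statements with different values share one fingerprint:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	stringLit = regexp.MustCompile(`'[^']*'`)
	numberLit = regexp.MustCompile(`\b\d+\b`)
	spaces    = regexp.MustCompile(`\s+`)
)

// fingerprint normalizes a SQL statement so that equivalent statements
// with different literal values map to the same key.
func fingerprint(sqlText string) string {
	s := strings.ToLower(sqlText)
	s = stringLit.ReplaceAllString(s, "?")
	s = numberLit.ReplaceAllString(s, "?")
	return strings.TrimSpace(spaces.ReplaceAllString(s, " "))
}

func main() {
	fmt.Println(fingerprint("SELECT * FROM orders WHERE user_id = 42 AND city = 'beijing'"))
	// prints: select * from orders where user_id = ? and city = ?
}
```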

Data synchronization

We use TiDB as the aggregation library for all data to serve complex queries, while the sharded clusters serve simple queries. Because TiDB is highly compatible with the MySQL connection protocol, it can meet the needs of complex business queries. We therefore refactored the code of PingCAP's data synchronization tool Syncer and developed our own synchronization tool, Hamal, which can customize database and table names and adds synchronization status monitoring such as TPS and latency; any abnormality triggers an alert via WeChat. It synchronizes data from MySQL to TiDB in real time to ensure data consistency. The architecture of the real-time synchronization and query system is as follows:

Hamal is a tool that disguises itself as a MySQL slave: it pulls binlogs from the MySQL master through the master-slave replication protocol, parses them into the corresponding SQL statements, and, after filtering, rewriting, and other steps, executes the final statements on the target library. Hamal includes the following features:

  • Position and GTID mode support
  • Automatic primary/secondary switchover (the primary/secondary server list must be configured in advance)
  • Multiple target libraries (multiple tidb-servers)
  • Binlog heartbeat support
  • Library- and table-level filtering and rewriting, for merging sharded libraries (a simplified rewrite sketch follows this list)
  • Library/table-level additional index support
  • Column extraction support (additionally outputs a small table containing selected columns)
  • Column filtering support
  • Smart table schema updates
  • Multi-threaded execution after merging small transactions, with multiple distribution strategies
  • Plain-text execution mode support
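
A simplified sketch of the library/table rewrite step needed for sharded sources (the shard naming pattern and target names below are assumptions, not Hamal's actual rules): map each shard table to the single merged table in TiDB, and skip tables that are not replicated.

```go
package main

import (
	"fmt"
	"regexp"
)

// shardPattern matches source tables named like order_08.orders_12 (assumed naming).
var shardPattern = regexp.MustCompile(`^order_\d+\.orders_\d+$`)

// rewrite returns the target schema.table in the TiDB aggregation library.
func rewrite(source string) (string, bool) {
	if shardPattern.MatchString(source) {
		return "order_merge.orders", true
	}
	return "", false // not a table we replicate
}

func main() {
	for _, src := range []string{"order_08.orders_12", "user_01.profile"} {
		if target, ok := rewrite(src); ok {
			fmt.Printf("%s -> %s\n", src, target)
		} else {
			fmt.Printf("%s skipped\n", src)
		}
	}
}
```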

The internal implementation of Hamal is as follows:

As the architecture diagram shows, Hamal can synchronize to different target libraries or other storage systems by plugging in different generators.

Monitoring and Alarm Center

Monitoring is of great importance to the system, and effective alerting directly determines the quality of monitoring, so the core of monitoring lies in collecting monitoring data and alerting effectively. The monitoring data mainly covers the operating state of the system itself, such as CPU, memory, disk, and network usage, and the health of the various applications running on it, such as databases and containers. By configuring monitoring items and monitoring exceptions, data collection can be customized flexibly, and reasonable, flexible monitoring rules help locate faults faster and more accurately. Alarm policies and alarm exceptions satisfy different alerting requirements. The architecture of the monitoring and alarm center is as follows:

Part of the monitoring data collection relies on existing monitoring systems such as Zabbix, part of the data is obtained through the TiDB API, and part comes from open source collectors, so the raw data ends up in databases of different types. A synchronization tool we developed brings all of this data into an independently deployed TiDB cluster for subsequent analysis. Visualization is implemented mainly with Grafana, while the alarm module was developed in-house based on our actual requirements rather than reusing existing open source solutions.
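
A minimal sketch of the "obtained through the TiDB API" path: each tidb-server exposes a status port (10080 by default) with endpoints such as /status, which returns basic health data as JSON. The host below is a placeholder, and only a subset of the returned fields is shown:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// tidbStatus holds a subset of the fields returned by tidb-server's /status endpoint.
type tidbStatus struct {
	Connections int    `json:"connections"`
	Version     string `json:"version"`
	GitHash     string `json:"git_hash"`
}

func main() {
	resp, err := http.Get("http://tidb-01:10080/status") // placeholder host, default status port
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var s tidbStatus
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("version=%s connections=%d\n", s.Version, s.Connections)
}
```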

Conclusion

In our use of TiDB we deploy services in a one-library-one-cluster fashion. Deploying different libraries this way effectively prevents uneven load from making them interfere with each other and makes performance monitoring accurate down to the library level. With the Thor system, services of all kinds can be deployed quickly on a single server's resources, improving resource utilization while avoiding resource contention.

Since the system went live a year ago, all of the company's TiDB clusters have been successfully migrated from physical-machine deployments to containers. Thor now manages hundreds of machines and dozens of TiDB clusters, serving hundreds of applications and carrying dozens of TB of data with peak TPS in the hundreds of thousands. Deploying a TiDB cluster used to take nearly two hours; after Thor went live it takes only about two minutes. This has effectively improved DBA O&M efficiency and the availability of the TiDB service as a whole.

Going forward, we will continue to deepen our work on auditing and SQL analysis to bring more efficiency and stability to the business.

Thor — TiDB automated operation and maintenance platform