Li Wenjie, NetEase Interactive Entertainment Billing Team, senior database administration engineer, TiDB User Group Ambassador

I. Introduction to business architecture

The billing team is a public support department that provides unified login and efficient payment solutions for NetEase Interactive Entertainment products. Internally, it serves NetEase Interactive Entertainment's game studios; externally, it connects to hundreds of payment channels at home and abroad. Because of the particularities of the business, we deploy separate application services for each game product: large products get independent environments, while small products are deployed centrally.

With the surge in the department's business volume, standalone MySQL hit bottlenecks in capacity, performance, and scalability, so we began to investigate and evaluate other database products. This article details how the NetEase Interactive Entertainment billing team compared database candidates for its own scenarios and which problems were solved after adopting TiDB, and shares best practices for cluster management, monitoring, and data migration accumulated along the way, for your reference and exchange.

1.1 MySQL Usage Architecture

The basic online MySQL architecture of the NetEase Interactive Entertainment billing team is shown in the figure below, where the arrows indicate the direction of data or requests:

  • Online applications are deployed on multiple servers with Keepalived, and traffic is load balanced to keep the application layer highly available.
  • The database layer uses a Keepalived + master-slave architecture; semi-synchronous replication effectively mitigates replication delay and data consistency problems.
  • Applications access the backend database through a VIP. When the primary node goes down, the VIP drifts to the slave node so that service continues uninterrupted.
  • The slave node also serves backups and online data collection: data is exported to the data center through full and incremental synchronization, where online and offline computing jobs then run.
  • There are 50+ such deployments online, involving 200~400 servers, with daily data growth at the TB level.

1.2 Current status and problems of MySQL usage

As the business grows, the data generated by the department's various application services also increases rapidly, and standalone MySQL inevitably hits performance bottlenecks. These show up mainly in the following areas:

  • Capacity

    • The storage space of a standalone MySQL instance is limited. To keep the existing architecture, we have to delete and rotate old data to free space.
    • In some NetEase Interactive Entertainment scenarios, a single table exceeds 700 GB, and order data must be kept permanently and remain queryable online in real time. Under the previous storage design, this is an obvious bottleneck.
  • Performance

    • A single table holds up to 1.5 billion rows, which degrades read and write performance.
  • Scalability

    • MySQL cannot scale out flexibly online, so the storage bottleneck cannot be resolved.
  • Complex SQL

    • After a large table is rotated, many sub-tables accumulate; joint queries have to join them all, making the SQL complex and hard to maintain.
    • Standalone MySQL lacks large-scale data analysis capability.
  • Data barriers

    • Databases are deployed independently for each product;
    • The data sets do not communicate with each other, which isolates them and forms data barriers;
    • Cross-product computation then has to maintain multiple heterogeneous data sources with complex access methods. Data scattered across silos makes analysis harder and hinders mining its shared value, as shown in the figure below:

    Figure 2: Data silos in the current architecture

II. Database selection

2.1 Research Objectives

Given the problems with the current storage architecture, we considered moving to other storage solutions. Since the business is tightly coupled to MySQL, the main selection requirements were:

  • Must be MySQL-compatible;
  • Supports transactions, so a task either executes as a whole or rolls back on error;
  • Supports indexes, especially secondary indexes;
  • Scalability: supports flexible online scaling, of both performance and capacity (a small smoke test sketch follows this list).
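
The first three hard requirements can be smoke-tested against any candidate that claims MySQL compatibility. Below is a minimal sketch using Python's pymysql; the host, port, credentials, and table are assumptions for illustration only.

    import pymysql

    # Connect over the MySQL protocol (parameters are assumptions).
    conn = pymysql.connect(host="candidate-host", port=4000, user="root",
                           password="", database="test")
    with conn.cursor() as cur:
        # Secondary index support: create a table with a non-primary index.
        cur.execute("CREATE TABLE IF NOT EXISTS t "
                    "(id BIGINT PRIMARY KEY, k INT, INDEX idx_k (k))")
        # Transaction support: an insert rolled back must leave no trace.
        conn.begin()
        cur.execute("INSERT INTO t VALUES (1, 10)")
        conn.rollback()
        cur.execute("SELECT COUNT(*) FROM t WHERE k = 10")  # uses idx_k
        print(cur.fetchone()[0])  # expect 0 after the rollback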

Other requirements:

  • Stability and reliability;
  • Backup and restore;
  • Disaster recovery, etc.

2.2 Optional Solutions

| No. | Solution | Notes |
|-----|----------|-------|
| 1 | MySQL sharding (splitting databases and tables) | Based on MySQL |
| 2 | MySQL Cluster | Based on MySQL |
| 3 | MySQL + Vitess | Based on MySQL |
| 4 | MySQL + MyCAT | Based on MySQL |
| 5 | CockroachDB (CRDB) | Compatible with PostgreSQL, not with MySQL |
| 6 | TiDB | Compatible with the MySQL protocol |

2.3 Testing

2.3.1 MySQL-based solutions

Our initial preference was still a MySQL-based solution, such as MySQL InnoDB Cluster or MySQL plus middleware.

We tested MySQL InnoDB Cluster on 5.7.25 and 8.0.12, writing to 10 tables of 10 million rows each at a concurrency of 128, comparing single-node, 3-node, and 5-node configurations, as shown in the figure below:

The tests showed that the write performance of MySQL InnoDB Cluster is about 30% worse than that of standalone MySQL, and the other read/write results were also unsatisfactory. We then tested the MySQL + middleware solutions in turn, but their results did not meet our requirements either, or they would have required substantial code changes.

We therefore concluded that solutions based on MySQL InnoDB Cluster or MySQL + middleware do not fit our business scenarios. In summary, we ruled out MySQL sharding, middleware, and MySQL clusters for two main reasons:

  • The scheme is too complex
  • Business code needs to be changed

Careful analysis shows that such solutions are, in essence, extensions of the MySQL master-slave architecture rather than true distributed scaling. They bolt horizontal expansion on as a "patch", so many functions and features are inherently hard to deliver.

2.3.2 CockroachDB vs. TiDB

In the open source distributed NewSQL field, the best-known products are TiDB and CockroachDB (CRDB for short), both open source implementations based on Google's Spanner papers. We did extensive research and testing on the functionality and performance of both databases.

  • TiDB is natively compatible with MySQL, while CRDB is compatible with PostgreSQL.
  • If the business mainly runs on MySQL, TiDB is likely the better choice; if it runs on PostgreSQL, CRDB is probably preferred.

For performance, we also carried out a comprehensive comparison and test. Here is one of the test cases: in July 2018, we ran read and write tests against CRDB v2.1.0 and TiDB v2.0.5 (both clusters used default configurations, with no tuning).

Cluster topology

Test statements

  • Range query:

    SELECT c FROM sbtest%u WHERE id BETWEEN ? AND ?
    SELECT SUM(k) FROM sbtest%u WHERE id BETWEEN ? AND ?
    SELECT c FROM sbtest%u WHERE id BETWEEN ? AND ? ORDER BY c
    SELECT DISTINCT c FROM sbtest%u WHERE id BETWEEN ? AND ? ORDER BY c
  • Random IN query:

    SELECT id, k, c, pad FROM sbtest1 WHERE k IN (?)
  • Random range query:

    SELECT count(k) FROM sbtest1 WHERE k BETWEEN ? AND ? OR k BETWEEN ? AND ?
  • Update index column:

    UPDATE sbtest%u SET k=k+1 WHERE id=?
  • Update non-indexed columns:

    UPDATE sbtest%u SET c=? WHERE id=?
  • Mixed read/write: range queries mixed with updates and deletes (a driver sketch follows below).
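
For reference, these sysbench-style statements run against TiDB through any ordinary MySQL client, since TiDB speaks the MySQL protocol. Below is a minimal sketch using Python's pymysql; the connection parameters and the sbtest1 table are assumptions for illustration.

    import random
    import pymysql

    # TiDB listens on the MySQL protocol, port 4000 by default.
    conn = pymysql.connect(host="127.0.0.1", port=4000,
                           user="root", password="", database="sbtest")

    def range_query(low, high):
        # Equivalent to: SELECT c FROM sbtest1 WHERE id BETWEEN ? AND ?
        with conn.cursor() as cur:
            cur.execute("SELECT c FROM sbtest1 WHERE id BETWEEN %s AND %s",
                        (low, high))
            return cur.fetchall()

    def update_index_column(row_id):
        # Equivalent to: UPDATE sbtest1 SET k = k + 1 WHERE id = ?
        with conn.cursor() as cur:
            cur.execute("UPDATE sbtest1 SET k = k + 1 WHERE id = %s", (row_id,))
        conn.commit()

    low = random.randint(1, 9_000_000)
    print(len(range_query(low, low + 100)))
    update_index_column(random.randint(1, 10_000_000))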

One of the key test results is as follows:

Conclusion:

  1. CRDB and TiDB are comparable in performance;

    Note: the above results are based on TiDB 2.0.5 and were measured in July 2018. TiDB has since released the 3.0 GA version, which improves performance qualitatively. In recent supplementary tests, version 3.0 showed several times the performance of version 2.1 in most scenarios. The latest results are shown below:

    Figure 7 TiDB 2.1.15 vs 3.0.3: OLTP peak comparison

    Figure 8 TiDB 2.1.15 vs 3.0.3: TPC-C

  2. CRDB is compatible with PostgreSQL. Migrating to it would mean converting the protocol along MySQL → PostgreSQL → CRDB, a complicated and costly process.

  3. TiDB is compatible with MySQL, with few code changes and low migration costs.

2.3.3 Final selection

The comprehensive comparison results are shown in the following table:

| Database | Scalability | TP | AP | Documentation | Community activity |
|----------|-------------|----|----|---------------|--------------------|
| MySQL | No | Yes | No | Rich | High |
| PostgreSQL | No | Yes | Yes | Rich | High |
| MySQL InnoDB Cluster | No | Yes | No | Sparse | Low |
| MySQL + middleware (Vitess) | Yes | Yes | No | Rich | Medium |
| CRDB | Yes | Yes | Yes | Sparse | Low |
| TiDB | Yes | Yes | Yes | Rich | High |

After careful consideration, we chose TiDB.

III. Use of TiDB in the NetEase Interactive Entertainment billing team

3.1 TiDB usage architecture

The TiDB architecture used by NetEase Interactive Entertainment is designed as follows:

  • The cluster consists of the TiDB, TiKV, and PD modules.
  • Nginx serves as the front-end load balancer.

3.2 Requirements addressed by TiDB

| Requirement | Solution |
|-------------|----------|
| Massive storage | TiDB scales online in real time, without downtime or constant purging of old data |
| Performance | TiDB supports reads and writes on multiple nodes; QPS grows with the number of nodes, with no theoretical ceiling |
| High availability | Strongly consistent multi-replica storage; a node failure does not affect service |
| Backup and disaster recovery | Multiple data replicas; cross-datacenter and cross-region deployment; Mydumper + Loader for multithreaded concurrent export and restore |
| Sharding (splitting databases and tables) | Not needed; large tables are read and written efficiently out of the box |
| Breaking down data barriers | Supported |

3.3 TiDB usage status

  • Business

    • As a mirror of the online MySQL data, TiDB collects and centrally manages online data, forming a data lake;
    • Applied to data platform services, including reports, monitoring, operations, user profiling, big data computing, and other scenarios;
    • HTAP: OLTP + OLAP.
  • Clusters

    • Test cluster: v2.1.15, used for functional and new-feature testing;
    • Online cluster: v2.1.15, roughly 80% offline big data computing tasks + 20% online services.
  • Scale

    • 41 servers, 88 instance nodes, 38 Syncer real-time synchronization streams (to be upgraded to DM);
    • Storage: 20 TB used out of 50 TB total, 2.3 million Regions;
    • Average QPS 4k, peak QPS at the 10k level, read/write ratio about 1:5;
    • Latency: 80% within 8 ms, 95% within 125 ms, 99.9% within 500 ms.

IV. Best practice sharing

4.1 Cluster Management

  • Ansible (recommended)

    • One-click deployment;
    • Elastic scaling: flexible online capacity expansion;
    • Upgrades: smooth rolling upgrade, one node at a time (see the sketch after this list);
    • Cluster start, stop, and decommissioning;
    • Prometheus monitoring.
  • Docker

  • K8s

    • TiDB Operator allows one-click management on both private and public clouds.
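
The Ansible operations listed above map onto playbooks in PingCAP's tidb-ansible repository. A minimal driver sketch follows; the repository path is a placeholder, and the playbook names reflect that repo's layout and should be checked against the version in use.

    import subprocess

    def run_playbook(name):
        # Invoke ansible-playbook from the tidb-ansible checkout.
        subprocess.run(["ansible-playbook", name],
                       check=True, cwd="/path/to/tidb-ansible")

    run_playbook("deploy.yml")          # one-click deployment
    run_playbook("rolling_update.yml")  # smooth node-by-node upgrade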

4.2 Operation and Maintenance Practices

4.2.1 Prometheus monitoring

TiDB officially integrates a Prometheus + Grafana real-time monitoring platform that covers the cluster from every angle, including:

  • Monitor basic server resources: memory, CPU, storage space, IO, etc.
  • Monitoring of cluster components: TiDB, PD, TiKV, etc.
  • Data monitoring: real-time synchronous flow, upstream and downstream data consistency check, etc.
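
Beyond the Grafana dashboards, the same Prometheus instance can be queried programmatically, which is handy for custom alerting. A minimal sketch follows; the Prometheus address is an assumption, and the job labels depend on the actual deployment.

    import requests

    PROM = "http://127.0.0.1:9090"  # assumed Prometheus address

    def instant_query(promql):
        # Prometheus HTTP API: GET /api/v1/query?query=<PromQL>
        resp = requests.get(f"{PROM}/api/v1/query", params={"query": promql})
        resp.raise_for_status()
        return resp.json()["data"]["result"]

    # Check which scrape targets are up (1 = up, 0 = down).
    for series in instant_query("up"):
        print(series["metric"].get("job"), series["value"][1])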

A PD monitoring panel is shown below; from it, the cluster administrator can easily see the latest status of the cluster, including its storage capacity and Region status.

If the cluster misbehaves, the problem is easy to spot in the monitoring panels. Here is an example from our own use:

An application accessing TiDB found that data writes were very slow while reads were normal. Checking the TiKV panels, we found the Raft Store CPU metric abnormal: Raft replication is handled by a single Raft Store thread, and it had become the cluster's bottleneck, for two reasons:

  • There were too many Regions, and the single Raft Store thread also has to process every Region's heartbeat messages.

    Solution: delete expired data.

  • The single Raft Store thread could not keep up with the cluster's write speed.

    Solution: upgrade from 2.1.5 to 2.1.15 to enable automatic Region Merge, reducing the number of Regions.
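
To catch the "too many Regions" condition before it throttles the Raft Store thread, the Region count can also be watched programmatically. A minimal sketch, assuming the PD client address, using PD's HTTP API:

    import requests

    PD = "http://127.0.0.1:2379"  # assumed PD client address

    # PD exposes cluster metadata over HTTP; /pd/api/v1/regions returns
    # the Regions along with their total count.
    resp = requests.get(f"{PD}/pd/api/v1/regions")
    resp.raise_for_status()
    print("total regions:", resp.json()["count"])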

4.2.2 Some o&M problems and solutions

| Problem | Version | Cause and solution |
|---------|---------|--------------------|
| Adding an index to a large table affects online services | 2.0 | Officially recommended to run during off-peak hours. Version 2.1 adds operation priorities and concurrent read control, which improves the situation. |
| Upstream and downstream table structures are inconsistent and synchronization fails | 2.1 | Synchronization generates statements of the form INSERT INTO A (a, b, c) VALUES ..., which requires identical schemas on both sides. Support for inconsistent upstream/downstream table structures is being added and is expected in Q4 2019. |
| A single DDL statement cannot operate on multiple columns or indexes | 2.1 | ADD/DROP INDEX/COLUMN does not support creating or dropping multiple indexes or columns at once; perform the operations separately. |
| Restarting a PD node reports "PD server timeout" | 2.1 | Manually transfer the Leader before restarting the Leader node. The official suggestion is to migrate the Leader before the restart to mitigate the pause; communication-related parameters will also be optimized later. |
| CREATE TABLE statements execute more slowly than in MySQL | 2.0 & 2.1 | With multiple TiDB instances deployed, the DDL Owner and the server that receives the CREATE statement may not be the same instance, so a DDL may take about 0.5 s longer than in MySQL. |
| GC cannot keep up when too much data is deleted | 2.1 | GC speed may fall behind the write speed. Mitigate by scaling out and shortening the GC run interval (see the sketch below); the long-term fix is distributed GC, which has been implemented in version 3.0. |
| Storage space amplification | 2.0 & 2.1 | A RocksDB characteristic; RocksDB's optimal space amplification factor is about 1.111. Enabling RocksDB's dynamic-level-bytes on TiKV in suitable scenarios reduces space amplification. |
| Space is not fully reclaimed after TRUNCATE TABLE | 2.0 | After truncating a large table, space reclamation was both slow and incomplete. Version 2.1 optimizes the underlying RocksDB mechanism to use the DeleteFilesInRange interface to drop the files occupied by the whole table and then clean up the small residue; this is resolved. |
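
As a concrete illustration of shortening the GC interval mentioned in the table, here is a minimal sketch; the connection parameters are assumptions, and in TiDB 2.x/3.x the GC settings live as rows in the mysql.tidb system table.

    import pymysql

    # Connect through TiDB's MySQL protocol (parameters are assumptions).
    conn = pymysql.connect(host="tidb-host", port=4000, user="root", password="")
    with conn.cursor() as cur:
        # Run GC every 5 minutes instead of the 10-minute default so that
        # reclamation keeps up with heavy deletes; tikv_gc_life_time is the
        # companion setting that controls how long old versions are retained.
        cur.execute(
            "UPDATE mysql.tidb SET VARIABLE_VALUE = %s WHERE VARIABLE_NAME = %s",
            ("5m", "tikv_gc_run_interval"))
    conn.commit()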

4.3 Database Traversal across the Network

In the past, some services had to traverse databases across the whole network to gather the data they needed, maintaining many heterogeneous sources through complicated, tedious access paths. TiDB solves this nicely: a single source now provides all the desired data.

4.4 Data Migration

4.4.1 MySQL to TiDB

MySQL database migration to TiDB is divided into two parts: full and incremental.

  • Full

    • Use a tool (Mydumper, mysqldump, etc.) to export the data from MySQL and record the binlog position at that moment;
    • Use a tool (Loader or Lightning) to import the data into the TiDB cluster;
    • The same procedure also serves for backup and restore.
  • Incremental

    • Using a tool (Syncer or DM), TiDB poses as a slave of the upstream MySQL and replays its binlog into the TiDB cluster in real time;
    • Normally, upstream updates are synchronized downstream in real time; the speed depends on the network and the data volume.

4.4.2 Migrating Data out of TiDB

If data needs to be imported back or synchronized in reverse, the TiDB Binlog tool can synchronize the TiDB cluster's binlog to MySQL. TiDB Binlog suits the following scenarios:

  • Data synchronization: synchronize TiDB cluster data to other databases;
  • Real-time backup and restore: back up TiDB cluster data and recover from cluster failures.

Migration modes:

  • Full: TiDB is compatible with the MySQL protocol, so if MySQL has enough capacity, data can be exported from TiDB with the usual tools and then imported into MySQL.
  • Incremental: enable the binlog switch on TiDB and deploy the binlog collectors (Pump + Drainer) to synchronize binlog data to downstream storage (MySQL, TiDB, Kafka, S3, etc.).

4.5 An elegant alternative to "sharded databases and tables"

Example: a very large table is rotated by day, and we now want to query a full year of records for one account.

  • Upstream MySQL

    SELECT xx FROM HFeeall JOIN HFee20190101 JOIN ... JOIN ... JOIN ... JOIN HFee20190917 WHERE xx;

    The query has to join N sub-tables and takes a long time to return.

  • Downstream TiDB

    SELECT xx FROM SuperHfeeall WHERE xx;

    With this scheme, the largest single table is 700+ GB with 1.3+ billion rows, and indexed queries return in seconds.

4.6 Service Migration

Objective: use TiDB's horizontal scalability to remove capacity and system throughput bottlenecks.

Migration principles:

  • Data integrity and accuracy: the data is critical and must be kept intact, with nothing lost;
  • Smooth and fast migration: the business is highly sensitive, so downtime must be minimal;
  • Rollback: it must be possible to switch back to MySQL at any time if a problem arises.

1) Data synchronization

Use DM or Syncer to synchronize upstream MySQL data to the TiDB cluster. After the sync stream is set up, check the consistency of the upstream and downstream data, for example with a quick spot check like the sketch below.
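
A minimal sketch of such a consistency spot check, comparing cheap aggregates on both sides; the hosts, credentials, and the orders table are assumptions, and a dedicated tool such as sync-diff-inspector is the more thorough option.

    import pymysql

    def snapshot(host, port):
        # Compare cheap aggregates on one table on each side; a full
        # comparison should use a dedicated diff tool.
        conn = pymysql.connect(host=host, port=port, user="root",
                               password="", database="billing")
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*), IFNULL(SUM(id), 0) FROM orders")
            return cur.fetchone()

    assert snapshot("mysql-host", 3306) == snapshot("tidb-host", 4000), \
        "upstream/downstream mismatch"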

Once synchronization has caught up, some read traffic can be migrated to the TiDB cluster according to service needs.

2) Read and write verification

This stage verifies that applications receive identical results whether they access MySQL or TiDB, validating the accuracy of business access.

Stop the data synchronization, use a traffic replication tool to duplicate all online traffic, and read and write MySQL and TiDB simultaneously. Compare the results from both sides to verify that TiDB is reliable and trustworthy. This phase can run for a longer period as needed, for example with a comparator like the sketch below.
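
A minimal sketch of the dual-read comparison, assuming illustrative hosts, credentials, and query:

    import pymysql

    def query_all(host, port, sql, args):
        conn = pymysql.connect(host=host, port=port, user="root",
                               password="", database="billing")
        with conn.cursor() as cur:
            cur.execute(sql, args)
            # Sort so row order does not cause false mismatches.
            return sorted(cur.fetchall())

    SQL = "SELECT id, amount FROM orders WHERE account = %s"
    mysql_rows = query_all("mysql-host", 3306, SQL, ("acct_001",))
    tidb_rows = query_all("tidb-host", 4000, SQL, ("acct_001",))
    if mysql_rows != tidb_rows:
        print("mismatch:", len(mysql_rows), "rows vs", len(tidb_rows), "rows")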

3) Gray scale switching

Stop the traffic replication from step 2 (that is, turn off double-write) and restore the DM synchronization from upstream.

Then migrate the write traffic of some non-core services' tables to TiDB, and enable TiDB's binlog switch to reverse-synchronize to the online MySQL. This ensures that data written to MySQL is synchronized to TiDB and that data written to TiDB is reverse-synchronized to MySQL, so a rollback is possible at any time. Once services have run normally for a while, increase the switched traffic step by step in gray fashion; an observation period of at least one month is recommended. A routing sketch follows below.
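
A minimal sketch of how such a gray switch can be expressed in application code, routing a tunable percentage of requests to TiDB; the endpoints and percentage are assumptions.

    import random
    import pymysql

    TIDB_TRAFFIC_PERCENT = 20  # raise gradually; set to 0 to roll back to MySQL

    def get_connection():
        # Route a tunable share of requests to TiDB, the rest to MySQL.
        if random.randint(1, 100) <= TIDB_TRAFFIC_PERCENT:
            return pymysql.connect(host="tidb-host", port=4000, user="root",
                                   password="", database="billing")
        return pymysql.connect(host="mysql-host", port=3306, user="root",
                               password="", database="billing")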

4) Migration is complete

After the traffic migration is complete, keep the TiDB-to-MySQL reverse synchronization running and continue observing for a period. Once everything checks out, disconnect the reverse sync; the migration is then 100% complete.

V. Summary and outlook

TiDB is compatible with the MySQL protocol, supports transactions across both TP and AP workloads, and scales well, which nicely solves the capacity and high-availability problems of the NetEase Interactive Entertainment billing team's services. We keep deepening and expanding our use of TiDB because we believe in it, so here are some pain points from our usage to help the vendor keep polishing the product:

  • Cluster data backup: provide a more efficient way to back up and restore the cluster via SST files;
  • Transaction limits: we hope the restrictions on large transactions can be relaxed; today, large transactions must be split manually, which is cumbersome;
  • Synchronization: we expect DM to support inconsistent upstream and downstream table structures;
  • Data hotspots: strengthen automatic detection and scattering of hotspots;
  • Client retries: client code currently has to wrap its own retry logic, which is unfriendly and we hope will improve (a sketch of the manual batching and retry workarounds follows this list).
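
For the transaction-limit and retry points above, here is a minimal sketch of the manual workarounds we mean, under assumed connection and table names: a large delete split into small transactions, wrapped in simple retry logic.

    import time
    import pymysql

    conn = pymysql.connect(host="tidb-host", port=4000, user="root",
                           password="", database="billing")

    def execute_with_retry(sql, args=(), retries=3):
        # Retry transient failures with exponential backoff.
        for attempt in range(retries):
            try:
                with conn.cursor() as cur:
                    affected = cur.execute(sql, args)
                conn.commit()
                return affected
            except pymysql.err.OperationalError:
                conn.rollback()
                time.sleep(2 ** attempt)
        raise RuntimeError("statement failed after retries")

    # Delete expired rows in batches of 10k instead of one huge transaction;
    # the loop stops when a batch deletes nothing.
    while execute_with_retry("DELETE FROM logs WHERE ctime < %s LIMIT 10000",
                             ("2018-01-01",)):
        pass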

Finally, building on our current usage, the NetEase Interactive Entertainment billing team plans to keep expanding and deepening TiDB's usage scenarios and to grow the variety and scale of the business it serves; we look forward to TiDB bringing even more convenience to our business.

pingcap.com/cases-cn/us…

Read more cases: pingcap.com/cases-cn/