The annual double 11 and double 叒 yi, to technical people the best gift is to promote technical guide! But after years of development, promoting has long confined to the electricity industry, now all walks of life actually do operations would adopt a similar manner, car has 818, 618, 11.11, and so on electricity, all kinds of big promote the scene, at the basis of the software, including database proposed many new challenges, but also accumulated a lot of best practices.

Before the arrival of Singles’ Day, PingCAP conducted a series of in-depth discussions with autohome, Bitauto, JINGdong, ZTO and other users, hoping to reveal what technical problems are hidden behind the annual soaring sales? What technical architecture can be used to smoothly withstand the flood peak?

Click here to watch the full interview participate in the interaction for a chance to get TiDB custom peripheral!

In the rush, the most expected thing after buying is to receive express delivery. Founded in 2002, ZTO Express is a comprehensive logistics service brand with express as the main body and international, express, cloud warehouse, commerce, cold chain, finance, intelligence, star Link and media as the auxiliary. In 2020, ZTO will complete 17 billion pieces of business, with a market share of 20.4%.

The whole life cycle of express delivery, transport cycle can be summed up in five words —Receive, send, arrive, send, sign:

The platform that supports the whole life cycle of express delivery is Zto Big data platform. Zto has a relatively complete big data platform system from offline to real-time data compatibility to data warehouse. ETL modeling will also rely on the big data platform and eventually provide data application support and support based on offline OLAP analysis through the big data platform. The frequency of the whole data modeling can be up to half an hour. On the basis of this perfect big data platform, ZTO began to think more about how to enhance real-time multidimensional analysis capabilities.

The relationship between ZTO and TiDB began in 2017 when we investigated the scene of sub-database and sub-table. At that time, zTO’s branch database and branch tables reached 16,000 tables, and the business could not continue to expand. At the end of 2018, ZTO began testing TiDB 2.0, focusing on the storage of large amounts of data, as well as analysis performance. In early 2019, ZTO launched production application support. The current production stable version is TiDB 3.0.14. At the end of 2020, ZTO began to test TiFlash with two goals: one is to improve timeliness and the other is to reduce the use of hardware.

1.0 – Meet the need

1.0 is the era of meeting requirements, and business requirements mainly include the following:

  • Business development is very fast, the amount of data is very large, each order update 5-6 times, operation peak;
  • It is difficult to support the demand of multi-dimensional analysis for the technical scheme that has done research;
  • The business side requires a long period of data analysis;
  • It also has high requirement on analysis time limitation.
  • Single machine performance bottlenecks, including single points of failure and high risks, are also intolerable in the business.
  • In addition, QPS is also very high and applications require millisecond response.

In terms of technical requirements, ZTO needs to get through multiple business scenarios + multiple business indicators; Need to beStrongly consistent distributed transactions, the cost of switching under the original business mode is very small; You also need toEngineering the whole analysis and calculation, offline the original stored procedure; To be able toSupports high-concurrency read/write and update; Able to supportOnline maintenanceTo ensure that a single point of failure has no impact on services. At the same time,Closely integrated with the existing big data technology ecosystemMinute-level statistical analysis; Finally, zTO has been exploring the establishment of large width tables with more than 100 + columns,Based on this wide table, multi-dimensional query analysis should be done.At present, some landing scenarios of TiDB application in ZHONGtong

Application scenario of aging system

Among them, the aging system is the original system of ZHONGtong, which has been reconstructed now. The original storage and computation of this system are mainly designed by Oracle, and computation depends on stored procedures. This architecture is also relatively simple, with message access on one side and load on the other.

With the growth of business volume, the performance of this set of architectures has gradually become a bottleneck. During the architecture upgrade of this system, Zto migrated the entire storage to TiDB and the entire computing to TiSpark. Message access depends on Spark Link and is finally sent to TiDB through the message queue. TiSpark provides minute-level calculations, with light summaries going to Hive and medium summaries going to MySQL. Based on Hive, Presto provides external application services. Both OLTP and OLAP greatly reduce the workload of development compared to the original relational database, and are integrated with the existing big data ecological technology stack.

The database system architecture of ZTO in 1.0 era

The benefits of migration are as follows: First, the capacity of the original data center is tripled, and the storage period of the existing system data is more than tripled. Second, in the aspect of scalability, it supports online horizontal expansion, operation and maintenance can be up and down computing and storage nodes at any time, and the application perception is very small. Third, it meets the high performance OLTP business requirements. Although the query performance is slightly reduced, it meets the business requirements. Fourth, the database single point of pressure is no longer, OLTP and OLAP achieve “separation”, do not interfere with each other; Fifth, it supports the need for more dimensions of analysis; Sixth, the overall architecture looks clearer, more maintainable and much more scalable than before.

Application scenarios of large and wide tables

Another scene is the construction and exploration of the wide watch that ZTO has been doing. In fact, I have tested many systems, including Hbase and Kudu. Kudu’s write performance is good, but its community activity is mediocre in China. At the same time, Zto uses Impala as the OLAP query engine, but Presto is the mainstream one, which is not compatible and can hardly meet the requirements of all business scenarios. In addition, the business characteristics of ZTO require that the system can quickly calculate and analyze billions of data, synchronize it to the offline cluster for fusion with T+1 data, and provide data products and data services with direct access to detailed data. Finally, the processing of massive data. Zto has access to many message sources, so it is necessary to carry out full-link routing and time-effectiveness prediction for each vote, and locate the transport link of each vote, which has a large amount of data and high requirements for time-effectiveness.

Zhongtong large wide table construction

At present, the wide table has been built with more than 150 fields. The data came from more than 10 topics. The main project access is through Flink and Spark, through which the data generated by each business is summarized to TiDB to form a wide service table. An additional part, relying on TiSpark, outputs analysis results from the service width table and synchrones 300 million pieces of data to Hive. In addition, ten-minute level real-time data construction and offline T+1 integration are also provided.

In the process of using the current cluster scale of ZTO, Zto has also encountered some problems, which can be summed up as quantitative change causes qualitative change. First, hotspot issues. Index hot spot is prominent in the current situation, because the business volume of ZTO is very large and there is a peak in the operation, and this hot spot problem is particularly obvious in the big time. Second, memory fragmentation. In the previous low version, after stable operation for a period of time, due to business features and a large number of updates and deletes, the memory fragmentation was serious. This problem has been fixed after being fed to TiDB. Third, we focus on a parameter – TiFlash read index parameter. After the test, you are advised to disable this parameter when the amount of data read or the total amount of data read is greater than 1/10. Why do you say so? Because the number of tests might get smaller, but the unit Test transition would take longer.

Operational monitoring

Using TiDB, you will find that it is particularly rich in monitoring metrics, using the popular Prometheus + Grafana, many and complete. Before, Because zto supports online business at the same time, there will be developers to check the data, encountered SQL to TiKV Server suspension. In view of this problem and the problem of monitoring, Zhongtong has carried out some development customization. First, compatible online special account slow SQL, will be automatically killed, and notified to the corresponding application responsible person. Second, ZTO developed a tool that supports Spark SQL to query TiDB. Concurrency and security are guaranteed in the development process. In addition, ZTO will also put some additional core indicators into its own research monitoring system. The core alarm will be notified to the relevant duty personnel by telephone.

Last year, during Singles’ Day, Zto’s order volume exceeded 820 million, the total business scale exceeded 760 million, and the PEAK QPS on singles’ Day reached 350,000 +. During the whole Singles’ Day, the volume of data update reached hundreds of billions. More than 100 TiSpark tasks were run on the whole cluster, and 7 online applications were supported. The timeliness of the whole analysis reached 98% in less than 10 minutes, and the data cycle of the whole analysis reached 7-15 days.

2.0 era – HTAP upgrade

The main characteristics of the 2.0 era areHTAP ascension. Zto’s application of HTAP mainly comes from the upgrade of business requirements: Based on business requirements, Zto has carried out an architecture re-upgrade in the 2.0 era. First, ** introduced TiFlash and TiCDC **. The benefits are actually enhanced timeliness. Some analyses are performed at the minute level, reducing the usage of Spark cluster resources.Data system architecture of ZTO in 2.0 era

Below is a comparison of TiSpark and TiFlash. There are two sets of clusters on Zhongtong, one based on 3.0 and one based on 5.0. A quick comparison between 3.0 and 5.0:3.0’s main analysis is based on TiSpark, 5.0 is based on TiFlash. Currently the 3.0 cluster has 137 physical nodes and the 5.0 cluster has 97 nodes. In the whole operation cycle, 3.0 is 5-15 minutes, and TiFlash based on 5.0 has achieved 1-2 minutes. The load reduction of the whole TiKV is quite obvious. In addition, Spark resources for 3.0 are around 60 units, while for 5.0, online and in testing, around 10 units will suffice.

The production cluster was 3.0 throughout the test cycle, and the 4.0 test cycle was actually quite short. In the test, there were some cases of dimension table Join in the business scenarios. At that time, 4.0 did not support MPP, and the support for some functions might not be so perfect, so the test results were not very ideal. HTAP is being tested primarily in phase 5.0, which already supports MPP and increasingly rich support for functions. At present, the production version of ZTO is TiDB 5.1.

On the right of the figure above is the load of the entire 5.0 cluster during 618. In the just-concluded 618, some of the tasks released in 5.0 are already supporting Kanban on 618 mobile. Zto has 6 core indicators calculated based on TiFlash. The cluster response is overall smooth, and the report reaches the time limit within minutes. The overall data volume is 4 billion to 5 billion, and the report analysis data reaches 1 billion.

3.0 – Looking to the future

  • The first is monitoring. When it comes to monitoring, zTO’s cluster is relatively large, so it may face more problems and encounter more problems. In a large cluster, there are many instances, and indicators are slowly loaded. Therefore, troubleshooting efficiency cannot be guaranteed. Although monitoring is very complete, but when there is a problem, it cannot quickly locate the problem;
  • The second is to solve the problem of occasional inaccuracy of implementation plans. This inconsistency sometimes affects the load interaction on some lines, which pushes up the cluster index and causes the service interaction.
  • The third is to achieve automatic cleaning. At present, the clearing of the data is written by their OWN SQL clearing, but expired data cleaning is more troublesome. Hopefully, automatic TTL for old data will be supported later.
  • Fourth, with the introduction of 5.0 column storage, ZTO plans to gradually transfer all the tasks of TiSpark to TiFlash, hoping to achieve the goal of improving the time efficiency and reducing the hardware cost.

For enterprises, the promotion is not only to support business innovation, but also to practice their own technical architecture and full-link exercise. Through the ultimate test of promotion, the enterprise’s IT architecture, organizational process and talent skills have been greatly improved. The experience and thinking in the promotion process will also accelerate the pace of business innovation, improve the efficiency of technology-driven innovation, and create a new engine for growth.