Introduction: the database service platform chosen by more than 100,000 enterprises

Alibaba Cloud databases have reliably supported Tmall Double 11 for many years and have been hardened by extreme traffic scenarios. Beyond keeping the core systems stable and smooth, during this year's promotion the databases greatly improved user experience by becoming fully cloud native. The aim is for technology to help businesses create a more valuable consumer experience, and to keep empowering users through technological innovation.

Double 11 has come to a successful conclusion, but the exploration of technology has not stopped.

Preface

DMS, Alibaba Cloud's one-stop online data management platform, began as an internal tool serving the group's business units. It has lived through every stage of Alibaba Group's database technology and architecture evolution, has been tested by the Double 11 promotions year after year, and has made the transition to cloud native. Today it runs on a unified cloud architecture and provides one-stop data management services to both Alibaba Group's internal teams and external cloud customers. The product keeps expanding its boundaries and deepening its technology, bringing Alibaba Group's methods for managing data at very large scale to all developers.

Key components

**Data management DMS:** launched by the database team in 2009 as a one-stop database development platform for developers. It provides real-time database access, enforcement of database development standards, data security management, and safe-production capabilities for the group, Ant, and public cloud customers. Combined with the Database Backup (DBS) capability, it also offers one-stop backup for public cloud, hybrid cloud, and self-built on-premises databases. Beyond stable backup and recovery, it released the Cloud Data Management (CDM) capability in 2019 to restore backup data within seconds. It supports a wide range of customer scenarios in finance, education, gaming, and more.

**Data backup DBS:** the database backup product announced by the database team in 2017. It provides users with one-stop, stable backup services for public cloud, hybrid cloud, and self-built on-premises databases. In 2019 it released Alibaba Cloud's first cloud-native CDM product, which uses cloud capabilities to help customers restore data within seconds. Second-level recovery already supports important customer scenarios in education, gaming, and other industries.

**Data transmission DTS:** Data Transmission Service (DTS) has supported Alibaba Group since 2011, from disaster recovery through geo-distributed active-active deployment to the Alibaba Cloud official website. It was named DTS and completed its transformation into a product in April 2015. It is the world's first public cloud data transmission product, and it combines the performance and business characteristics of Alibaba Group with the diversity of data sources on the public cloud. By integrating data migration, subscription, and real-time synchronization, it solves the problem of long-distance, second-level asynchronous data transmission in public cloud and hybrid cloud scenarios. Its underlying infrastructure uses Alibaba's Double 11 geo-distributed active-active architecture to provide real-time data flow for thousands of downstream applications, and it has been running stably online for 6 years. DTS supports relational databases, NoSQL, big data (OLAP), and other data sources. For migrations from traditional commercial databases, notably Oracle and DB2 on mainframe and midrange systems, it offers compatibility assessment, conversion, and real-time synchronization.

DMS overview

DMS: the technologies accumulated in data management, data backup, and data transmission have been unified into the new DMS product. It gives users one-stop, end-to-end capabilities covering global asset management, database design and development, and data integration and development. During Double 11 in 2021, it provided group users with a full range of data services.

Business challenges

  1. Accumulated historical data pushes storage usage too high; tables keep growing and response time (RT) rises. For this problem DMS provides a historical data cleanup function that deletes historical data without the business noticing. In some scenarios, however, fragmentation remains high after cleanup and little storage is actually reclaimed. Developers then have to pick a time window to run an optimize-table operation, and the approval process for that operation is cumbersome. How to simplify the operation and bring storage usage down became an urgent problem for the business side.
  2. Among database changes, DDL changes are high-risk operations, especially with sharded databases and tables. How to control the risk of DDL changes is a question that business developers raised with DMS.
  3. Data subscription sits upstream of many middleware products. It underpins functions such as cache invalidation for recommendation applications, advertising push, and search, as well as the GMV scenario that is unique to Double 11. This year, a newly introduced database-and-warehouse integration architecture adds OLTP-to-OLAP capability, which improved both the capabilities and the user experience of transaction order search in Mobile Taobao.
  4. Users may remember only vague information about a product name or store. On the old link, order search could only run a LIKE match in the database against the query keyword, so an inaccurate keyword might not find the order at all. To raise the hit rate, users enter short keywords, but short keywords match too many orders and the results are not grouped by category. To find the target order, the user can only keep paging through the results, which takes a long time. During the Double 11 promotion, this function could only be degraded.
  5. For the first time, all group databases run on the cloud, with a large number of instances deployed in central sites at the same time. The group, as a VIP customer, is deployed in the same regions as public cloud customers, which puts heavy traffic pressure on backup storage. Without a technical solution, the group's cloud workloads and public cloud customers would affect each other. Incremental backup is the core problem: when backup storage is under heavy traffic pressure, incremental data piles up and fills the customer's log disk, the instance becomes read-only (RO), and it can no longer be restored to an arbitrary point in time.

Technology upgrades

Lock-free data change and lock-free table optimization

In historical data deletion scenarios, DELETE statements carry conditions such as a time range, and the columns in those conditions are not necessarily indexed. As a result, deletion is very slow and holds data locks. If a large amount of data is deleted in one statement, the deletion fails because of the limit on binlog transaction size.

DMS splits large transactions into small ones, which bounds both the execution time and the number of rows affected by each transaction. With this optimization, deleting about 10 million rows (roughly 40 GB) from a 400 GB table took 5 hours at a deliberately slow pace and produced zero slow SQL statements.
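The idea of turning one large delete into many small transactions can be sketched as follows. This is a minimal illustration, not the DMS implementation: it uses Python's built-in `sqlite3` module in place of the production database, and the `orders` table, the `created_at` column, the batch size, and the pause between batches are all assumptions made for the example.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=1000, pause_s=0.0):
    """Delete rows older than `cutoff` in small transactions.

    Returns (rows_deleted, batches_run). Table and column names are hypothetical.
    """
    total = 0
    batches = 0
    while True:
        with conn:  # each batch commits as its own short transaction
            cur = conn.execute(
                "DELETE FROM orders WHERE id IN ("
                " SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?)",
                (cutoff, batch_size),
            )
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
        batches += 1
        time.sleep(pause_s)  # throttle so foreground traffic is not starved
    return total, batches


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at INTEGER)")
    with conn:
        conn.executemany(
            "INSERT INTO orders (created_at) VALUES (?)",
            [(i,) for i in range(10_000)],
        )
    deleted, batches = delete_in_batches(conn, cutoff=7_500, batch_size=1_000)
    remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    return deleted, batches, remaining
```

Each batch selects its rows by primary key, so locks are held only briefly, and every transaction stays well under any per-transaction log size limit. Lowering `batch_size` or raising `pause_s` trades total run time for less impact on online traffic.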

After the historical data of an oversized table has been cleaned up through lock-free data change, the table data can be reorganized with the same DMS lock-free change technology to reclaim table space.

DDL grayscale change

When business developers change the structure of a table, the DDL usually runs as a single unit, and the whole table becomes inaccessible if a problem occurs. With sharded databases and tables, however, a logical table is split into multiple physical shards. If the change is still applied to all of them at once, the advantage that sharding offers for changes is wasted.

Using these shards for a grayscale rollout is a good way to reduce the risk of a change. Under grayscale policy control, the rollout for sharded databases and tables can be defined at three levels: single table, single database, and single instance. With grayscale rollout, users can make structural changes with more confidence.
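The three grayscale levels can be pictured as three ways of grouping physical shards into rollout batches. The sketch below is an assumption-laden illustration rather than the DMS policy engine: the `Shard` type, the level names, and the `apply_ddl` callback are hypothetical, and a real rollout would also observe each batch before moving on.

```python
from itertools import groupby
from typing import NamedTuple


class Shard(NamedTuple):
    instance: str
    database: str
    table: str


# Grouping key for each grayscale level (names are illustrative).
LEVEL_KEYS = {
    "table": lambda s: (s.instance, s.database, s.table),
    "database": lambda s: (s.instance, s.database),
    "instance": lambda s: (s.instance,),
}


def plan_batches(shards, level):
    """Group shards into rollout batches at the chosen grayscale level."""
    key = LEVEL_KEYS[level]
    ordered = sorted(shards, key=key)
    return [list(group) for _, group in groupby(ordered, key=key)]


def run_grayscale(shards, level, apply_ddl):
    """Apply the DDL batch by batch and stop at the first failure.

    Returns (shards_changed, failed_shard_or_None).
    """
    done = []
    for batch in plan_batches(shards, level):
        for shard in batch:
            if not apply_ddl(shard):
                return done, shard  # untouched shards keep the old schema
            done.append(shard)
        # A real system would pause here to watch metrics before the next batch.
    return done, None
```

Because the rollout stops at the first failing shard, a bad DDL affects only the batches already processed instead of the whole logical table.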

Efficient data backup

The point-in-time recovery (PITR) capability of log backup is not degraded at peak. DMS compresses the binlog data generated by the group's XDB and, because the multi-replica nodes of XDB all carry the same log, backs up only one copy of it.

**Traffic shunting:** DMS backs up XDB logs in real time and supports internal traffic rules that send the backups of some instances in an XDB cluster to other storage, which diverts traffic.

**Low traffic generation:** DMS compresses and prunes the binlog data generated by XDB and backs up only the data on the XDB leader. Because the binlog on the XDB leader and followers is completely consistent, recovery from an abnormal interruption only requires finding the position where the binlog stream broke off and continuing from the logs on a follower.
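The leader-only backup with follower fallback can be illustrated with a toy model. Everything here is an assumption made for the example: the binlog is modeled as a list of `(position, payload)` pairs, replicas are plain Python lists, and `zlib` stands in for whatever compression is actually used.

```python
import zlib


def back_up(source, start_after, limit=None):
    """Return compressed copies of log entries with position > start_after.

    `limit` simulates the connection dropping after that many entries.
    """
    out = []
    for pos, payload in source:
        if pos <= start_after:
            continue  # already backed up
        if limit is not None and len(out) >= limit:
            break
        out.append((pos, zlib.compress(payload)))
    return out


def demo():
    log = [(pos, f"txn-{pos}".encode()) for pos in range(1, 11)]
    leader, follower = list(log), list(log)  # replicas hold identical binlog

    # Back up from the leader only; the stream breaks after 6 entries.
    backup = back_up(leader, start_after=0, limit=6)
    last_pos = backup[-1][0]

    # Resume from a follower at the position where the stream broke off.
    backup += back_up(follower, start_after=last_pos)

    restored = [(pos, zlib.decompress(blob)) for pos, blob in backup]
    return last_pos, restored == log
```

Only one copy of each entry crosses the network, and the last backed-up position is the only state needed to continue on another replica.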

**Traffic isolation:** because the backup traffic of the group's cloud instances and that of public cloud customers can affect each other, four kinds of traffic are isolated in separate buckets: the group's full backups, the group's incremental backups, public cloud customers' full backups, and public cloud customers' incremental backups. Based on business traffic forecasts, the full-backup buckets are rate limited, so that incremental backups for both public cloud customers and the group keep a larger quota, with enough bandwidth to meet the second-level RPO.
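One common way to rate limit separate traffic classes is a token bucket per class. The sketch below assumes that technique for illustration; the source does not say which limiter is used, and the four class keys and all rates are hypothetical numbers chosen only to show full backups being throttled harder than incremental ones.

```python
class TokenBucket:
    """Rate limiter: `rate` bytes per second, bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def try_send(self, nbytes, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes > self.tokens:
            return False  # over the limit: the caller should retry later
        self.tokens -= nbytes
        return True


def make_limiters():
    # Hypothetical limits: full backups are throttled, incrementals get headroom.
    return {
        ("group", "full"): TokenBucket(rate=100, capacity=100),
        ("group", "incremental"): TokenBucket(rate=1000, capacity=1000),
        ("public", "full"): TokenBucket(rate=100, capacity=100),
        ("public", "incremental"): TokenBucket(rate=1000, capacity=1000),
    }
```

Because each class has its own limiter, exhausting the full-backup quota of one tenant leaves the incremental quota, and the other tenant, unaffected.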

**Midnight (0:00) peak performance:** for the first time, log backup ran through Double 11 without interruption, ensuring second-level RPO. At the group's peak, the log traffic written to backup storage and the total traffic reached hundreds of Gb/s while logs were still written in real time.

Database-and-warehouse integration architecture

In the past, database data flowed into the data warehouse on a T+1 basis, and the calculation results were then written back to the database for display. The link was long, maintenance cost was high, data latency was large, and peak periods had a heavy impact on the source database. In previous promotions, the historical order search function of Taobao had to be throttled.

This year, the database-and-warehouse integration architecture, set up with one click using DMS + ADB, delivers real-time data capture, real-time transmission and processing, and real-time query and presentation. It supports transaction order search and multi-dimensional data analysis fully in real time. At the midnight peak, under millions of RPS, DMS writes to ADB with millisecond-level end-to-end latency, and ADB real-time queries return in milliseconds. After the upgrade, order search in Mobile Taobao gained two capabilities, “guess what you want to search” and “category search”:

  1. Guess what you want to search: related terms and store names are recommended based on the search term. Users can tap a suggested term or store name to search, which raises the search hit rate;

  2. Category search: search by store name was added, and the orders found are grouped by category into tabs, which reduces how far users have to scroll within each tab.

Finally, the database-and-warehouse integration architecture removed the impact that function degradation used to have on users, and the functions were fully available throughout Double 11.

Inventory business

Based on Alibaba's unitized architecture, order traffic is distributed across units when users purchase goods. The real-time data synchronization capability of DMS keeps the data in the databases of different units consistent in real time. Meanwhile, the real-time data subscription function provided by DMS is the basis on which inventory applications update their caches in real time as the database changes. Together, these capabilities ensure that the remaining inventory shown on the client during ordering is what the user actually gets, improving the overall shopping experience.
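A cache that follows database changes through a subscription can be sketched as a consumer that applies each change event in order. This is an illustrative model only: the event fields (`item_id`, `position`, `op`, `stock`) and the `InventoryCache` class are hypothetical and do not reflect the actual DMS subscription message format.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryCache:
    stock: dict = field(default_factory=dict)    # item_id -> cached stock
    applied: dict = field(default_factory=dict)  # item_id -> last applied log position

    def on_change(self, event):
        """Apply one change event; return False for stale or duplicate events."""
        item, pos = event["item_id"], event["position"]
        if pos <= self.applied.get(item, -1):
            return False  # already seen: keep the newer cached value
        self.applied[item] = pos
        if event["op"] == "delete":
            self.stock.pop(item, None)
        else:  # "insert" or "update"
            self.stock[item] = event["stock"]
        return True


def demo():
    cache = InventoryCache()
    events = [
        {"item_id": "A", "position": 1, "op": "insert", "stock": 10},
        {"item_id": "A", "position": 2, "op": "update", "stock": 9},
        {"item_id": "A", "position": 2, "op": "update", "stock": 9},  # redelivered
        {"item_id": "B", "position": 3, "op": "insert", "stock": 5},
        {"item_id": "B", "position": 4, "op": "delete"},
    ]
    results = [cache.on_change(e) for e in events]
    return results, cache.stock
```

Tracking the last applied position per item makes the consumer idempotent, so a redelivered event after a reconnect does not overwrite a newer value.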

DMS ensures the efficiency and stability of the entire link under inventory-service traffic of millions of RPS.

Trading and GMV big screens

In the trading business, DMS provides the top tier of real-time data consumption for the GMV media screens shown to the public and for the screens used by internal executives. Any problem on the link directly affects the accuracy of the data on screen. To ensure stability, DMS uses a redundant active-standby and active-active architecture on the transaction link, which keeps GMV highly available across the entire link under millions of RPS at the transaction peak.

Summary of DMS support

DMS during 2021 Tmall Double 11:

  1. Protected nearly 500,000 dynamic real-time and static accesses to sensitive data through data masking. Fine-grained, network-wide blocking control went online for the first time; it improved production safety efficiency by 50% and effectively blocked more than a thousand large queries and more than a hundred DDL changes.
  2. Supported second-level RPO recovery for 100% of the group's cloud instances. The technical solutions introduced reduced traffic bandwidth by 50%, and for the first time shared OSS storage was used to absorb peak promotion traffic, so that incremental backups were not degraded at peak. This guaranteed that the group's core trading scenarios on the cloud could restore data to any point in time, improving the stability of the whole Double 11.
  3. Ran tens of thousands of synchronization links and hundreds of thousands of subscription tasks across the whole network. During the midnight traffic peak, a cumulative PB-level volume of log data was read from source databases and hundreds of billions of transactions were written to target databases within a few minutes, with no task interruption across the network and no delay on core tasks.

This article is the original content of Aliyun and shall not be reproduced without permission.