With the arrival of the DT (data technology) era, enterprises hold more and more data, often hundreds of terabytes and sometimes petabytes. Managing and maintaining databases of that size at a reasonable cost has become a core problem of enterprise IT management. HybridDB for MySQL is an HTAP relational database that supports OLTP transactions and real-time analysis on a single copy of the data. On October 12, at the HTAP technology session of the Cloud Computing Conference, Alibaba Cloud product expert Chen Zhuo introduced Alibaba Cloud's self-developed HTAP database, focusing on the underlying technology, its implementation, and how HTAP can be used to solve business pain points.

This article covers five parts: the Alibaba Cloud database product team; real user scenarios and the problems found online while handling user requirements; the reasons Alibaba Cloud keeps investing more deeply in databases; the features of the product and the problems it solves; and the experience gained since commercialization, together with the future direction of the product.

ApsaraDB moving forward

A constantly improving database: with the support of its users, ApsaraDB keeps moving forward. Since its launch in 2011, ApsaraDB has gone on to carry the financial cloud and Double 11 workloads. The ApsaraDB database team has been growing alongside Alibaba Cloud since 2010, six or seven years so far. Along the way the team has witnessed the rise of cloud computing in China and lived through several major events: joining the financial cloud, supporting the Yu'ebao business, and the upcoming Double 11 traffic diversion project, in which business including users' order processing will run entirely on Alibaba Cloud.

 

In the context of cloud computing, the team has also kept enriching its products, investing in product capability, reliability, scenario coverage and other areas. For five consecutive years, to give users the confidence to entrust their business to Alibaba Cloud, more than 1,000 features have been optimized every year. As the product was hardened, mainstream database engines and user-friendly functions were gradually added. The problems a database has to solve are not ones that a simple open source database, or any single product, can address. Through in-depth contact with users, Alibaba Cloud came to recognize the importance of self-developed databases, and in 2017 it launched two self-developed database engines.

Alibaba Cloud's database offering has now grown into a forest: there are nearly 20 database products on Alibaba Cloud. The team has watched an Internet giant move to the cloud, flourish and eventually go public. It has seen many early-stage entrepreneurs use cloud databases to focus on business development, release their capacity for innovation, and keep growing. It has also seen many excellent enterprises in niche industries use cloud databases to keep optimizing their data architecture, feed that back into business transformation, and gain new vitality. Currently, more than 100,000 paid instances are running on Alibaba Cloud.

Toward self-developed databases

The following figure shows the current state of the Alibaba Cloud RDS team's products, which is also the background to the database team's self-development work. A database deployed on the cloud differs from a traditional installation package deployed on premises: on the public cloud, with its particular hardware and architecture, users consume the database only as a hosted service. RDS itself, however, aims not only to provide open source databases as a service but also to pass on more of the hardware dividend while sharpening its competitiveness. The strong momentum of AWS is clear from Gartner's Magic Quadrants for operational databases and for data warehouses. As cloud computing becomes mainstream, cloud vendors are bound to lead the next generation of database products, and Alibaba Cloud does not intend to lag behind. Self-developed database products have distinct advantages in market competitiveness and user stickiness, which is also the original motivation for the product.

 

In general, users find that RDS is limited by capacity on the cloud. Within RDS a user can run both business workloads and analysis, but users have very high expectations of MySQL's analytical performance. That led the team to develop a new product at the architecture level, to raise performance and improve market competitiveness. This is what users demand, and it is the reason for doing self-development. RDS runs into various problems, performance among them, and there are only two architectural routes for a database product. One is horizontal scale-out, which is where HybridDB for MySQL is positioned: parallel processing and linear scaling, aimed at hybrid scenarios from terabytes to petabytes. The other is scale-up, which covers hot data and the traditional MySQL-based solutions. Once the CPU can no longer be expanded, however, scale-out still has to be considered, so that years of hardware capability and practical experience are carried into the next generation of database products.

 

Gartner proposed the HTAP concept and published a market guide that defines an HTAP database as one that supports both OLTP and OLAP scenarios, is built on an innovative computing and storage framework, and supports real-time analysis while guaranteeing transactions on a single copy of the data, eliminating the time-consuming ETL process. In a traditional IT architecture, transaction processing and online analysis are handled separately. An HTAP database aims to support both the business system and OLAP scenarios on one copy of the data. Imagine a database that runs the enterprise's core accounting, finance and business systems and at the same time serves report analysis: a great deal of extra overhead is saved. However, HTAP is only the outward form; how it meets specific business needs depends on each implementation.
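The idea of transactions and analysis sharing one copy of the data can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in; it says nothing about how HybridDB for MySQL is implemented, and the table name `orders` and the function `run_demo` are hypothetical.

```python
import sqlite3


def run_demo():
    """Write transactionally, then analyze the same table with no ETL step."""
    conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in engine
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount INTEGER NOT NULL)"
    )

    # OLTP side: a committed transaction (the with-block commits on success).
    with conn:
        conn.executemany(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            [("east", 100), ("east", 50), ("west", 70)],
        )

    # OLTP side: a failed transaction is rolled back by the with-block,
    # so the analytical query below never sees its row.
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (region, amount) VALUES (?, ?)", ("west", 999)
            )
            raise RuntimeError("simulated business failure")
    except RuntimeError:
        pass

    # OLAP side: aggregate directly over the data the transactions wrote.
    rows = conn.execute(
        "SELECT region, COUNT(*), SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows
```

The report query reads the transactional table directly, so there is no export or load step between the write and the analysis.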

Product features

Source of HTAP capabilities

HybridDB for MySQL is a cloud-native database product. Its HTAP capability comes from the product itself, unlike traditional solutions that split tables across databases.

 

First, at the entry level, the splitting of data into sub-databases and sub-tables is invisible to users. HybridDB for MySQL exposes a unified entry point, so the distribution is transparent to users. In terms of design, from connection to execution, to computing, and then to storage, the whole architecture is loosely coupled, pluggable, extensible, and coordinated as a whole. It is fully distributed, so work runs in parallel across nodes and both write and query performance scale linearly. Besides staying compatible with MySQL syntax, the team developed its own computing engine. Through continuous iteration, the MySQL-based product has also gained massively parallel processing (MPP) capability, which is an advantage in OLAP scenarios.
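A unified entry point over hidden shards, with a query fanned out in parallel, might look roughly like the sketch below. The class `ShardedTable`, its methods and the CRC32 hash routing are assumptions made for illustration; the article does not describe HybridDB for MySQL's actual routing or execution engine.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class ShardedTable:
    """Hypothetical unified entry point: callers never see the shards."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each shard is a plain list of (key, amount) rows.
        self.shards = [[] for _ in range(shard_count)]

    def _shard_for(self, key):
        # CRC32 is stable across runs, unlike Python's built-in hash().
        return zlib.crc32(str(key).encode("utf-8")) % len(self.shards)

    def insert(self, key, amount):
        # The caller supplies a row; routing to a shard happens internally.
        self.shards[self._shard_for(key)].append((key, amount))

    def total_amount(self):
        # Scatter: each shard computes a partial sum in its own worker.
        # Gather: the partial sums are combined into the final answer.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = pool.map(
                lambda rows: sum(amount for _, amount in rows), self.shards
            )
            return sum(partials)
```

Adding shards spreads both the stored rows and the per-query work, which is the intuition behind linear scaling. A real MPP engine also has to handle joins, data skew and node failures, none of which this sketch attempts.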

Evolution of architecture

 

Currently, the underlying storage that HTAP users work with is MySQL. In practice, a columnar storage engine has also been added to the self-developed MySQL for faster computation, and row and column storage are mixed to balance the two workloads. To users, HybridDB for MySQL is a distributed database with enhanced performance and the same functions as MySQL. Users can therefore seamlessly migrate their MySQL-based applications, and reports built on other reporting tools, to HybridDB.
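The trade-off between row and column layouts can be sketched in a few lines. The class `HybridTable` below is hypothetical: it keeps every row in both layouts at once, which is a simplification for illustration and not a description of how the product's storage engines work.

```python
class HybridTable:
    """Toy table that stores each row in a row layout and a column layout."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = {}  # row store: primary key -> whole row
        self.column_data = {name: [] for name in self.columns}  # column store

    def insert(self, key, row):
        if key in self.rows:
            raise KeyError(f"duplicate key: {key!r}")
        if len(row) != len(self.columns):
            raise ValueError("row does not match the column list")
        self.rows[key] = tuple(row)
        for name, value in zip(self.columns, row):
            self.column_data[name].append(value)

    def get(self, key):
        # OLTP-style point lookup: one access returns the whole row.
        return dict(zip(self.columns, self.rows[key]))

    def column_sum(self, name):
        # OLAP-style scan: reads one column and skips all the others.
        return sum(self.column_data[name])
```

The point lookup favors the row layout and the aggregate favors the column layout. The sketch pays for both with duplicated storage and omits updates and deletes.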

Strong parallel processing

 

Users get terabyte-scale write performance with HybridDB. Written data is visible to users in real time and has transactional guarantees, and the write path has been optimized to meet the overall performance targets. On top of traditional MySQL, a self-developed computing engine has been introduced, which greatly improves performance. For both the row and the column storage engines, reducing cost and providing storage with higher compression ratios remain challenges the project has to overcome, because even when users put only part of their data into the product, the volume is considerable. Under the current sales specifications, if users find during operation that the computing power or capacity of the chosen specification is insufficient, they can initiate an expansion directly. The system stays writable during expansion, and petabyte-scale data can be expanded dynamically with parallel processing.

Continuing the RDS usage experience

 

Beyond the capabilities of the product itself, HybridDB for MySQL continues the RDS experience and strictly follows the SQL standard. The syntax layer has also been enhanced, including Oracle syntax and support for analytic functions and the TPC-H and TPC-DS benchmarks, so native MySQL clients and mainstream development frameworks can connect to the product easily. HybridDB for MySQL also inherits MySQL's management and control model: what users see in the console is the internal operation and maintenance system, organized into function categories similar to those of RDS. For daily use and management, upstream and downstream products of the MySQL ecosystem such as DMS and DTS are available, and Alibaba's QuickBI service is supported for decision-making on the cloud. Because the MySQL ecosystem is carried over, other products can be supported as well.

Stable and reliable service

For a database product, stability is a very important indicator. Stability is also the first design priority of HybridDB for MySQL: because HTAP carries users' business workloads, the stability requirements on the database are high. HybridDB for MySQL provides protection at three levels. At the data level: parallel backup and restore, multiple backup sets, and one-click upload to OSS. At the service level: protection against transient disconnections that is imperceptible to users (the service guarantee is 99.95%), multiple replicas of storage nodes, automatic failover, and smooth expansion without downtime. At the IDC level: HybridDB for MySQL is also delivered in the Alibaba Cloud private cloud and supports deployment across two sites and three data centers in private cloud environments.
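To put the 99.95% service guarantee in concrete terms, the arithmetic below converts an availability percentage into the downtime it permits. The helper name `allowed_downtime_minutes` and the 30-day measurement window are assumptions; the article does not say over what period the guarantee is measured.

```python
def allowed_downtime_minutes(availability_percent, period_days):
    """Return the downtime, in minutes, that an availability target permits."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    unavailable_fraction = (100 - availability_percent) / 100
    return total_minutes * unavailable_fraction


# 99.95% over an assumed 30-day window: 43,200 minutes * 0.0005,
# which is about 21.6 minutes.
monthly_budget = allowed_downtime_minutes(99.95, 30)
```

Over a 365-day year the same target allows about 4.38 hours of downtime in total.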

Self-breakthrough, rapid evolution

All the functions mentioned above belong to HybridDB itself. Because it is a self-developed product, a lot of kernel functionality has gone into it, and the product has to be iterated continuously. Operations will not lock database tables; they are handled flexibly, at a lower cost than the traditional approach. RDS is already supported, and HybridDB will support sequential operations and distributed transactions at the end of October. In terms of sales, a new sales region is now available in the Apsara Stack V3 proprietary cloud. In terms of specifications, a new specification that enhances OLAP across the board will be released shortly after the Cloud Computing Conference (it is currently open for testing). Users hold the product to higher requirements than they do MySQL, which is what drives its continuous iteration.

Scenarios and Schemes

In terms of scenarios, HybridDB for MySQL is positioned for real-time visibility, hybrid analysis, and putting the whole enterprise's data to work. Since its commercial launch in March 2017, it has been widely used in big data analysis, massive history databases, the Internet of Things, performance monitoring and other business fields. Currently, more than a petabyte of data per day is managed and analyzed in real time through HybridDB for MySQL.

 

Several application scenarios are listed here. The first combines high-throughput real-time DML, transaction visibility requirements and high concurrency; typical applications are real-time indicator monitoring and real-time recommendation. The second combines real-time writes, low-cost storage, fast response times, low concurrency, and data that cannot be modeled in advance; typical applications are business indexes and transaction history databases. The third combines data centralization, real-time processing and analysis, and transaction visibility requirements; typical applications are HTAP scenarios and hybrid data warehouses.

A typical scenario

The following figure shows a typical HybridDB for MySQL application scenario. HybridDB for MySQL evolved continuously inside RDS. Within RDS, users' source data and stream data are distributed to HybridDB for MySQL, and while internal systems write data in real time, the health of RDS is monitored and analyzed in real time along different dimensions. The example has several business pain points: anomalies and trends have to be mined in depth from massive monitoring data, the quality of service of RDS has to be visualized in multiple dimensions, and high-throughput workloads have to be supported. The value of HybridDB for MySQL here is clear: structured data, streaming data and offline data flowing back can all be written concurrently; petabyte-scale data is managed; and an interface to a real-time data warehouse is provided.

 

Supporting the rapid development of multimedia cloud services

In multimedia applications, HybridDB for MySQL provides stable and reliable database services for Alibaba Cloud products such as the intelligent image solution CloudPhoto+MTS+CDN and the video solution LiveVideo+MTS+CDN. Data from CDN, live video, video transcoding and many other multimedia products is stored on HybridDB, which provides continuous and stable database services with balanced capabilities.

 

Currently, the solution is not sold overseas; it will be deployed overseas as the business expands.

Enterprise-level solution: a government IoT data solution

This is a typical case that leans toward Internet of Things business systems, on which users also run reports regularly. Originally, the user analyzed historical documents on OTS and did user behavior analysis in an offline data warehouse. This is a typical path; it was also a typical project for Alibaba Cloud's proprietary cloud team in the second half of the year, and it reflects the mainstream direction for tax systems. The pain points: 34 billion IoT records, about 40 TB; 80 TB of unstructured PDF data; transaction protection is required; and update operations such as writing tags back are required. For these pain points, HybridDB for MySQL provides a solution with a clear architecture, controllable cost, and easy operation and maintenance.

 

Enterprise-level solution: ODS and data warehouse fusion on the cloud

 

The DBA team and the big data team are usually two separate departments, and when the big data team has analytics needs, the DBAs often already have a backlog of their own work. Can T+0 real-time reporting be achieved with a new approach? This is another typical scenario. The pain points in the example: T+0 analysis is difficult and requires moving data; there is a gray area between DBA work and data analysis; and analysis based on read-only instances brings high development complexity, a large number of nodes, and sharply rising O&M costs. The case delivered value in several ways: HTAP capability to build the ODS and DW layers quickly; support for T+0 analysis, with ODS plus offline results flowing back; lower development and operation costs; clear departmental responsibilities with no gray areas; and a mature, stable ecosystem.

Development and Outlook

It is impossible to build one database that covers every scenario. For HTAP, finding a balance is the core question: the core capabilities of the database itself and the core capabilities of a cloud product come first. At the same time, the product should keep improving cost, performance and efficiency for users and gradually move toward autonomy. Because it is difficult to turn a database into a general-purpose product, development has to be driven mainly by demand and by users, while also watching how competing products operate, so that the team can respond quickly, release the dividend of new-generation hardware, and give cloud users a better operation and maintenance experience.

Finally, HybridDB aims to handle business processing while providing stronger analytical performance. The team keeps iterating toward this goal and hopes to provide users with better database services. On the 18th anniversary, Jack Ma spoke of national feeling and social responsibility. Building basic software is likewise something to be proud of: for the country, it contributes independent, controllable, safe and reliable technology to the cause of network information; for society, it provides more powerful data management capability and makes the technology dividend widely available.