Calling technology: the evolution of real-time data storehouse based on Flink + Hologres

Summary: This article will describe how Shared Charge Pad Pioneer Calling Technology builds a unified data service accelerated real-time data warehouse based on Flink + Hologres.

Author: Chen Jianxin, Development Engineer of Calling Technology Data Warehouse, currently focuses on the integration of offline and real-time architecture of Calling Technology Big Data Platform.

Shenzhen Call Technology Co., Ltd. (hereinafter referred to as “Call Technology”) is a pioneer in the shared charge bank industry, with its main business covering self-service charge bank rental, development of customized shopping mall navigation machine, advertising display equipment, advertising communication and other services. Calling Technology has a three-dimensional product line in the industry, large, medium and small cabinet and desktop type. At present, more than 90% of the cities in the country have achieved business services, and more than 200 million registered users, realizing the needs of users in all scenarios.

I. Introduction to big data platform

1. Development history

The development history of the big data platform of telephone technology is mainly divided into the following three stages:

1) Discrete 0

Why discrete? There was no single big data platform to support data services. Instead, each business development line took its own numbers and did some calculations, and used a low-profile version of the Greenplum offline service to maintain its daily data needs.

2) Offline 1.0 EMR

Later, the architecture was upgraded to offline 1.0 EMR, where EMR refers to the flexible distributed mixed cluster service composed of big data of AliCloud, including Hadoop, HivesPark offline computing and other common components.

Aliyun EMR mainly solves our three pain points:

One is the horizontal extensibility of stored computing resources;
Second, the development and maintenance problems caused by the heterogeneous data of the previous business lines are solved, and the warehouse is cleaned and stored by the platform uniformly.
Third, we can establish our own storehouse stratification system, divide a topic domain, and lay a good foundation for our index system.

3) Real-time, unified 2.0 Flink + Hologres

The current experience of “Flink + Hologres” real-time data storehouse is also the core of this article. It has brought two qualitative changes to our big data platform, one is real-time computing, the other is unified data services. Based on these two points, we accelerate knowledge data exploration and promote the rapid development of the business.

2. Platform capability

In summary, the 2.0 version of the big data platform provides the following capabilities:

Data integration

The platform now supports real-time or offline integration of business databases or logs of business data.

Data development

The platform now supports Spark based offline computing and Flink based real-time computing.

Data services

Data services mainly consist of two parts:

Part of it is the analytics services and ad-hoc analytics capabilities provided by Impala;
The other part is the interactive analysis capability of business data that Hologres provides.

Data applications

At the same time, the platform can be directly connected with common BI tools, and the business system can also be quickly integrated and connected.

3. Accomplish

The capabilities provided by big data platforms have brought us many achievements, which can be summarized as the following five points:

Horizontal scaling

At the heart of a big data platform is a distributed architecture that allows us to scale storage or computing resources horizontally and cheaply.

Resource sharing

You can consolidate resources available to all servers. The previous architecture was that each business department maintained a set of clusters by itself, which caused some waste, was difficult to guarantee reliability, and had high freight cost. Now it was arranged by the platform uniformly.

Data sharing

It integrates all business data of business departments and other heterogeneous data sources such as business logs, and is cleaned and connected by the platform in a unified way.

Shared services

After data sharing, the platform will uniformly output services to the outside world. Each line of business can quickly obtain data support provided by the platform without repeated development by itself.

security

The platform provides a unified security authentication and other authorization mechanism, which can achieve different degrees of fine-grained authorization for different people to ensure data security.

Second, the demand of enterprise business for data

With the rapid development of the business, it is urgent to build a unified real-time data warehouse. The requirements for building the 2.x data platform are mainly concentrated in the following aspects based on the platform architecture of 0.x and 1.0 versions, the current development of the business and the judgment of the future trend:

Real-time screen

Real-time large screen needs to replace the old quasi-real-time large screen with a more reliable, low-latency technical solution.

Unified Data Service

High performance, high concurrency and high availability of data services have become the key to the enterprise digital transformation of unified data portal, it is necessary to build a unified data portal, unified external output.

Number of real-time warehouse

Data timeliness is becoming more and more important in enterprise operation, which requires faster and more timely response.

III. Real-time data storehouse and unified data service technical scheme

1. Overall technical architecture

The technical architecture is mainly divided into four parts, namely data ETL, real-time data warehouse, offline data warehouse and data application.

Data ETL is the real-time processing of business database and business log, and the unified use of Flink real-time calculation.
After real-time processing, the data in the real-time data warehouse will be stored and analyzed in Hologres
Business cold data is stored in Hive offline warehouse and synchronized to Hologres for further data analysis and processing
The common BI tools, such as Tableau, Quick BI, Datav and business system, are unified and connected by Hologres.

2. Real-time data warehouse data model

As shown above, the real-time database has some similarities to the offline database, except with fewer links at the other layers.

The first layer is the original data layer. There are two types of data sources. One is the Binlog of the business library, and the second is the business log of the server, which uniformly uses Kafka as the storage medium.
The second layer is the data detail layer, which extracts the information in the original data layer Kafka through ETL and stores it to Kafka as the real-time detail. The purpose of this is to make it easier for different downstream consumers to subscribe at the same time, and to facilitate the use of the subsequent application layer. The dimension table data is also stored through Hologres to satisfy the following data association or conditional filtering.
The third layer is the data application layer. In addition to Hologres, Hologres is also used to connect with Hive, and the upper application services are provided by Hologres uniformly.

3. Overall technical architecture data flow

The following data flow diagram can concretely deepen the planning of the overall architecture and the overall data flow of the warehouse model.

As can be seen from the figure, it is mainly divided into three modules:

The first is integration processing;
The second is the real-time data storehouse;
The third piece is data application.

From the inflow and outflow of data, we can see two main core points:

The first core is Flink’s real-time computation: it can be fetched from Kafka, either read MySQL Binlog data directly from the Flink CDT, or written directly back to the Kafka cluster, which is a core.
The second core is the unified data service: now the unified data service is done by the Hologres, to avoid the problem of data islands, or difficult to maintain consistency, etc., and also to speed up the analysis of offline data.

Four, concrete practical details

1. Big data technology selection

Solution execution is divided into two parts: real-time and service analysis. In terms of real-time, we chose Aliyun Flink as the full hosting method, which mainly has the following advantages:

State management and fault tolerance mechanism;
Table API and Flink SQL support;
High throughput and low delay;
Exactly Once semantic support;
Flow batch integration;
Full hosting and other value-added services.

In terms of service analysis, we chose Aliyun Hologres interactive analysis, which has brought several benefits:

Top speed response analysis;
High concurrent read and write;
Separation of computing storage;
Simple and easy to use.

2. Real-time large-screen business practice

The figure above shows the contrast between the old and new solutions of the business real-time large screen.

In order, for example, the old scheme of orders from the orders from the library, through the DTS synchronization to another, although it is real-time, but in computing and deal with this aspect, mainly through regular tasks, such as scheduling interval to 1 minute or five minutes to complete the data updated in real time, the sales floor, management needs to be more real-time dynamic grasp of business, , so it’s not really real time. In addition, slow and unstable response is also a big problem.

The new solution is based on the Flink real-time computing + Hologres architecture.

The development mode can fully use the SQL support of Flink, for our previous MySQL computing development mode, can be said to be a seamless migration, to achieve rapid landing. Data analysis and services are unified using Hologres. Take the order again as an example, such as the revenue of today’s order, the number of users of today’s order or the number of users of today’s order. With the increase of business diversity, the city dimension may need to be added. Through the analysis ability of Hologres, it can perfectly support some indicators of revenue, order quantity, order user number and city dimension for rapid display.

3. Implementation of real-time data storehouse and unified data service

Take a piece of business scenario as an example, such as a business log of relatively large magnitude, and the average daily data volume is at the TB level. Let’s first analyze the pain points of the old plan:

Poor timeliness of data: Due to the large amount of data, the strategy of hourly off-line scheduling was used in the old scheme for data calculation. However, this scheme has poor timeliness and cannot meet the real-time requirements of many business products. For example, the hardware system needs to know the current state of the equipment in real time, such as alarm, error, empty position, etc., and make corresponding decisions and actions in time.
Data islands: The old solution used Tableau to interface with a large number of business reports that analyzed how many devices were reported in the past hour or day, and which devices reported abnormalities, etc. For different scenarios, data previously calculated offline via Spark will be backed up and stored in MySQL or Redis. In this way, multiple systems form data islands, and these data islands are a huge challenge to platform maintenance.

The business log can now be transformed with the 2.0 Flink+Hologres architecture.

The previous terabyte log volume was completely stress-free under the Flink Polymer’s low-latency computing framework. For example, the previous link from Flume HDFS to Spark was scrapped and replaced with Flink, where we only had to maintain a Flink computing framework.
When collecting the device status data, it is all unstructured data, which needs to be cleaned and then returned to Kafka, because the consumers may be diversified, so that it is convenient for multiple downstream consumers to subscribe at the same time.
In the scene just above, the hardware system needs high concurrency and real-time query of the status of tens of millions of devices (charge banks), which requires high service capacity. Hologres provides high concurrent reading and writing ability, establishes primary key table associated with state devices, and updates status in real time to meet the real-time query of devices (charge bank) by CRM system.
At the same time, the detailed data of the latest hot spots will be stored in Hologres to provide external services directly.

4. Business support effect

With the new solution of Flink+Hologres, we supported three scenarios:

Real-time screen

At the business level, diversified requirements can be iterated more efficiently and development, operation and maintenance costs are reduced at the same time.

Unified Data Service

Service/analysis integration through an HSAP system to avoid data silos and issues of consistency and security.

Number of real-time warehouse

To meet the increasingly high requirements for data timeliness in enterprise operations, second level response.

5. Future planning

With the business iteration, our future planning in the big data platform mainly includes two points: the integration of stream and batch and the improvement of real-time data warehouse.

The current big data platform is generally a mix of offline architecture and real-time architecture. In the future, redundant offline code architecture will be abandoned, and Flink’s stream-batch integrated computing engine will be utilized to unify the computing engine.
In addition, we have only migrated part of our business so far, so we will refer to the previously improved offline warehouse indicator system to meet our current real-time warehouse construction and comprehensively migrate to 2.0 Flink + Hologres architecture.

Through future planning, we hope to build a more complete real-time data warehouse together with Flink full hosting and Hologres, but we also have a further demand for it here:

1. Demand for full hosting of Flink

Flink’s fully hosted SQL editor is very efficient and easy to write FlinkSQL jobs, and also provides many common SQL upstream and downstream connectors for your development needs. However, there are still some requirements that we want Flink to fully host in subsequent iterations:

SQL job version control and compatibility monitoring;
SQL jobs support Hive3.x integration;
DataStream jobs are easier to package and resource bundle uploads are faster.
Tasks deployed in the Session cluster mode support automatic tuning.

2. Demand for interactive analysis of Hologres

Hologres not only supports highly concurrent real-time writes and queries, but also is compatible with the PostgreSQL ecosystem for easy access to the Unified Data Service. But there are still some requirements that Hologres would like to support in later iterations:

Support hot upgrade operation to reduce the impact on business;
Support data table backup, support read and write separation;
Support for accelerated query AliCloud EMR-Hive database;
Support computing resource management for user groups.

This article is the original content of Aliyun, shall not be reproduced without permission.