This article was shared by Gao Yang, a senior product expert at Alibaba. It introduces the product features and core functions of the new-generation serverless real-time computing engine.

One. Real-time Computing Flink Edition – Product Positioning and Objectives

![](https://pic3.zhimg.com/80/v2-ecfeb97778a573f03640d45d8cc94593_720w.png)

First, consider the product positioning and objectives of Real-time Computing Flink Edition. In recent years, the overall trend in big data technology has clearly been toward real-time processing.

  • Online applications: more and more business scenarios are moving online, such as live streaming and short video, which emphasize real-time responsiveness.
  • Online ML: machine learning is evolving from traditional offline learning to online machine learning.
  • Microservices: microservices are now widely adopted to achieve full decoupling at the algorithm level.
  • Real-time risk control: financial risk control, content security risk control, and pure security risk control are all gradually becoming real-time.
  • Real-time ETL: extracting, filtering, and aggregating real-time data to produce results.
  • Real-time data warehouse: T+1 reports no longer meet customers' needs. Customers need federated queries over unified dimensions that include incremental data from the real-time link, presented in unified reports, and this need gave rise to the real-time data warehouse.
From the development of the whole technology stack, we can see that real-time processing has become an inevitable trend in big data technology, and Flink focuses on exactly these real-time scenarios.

Second, Apache Flink has become the de facto standard for real-time computing in China. Alibaba Cloud Real-time Computing Flink Edition has applied for national and academy standards with the relevant national departments, such as real-time computing standards and fusion computing standards covering stream computing, batch computing, ML, graph computing, and so on. Many major Chinese Internet companies use Flink or Alibaba Cloud Real-time Computing Flink Edition. With the arrival of online payment, 5G, and the Internet of Vehicles, traditional financial companies and large manufacturers have also begun to explore real-time big data, using Flink as the core engine for data computation.

Third, Alibaba has been leading the Flink community, actively promoting the evolution of Flink technology and investing fully in running the community. In January 2019, Alibaba completed the acquisition of Ververica, the commercial company behind Flink that was established by Flink's founding team. Since 2019, Alibaba Cloud, and Alibaba Group as a whole, has invested heavily in the Flink community:

  • Contributed more than 3 million lines of code
  • Hosts Flink community meetups and introduced the Flink brand conference, Flink Forward
  • The world's largest Flink Committer/PMC team
  • Open source community facilitator
Fourth, most mainstream computing engines today have a commercial company behind the open source product. Just as Databricks stands behind Spark and Confluent behind Kafka, Alibaba Cloud Real-time Computing Flink Edition is the commercial brand of open source Flink, providing enterprises and customers with a one-stop commercial real-time computing solution and SLA support on the cloud.

Two. Real-time Computing Flink Edition – Product Features

![](https://pic2.zhimg.com/80/v2-cf31a78ca5185b244002fc649ab5540e_720w.png)

Next, we introduce the core product functions of Real-time Computing Flink Edition. Ververica Platform, built by the founding team of Apache Flink in Germany, is a mature and stable commercial product that enterprise customers overseas have used and refined for many years. This year it was introduced to China for commercialization. It is divided into three main parts:

1. Development module

  • SQL development platform: In recent years, big data development has increasingly moved to SQL. Business analysts and business staff can quickly start developing and processing business logic in SQL, which greatly improves efficiency and saves manpower.
  • Job lifecycle management: You can manage the entire lifecycle of a job, from submission to termination, including upload and download.
  • Graphical metrics: Open source Flink provides relatively few metrics to monitor, whereas the commercial product adds extensive instrumentation and exposes very detailed metrics.
  • Rich connectors: Connectors help bring existing data into real-time processing, so that enterprises can fully mine their data assets, run more analysis, and surface business opportunities that support transformation.

2. Operation and maintenance module

  • Full-link monitoring and alerting: For companies such as banks, full-link monitoring is very important. Production systems in particular place very high demands on metric monitoring and alerting across the whole link, and this is a key function of Ververica Platform.
  • OIDC & RBAC: Authentication and permission control. In both the Internet industry and traditional industries, and in traditional enterprises especially, requirements for permission control and access management are strict as workloads move deeper into the cloud. OIDC & RBAC can fully meet the requirements of finance, banking, and insurance companies.
  • Intelligent configuration tuning: Together with the SQL development platform, intelligent tuning uses a built-in rule engine to automatically adjust major configuration parameters, so that a job's resource allocation and consumption reach the best cost-performance. This both saves resources and lets jobs complete efficiently.
  • Elastic resource management: From TaskManager to JobManager, more resources can be requested when customer load is high, and redundant resources can be released when load is low, improving resource utilization and saving cost.

3. Performance

  • SQL engine optimization: Compared to open source Flink, the SQL engine in the commercial version is more powerful.
  • Execution engine optimization: A dedicated Runtime team continuously optimizes the network and shuffle components.
  • Storage engine optimization: The commercial Gemini storage engine has been validated and tested at a number of benchmark customer sites, and overall performance is three times that of open source Flink.

4. Base

Real-time Computing Flink Edition can run on Alibaba Cloud's EMR platform or on a K8s container platform. It can also run on the latest pay-as-you-go serverless base, which relies on secure container isolation and offers more flexible scalability.

Three. Real-time Computing Flink Edition – Detailed Explanation of Functions

1. SQL integration

![](https://pic4.zhimg.com/80/v2-d3e74074a3cda8ee926df85aee6c936b_720w.png)

SQL is the general industry consensus, or at least the prevailing tendency, for the interactive interface to big data processing. SQL is simpler overall and has a lower barrier to entry, so data analysts and business staff can get started quickly, which greatly improves staffing and development efficiency.

The green interface above is Ververica Platform, developed by the German team. The interface style is simple and direct, without overly complicated or redundant interactions. Ververica Platform provides rich SQL semantic support, including full SQL semantics such as DML and DDL.

2. DataStream job management

![](https://pic4.zhimg.com/80/v2-6b8ca824a640c5e0df8cfa8c8cadeef9_720w.png)

Ververica Platform supports several job submission methods, with a standard mode and an advanced mode. During submission, you can flexibly choose parameters and settings. Ververica Platform currently supports multiple kernels: open source kernels (e.g., Flink V1.10, Flink V1.11, and future open source versions) and commercial kernels (e.g., Ververica Runtime). Commercial kernels add plug-in enhancements in performance and functionality while remaining fully compatible with customer jobs.

![](https://pic2.zhimg.com/80/v2-dda9ca0cc8a3c725878b6f4358c6f069_720w.png)

This part covers parameter configuration (resource configuration and log configuration) and job management.

3. Automatic tuning with Autopilot

![](https://pic2.zhimg.com/80/v2-c08193629b31443520d503e3c9d06741_720w.png)

Once enabled, Autopilot automatically adjusts parallelism, CPU, and memory for SQL and DataStream jobs.
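The article does not describe the rules Autopilot actually applies, so the following is only a minimal Python sketch of what rule-based tuning could look like. The `JobMetrics` fields, the thresholds (0.8 and 0.2), and the doubling/halving policy are all assumptions chosen for illustration, not the product's behavior.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    """Hypothetical snapshot of a running job (all field names are assumptions)."""
    parallelism: int
    cpu_utilization: float  # 0.0 to 1.0, averaged over the job's tasks
    backlog_growing: bool   # True if the source lag keeps increasing


def suggest_parallelism(m: JobMetrics, min_p: int = 1, max_p: int = 64) -> int:
    """Suggest a new parallelism from simple, illustrative threshold rules."""
    if m.backlog_growing or m.cpu_utilization > 0.8:
        target = m.parallelism * 2   # scale up under pressure
    elif m.cpu_utilization < 0.2:
        target = m.parallelism // 2  # scale down when mostly idle
    else:
        target = m.parallelism       # inside the comfortable band: no change
    # Clamp the suggestion to the allowed range.
    return max(min_p, min(max_p, target))


print(suggest_parallelism(JobMetrics(parallelism=4, cpu_utilization=0.9, backlog_growing=False)))  # prints 8
```

Clamping to a minimum and maximum keeps a rule like this from scaling a job to zero or growing it without bound, which is the kind of guardrail a cost-performance rule engine would need.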

4. UDF management

![](https://pic2.zhimg.com/80/v2-517e6a2ff1d38e307d61844305db9030_720w.png)

UDFs combined with SQL typically cover 80% of a customer's scenarios. Customers may also have more complex scenarios (e.g., custom windows, custom connectors, etc.) that need to be complemented by code developed on the DataStream API.

5. Metrics monitoring

![](https://pic1.zhimg.com/80/v2-b52d353db07328445cdff27259a12edf_720w.png)

Metrics are a concern for many customers, especially in production environments. The more sensitive an online service is, the higher its monitoring requirements. Ververica Platform provides monitoring along many dimensions, including overall monitoring, checkpoint monitoring, watermark monitoring, network monitoring, CPU monitoring, JVM monitoring, IO monitoring, and so on.

6. Rich upstream and downstream support

![](https://pic1.zhimg.com/80/v2-6d7feea74820946b1dcf80f2add258f8_720w.png)

Real-time Computing Flink Edition supports a rich set of upstream and downstream systems, including stream messages, dimension data, data storage, and data sinks, and sits in the middle as the computing link. It currently supports data sources and sinks on the cloud as well as open source data sources and sinks, and they are convenient to use.

Four. Real-time Computing Flink Edition – An Introduction to Semi-managed and Fully Managed Services

![](https://pic3.zhimg.com/80/v2-f4131a620ae3eba5b32280b812195b63_720w.png)

As the names suggest, fully managed and semi-managed services differ in whether end-to-end product services are included. Services that include them are fully managed, while those that do not include full after-sales or technical support services are called semi-managed. Alibaba Cloud Real-time Computing Flink Edition is now offered in both forms.

The fully managed and semi-managed Flink services can be compared along five dimensions: application scenarios, functions and features, operation and maintenance management, elastic scaling, and performance efficiency. On the whole, the fully managed service has a lower TCO and better cost-performance, and also comes with high-SLA service from the original vendor.

Five. Real-time Computing Flink Edition – Common Business Scenarios

![](https://pic2.zhimg.com/80/v2-c674b78deabd89fb226743e1f61f272e_720w.png)

Alibaba Cloud real-time computing has four main general business scenarios:

  • Real-time ETL & index construction: real-time computing is used for real-time data extraction, aggregation, and cleaning, for example in real-time monitoring platforms or real-time large-screen scenarios.
  • Real-time statistics and analysis, such as real-time data warehouse scenarios.
  • Real-time machine learning: as the user growth dividend ends, the conversion rate of traditional T+1 offline recommendation engines keeps declining, so recommendation engines are also evolving toward real-time. Real-time sample joining and real-time incremental models are used to improve the conversion rate.
  • Real-time event processing: mainly real-time monitoring and risk control scenarios, for example online credit and real-time financial risk control in finance, and real-time big data security risk control based on situational awareness in the security field.
Below are some typical customers of Alibaba Cloud Real-time Computing Flink Edition and their industry distribution.

![](https://pic4.zhimg.com/80/v2-f749c4c2ffbdd6a95e31e7e94d1d14a8_720w.png)

The following describes some typical real-time computing application scenarios and cases.

1. Real-time Computing Flink Edition – Real-time large-screen scenario

![](https://pic4.zhimg.com/80/v2-85f953e38f0bd79744cf2ca0de005a61_720w.png)

The real-time large screen is a typical scenario for Alibaba Cloud Real-time Computing Flink Edition and has been running in Alibaba Group since 2016. On November 11, 2019, peak processing reached 2.5 billion messages per second, with data throughput of 2.63 TB per second. The data link for a real-time large screen has two main parts. One is users' transaction data, which generally resides in a traditional relational database. The other is behavior data or behavior logs (e.g., user browsing or click logs), which are typically stored in a logging system on ECS. Through Kafka and CDC-like data extraction tools, data is pushed to Flink in real time for processing, aggregation, and cleaning, and the results are then stored in real time for real-time data visualization.
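The aggregation step that feeds such a screen can be illustrated with a small Python sketch. This is not Flink code and not the pipeline used at Alibaba. It only shows the idea of a tumbling (fixed, non-overlapping) window over event timestamps, and the function name, the `(timestamp_ms, amount)` record shape, and the one-second window size are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_totals(
    events: Iterable[Tuple[int, float]], window_ms: int = 1000
) -> Dict[int, float]:
    """Sum amounts per fixed-size window, keyed by the window start time in ms."""
    totals: Dict[int, float] = defaultdict(float)
    for ts_ms, amount in events:
        # Each event belongs to exactly one window: the one containing its timestamp.
        window_start = ts_ms - (ts_ms % window_ms)
        totals[window_start] += amount
    return dict(totals)


# Hypothetical transaction events: (event time in ms, order amount).
orders = [(0, 10.0), (400, 5.0), (1000, 7.5), (1999, 2.5), (2500, 1.0)]
print(tumbling_window_totals(orders))  # prints {0: 15.0, 1000: 10.0, 2000: 1.0}
```

A real streaming engine computes the same per-window totals incrementally and emits each window once it is complete, rather than iterating over a finished list.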

Real-time large screens are very widely used. Examples include VIPKID's screen in online education, the screen for the CCTV Spring Festival Gala, the real-time cloud display for last year's National Day military parade, 58 in domestic local-life services, and China Construction Bank and Minsheng Bank, which use them on their trading platforms as monitoring screens covering the whole link.

2. Real-time Computing Flink Edition – Real-time ETL data processing scenario

![](https://pic3.zhimg.com/80/v2-4ff7bdcc5ea00a60e8f3ad0e9ac72db0_720w.png)

The second is the real-time ETL scenario. For example, in online education, logs of students' behavior in 1-to-1 or 1-to-many online classes, and even of parents' browsing and purchasing behavior on the website, are sent to Flink through DataHub or Kafka for real-time cleaning and aggregation. The results are then stored in a search platform such as Elasticsearch, where customers and marketers can run searches and system maintenance staff can monitor and alert on the whole link.
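As a rough illustration of the cleaning and aggregation step, the Python sketch below parses raw JSON log lines, drops malformed records, normalizes one field, and counts events per action. The log schema (`user_id`, `action`) and both function names are assumptions made for this example, and a real job would run this logic continuously inside the stream processor rather than over a list.

```python
import json
from collections import Counter
from typing import Iterable, Optional


def clean(raw: str) -> Optional[dict]:
    """Parse one raw log line; return None for malformed or incomplete records."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    user = record.get("user_id")
    action = record.get("action")
    if user is None or not isinstance(action, str):
        return None
    # Normalize so that " Click " and "click" count as the same action.
    return {"user_id": str(user), "action": action.strip().lower()}


def aggregate(raw_lines: Iterable[str]) -> Counter:
    """Count cleaned events per action, skipping records that fail cleaning."""
    counts: Counter = Counter()
    for line in raw_lines:
        record = clean(line)
        if record is not None:
            counts[record["action"]] += 1
    return counts
```

For example, feeding in two click events, one view event, a line that is not JSON, and a record without a `user_id` yields a count of 2 for `click` and 1 for `view`, with the two bad lines dropped.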

![](https://pic4.zhimg.com/80/v2-941853fabb8a6649be56f3b8b627b051_720w.png)

VIPKID focuses mainly on online one-to-one video courses, with possibly more than 30,000 classes per hour at peak. Last year it began using Real-time Computing Flink Edition, pulling logs from different departments into it through MQ queues so that the data can be computed and cleaned in a unified way, and storing the final results for consumption by different business units.

3. Real-time Computing Flink Edition – Online machine learning scenario

![](https://pic3.zhimg.com/80/v2-d5e0969e204c4362e681cbbeaa7bb357_720w.png)

In the online machine learning scenario, the traditional offline processing link is shown at the bottom of the figure: offline logs, offline sample generation, offline training, and an offline recommendation service. As the business grows and the number of (monthly or daily) users reaches a certain order of magnitude, it becomes difficult to improve the recommendation conversion rate, so the value of the model must be mined along the time dimension. For example, to recommend results that meet customers' needs more quickly, a real-time online machine learning link needs to be added.

![](https://pic4.zhimg.com/80/v2-39606fb00d322b068ccabe4c94f25b2e_720w.png)

Take a leading social media customer as an example. So far, online machine learning on this platform has been applied in multiple business scenarios, processing 3 billion to 10 billion records every day, with complex computing scenarios such as multi-stream joins and even multimedia computing. Across the whole online machine learning process, using Real-time Computing Flink Edition as the computing engine significantly improves the conversion rate, and the online model performs about 8% better than the offline model.

4. Real-time Computing Flink Edition – Real-time data warehouse scenario

![](https://pic1.zhimg.com/80/v2-5128ae86d59085a3efdf685ed368cea8_720w.png)

As offline and real-time data continue to accumulate, the real-time data warehouse is currently a hot scenario. Many Internet companies, as well as many traditional enterprises (such as banks and insurance companies), have a demand for real-time data warehouses. Customers want to see not only offline data reports and results but also reports based on data written in real time. The focus of technology evolution in the industry in recent years has been on three questions: how to handle highly concurrent real-time writes to the data warehouse; how to implement an architecture with unified stream and batch processing, hybrid row-column storage, and separated storage and compute; and how to provide a single unified service endpoint based on federated queries.

Six. Real-time Computing Flink Edition – Real-time Data Processing Link Demo

![](https://pic3.zhimg.com/80/v2-383db48d4271eb9bdb09d99c16de6daf_720w.png)

The data of Internet companies is inherently real-time: data and logs are naturally collected through Kafka-like messaging engines and then processed by Real-time Computing Flink Edition. For traditional enterprises (e.g., car companies, manufacturers, retailers), however, early data assets are stored in relational databases. In the course of digital transformation, moving business online, or going real-time, the key is how to activate this huge amount of data, turning so-called static data into real-time data, and fully tap the value of the enterprise's data assets.

This year, many traditional enterprises have been transforming their data sources through data middle platform projects (in essence, preparing for real-time). This demo shows the whole link: from data source, to data extraction (activating static data), to real-time data processing (dual-stream join and stream-table join), and then to landing in the real-time data warehouse, interactive analysis and query, and real-time data visualization. It uses this end-to-end link to demonstrate the process and scenarios of real-time data processing.
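The two join types named in the demo can be sketched in plain Python. This is a simplified illustration over finite lists, not the demo's actual code and not Flink's implementation, which keeps join state incrementally and expires it over time. The record shapes, the 60-second bound, and the function names are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]          # (order_id, event_time_ms)
Match = Tuple[str, int, int]     # (order_id, order_time_ms, payment_time_ms)


def interval_join(
    orders: Iterable[Event], payments: Iterable[Event], max_delay_ms: int = 60_000
) -> List[Match]:
    """Dual-stream join: pair a payment with its order if it arrives
    no earlier than the order and at most max_delay_ms after it."""
    orders_by_id: Dict[str, List[int]] = {}
    for order_id, order_ts in orders:
        orders_by_id.setdefault(order_id, []).append(order_ts)
    matches: List[Match] = []
    for order_id, pay_ts in payments:
        for order_ts in orders_by_id.get(order_id, []):
            if 0 <= pay_ts - order_ts <= max_delay_ms:
                matches.append((order_id, order_ts, pay_ts))
    return matches


def lookup_join(matches: Iterable[Match], dim_table: Dict[str, str]) -> List[tuple]:
    """Stream-table join: enrich each record from a dimension table,
    falling back to "unknown" when the key is missing."""
    return [(oid, o_ts, p_ts, dim_table.get(oid, "unknown")) for oid, o_ts, p_ts in matches]


orders = [("o1", 0), ("o2", 1_000)]
payments = [("o1", 30_000), ("o2", 120_000), ("o3", 500)]
matched = interval_join(orders, payments)
print(lookup_join(matched, {"o1": "books"}))  # prints [('o1', 0, 30000, 'books')]
```

Here `o2` is dropped because its payment arrives outside the time bound, and `o3` is dropped because no matching order exists. The time bound matters in a streaming setting because it lets the engine discard old state instead of remembering every order forever.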

Seven. Serverless Fully Managed Flink – Free Testing

![](https://pic1.zhimg.com/80/v2-6f481aab620e3f85c805263a02c272f1_720w.png)

Fully managed Flink is currently in a free public test, and anyone can apply for free at the public test address. Customers typically have a few concerns about using cloud services.

  • First, customers feel that semi-managed services offer no backstop and no guarantee. Fully managed services solve this.
  • Second, although the fully managed service solves the after-sales problem, its price may sometimes be too high.
With the latest serverless technology, pay-as-you-go billing, and elastic scaling, we can meet customers' demands for both cost-performance and a guaranteed backstop. We hope more customers and interested developers will try it out over the long term. If you find problems while using it, please give us feedback promptly. We will keep improving and optimizing.

Gao Yang is a senior product expert at Alibaba.

[Original link](https://developer.aliyun.com/article/769715?utm_content=g_1000172111)

This article is original content from Alibaba Cloud and may not be reproduced without permission.