Introduction: Alibaba Cloud Realtime Compute for Apache Flink, powered by Ververica, is an enterprise-class, high-performance real-time big data processing system that Alibaba Cloud built on Apache Flink. It is developed by the founding team of Apache Flink, is offered under a single global commercial brand, is fully compatible with the open source Flink API, and provides a rich set of enterprise-level value-added features.

This article is compiled from the live session "General Introduction to Realtime Compute for Apache Flink". Video link: https://developer.aliyun.com/…

Apache Flink technology development

Big data has developed rapidly for more than ten years, and its focus is shifting from sheer computing scale toward real-time processing.

For example, during Double 11, the shopping festival held by Alibaba, a real-time dashboard displays the transaction volume and transaction amount, with updates at the millisecond level. During the CCTV Spring Festival Gala, which is watched by Chinese people all over the world, a large screen shows real-time statistics on national audience ratings and audience profiles. City Brain projects now exist in many cities: using data from IoT cameras, they capture traffic, vehicle and pedestrian flows in real time for traffic monitoring and management. In the financial industry, banks, stock exchanges and other institutions use real-time big data computing in their core business scenarios to monitor trading behavior and to detect fraud, money laundering and similar activity. In Taobao's e-commerce transactions, personalized recommendations are made in real time according to user behavior: based on the goods a user browsed in the last 30 seconds or one minute, the system computes a user profile and recommends relevant goods the user may like as browsing continues. Many scenarios in everyday life are thus driven by real-time computing, which keeps raising productivity around the clock.

Real-time computing requires very powerful big data computing capabilities in the background, and Apache Flink, an open source real-time computing technology for big data, emerged to meet this need. Flink was designed for stream computing from the start. Traditional engines such as Hadoop and Spark are batch engines by nature: they process bounded data sets and cannot guarantee processing latency. As a streaming engine, Apache Flink subscribes to real-time data, analyzes and processes it as it arrives, and produces results immediately, so that the data delivers its value as early as possible.

Apache Flink has also gradually evolved from a pure stream computing engine into one with unified stream and batch processing capability. It can perform streaming analysis on log streams, clickstreams, IoT data streams and so on, and it can also run batch jobs on bounded data sets, such as database tables and files in a file system, to produce analysis results quickly. Apache Flink is now a very popular big data technology in the open source community and has been one of the most active Apache projects in the world for three consecutive years. It offers strong consistency guarantees, scales to large deployments and performs very well overall. It supports SQL, Java, Python and other languages and provides rich APIs for a wide range of business scenarios. Flink has become the mainstream real-time big data computing technology among Internet companies in China and abroad, and it is the de facto technical standard for real-time computing.
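The idea of unified stream and batch processing can be illustrated with a minimal sketch in plain Python. It does not use the Flink API. The function running_counts, the click_stream source and the sample page names are all hypothetical and exist only for illustration. The point is that one piece of processing logic can consume either a stream that is read event by event or a bounded data set.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated (key, count) pair after every incoming event."""
    counts: Counter = Counter()
    for key in events:
        counts[key] += 1
        yield key, counts[key]


def click_stream() -> Iterator[str]:
    """Hypothetical stream source, cut short here so the example terminates."""
    for page in ["home", "cart", "home", "pay"]:
        yield page


# Streaming use: an updated result is available as soon as each event arrives.
streamed = list(running_counts(click_stream()))

# Batch use: the same logic over a bounded data set; only the final state is kept.
batch_data = ["home", "cart", "home", "pay"]
final: Dict[str, int] = {}
for key, count in running_counts(batch_data):
    final[key] = count

print(streamed)
print(final)
```

In the streaming case every intermediate count is visible, while the batch case keeps only the final totals. A real engine adds state management, fault tolerance and distribution on top of this idea.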

Alibaba Cloud Realtime Compute for Apache Flink has been refined and validated over years of use inside Alibaba Group, accumulating a wealth of technology and product experience. It is now offered on the cloud to serve small and medium-sized enterprises in all industries. Back in 2016, about two years after Flink was donated to the Apache Software Foundation in 2014, Alibaba was already rolling out real-time computing products on a large scale. The product first went live in Alibaba's most critical business scenarios: search, recommendation and advertising. These scenarios need a great deal of real-time data processing, such as real-time recommendation, real-time ranking and real-time advertising, and the product greatly improved the core e-commerce business.

In 2017, the Flink-based real-time computing platform began to serve the whole of Alibaba Group. That year it handled the group's real-time data for Double 11 (Singles' Day), including the main Double 11 dashboard. In 2018, the product was officially launched on the cloud, serving small and medium-sized enterprises on the cloud as well as the group. This was also the first time a Flink-based real-time computing product was offered to the public as a public cloud service.

At the beginning of 2019, Alibaba acquired Ververica, the company founded by the creators of Flink. Alibaba's Flink technology team (the real-time computing technology team) and the Germany-based founding team of Flink merged smoothly and became the strongest Flink technology team in the world, which also advanced the development of, and contributions to, the Apache Flink open source community. At present, the Apache Flink community in China has more than 20 million developers participating, making Flink one of the most active big data projects in the Apache Software Foundation.

Last year, mainstream cloud computing and big data companies around the world launched their own Flink-based products in large numbers. For example, Cloudera, which started with Hadoop, launched CDP/CDH with Flink integrated, and big data companies in China released Flink-based real-time computing products one after another.

Product architecture of Realtime Compute for Apache Flink

Compared with the open source version, the architecture of Alibaba Cloud's real-time computing product adds substantial improvements and value. Many developers today use open source Apache Flink to build their own real-time computing platform, either in their own data centers or on virtual machines in the cloud. So what distinguishes the Realtime Compute for Apache Flink product launched by Alibaba Cloud?

In the product's architecture diagram, the bottom layer is Alibaba Cloud's mature cloud-native infrastructure, on which Realtime Compute for Apache Flink is built with containers. All Flink computing tasks run in the Kubernetes ecosystem, and containerization provides multi-tenant isolation to ensure security. The product is delivered as a fully managed service with a high SLA guarantee on the cloud, which spares users the burden of operation and maintenance. With this service architecture, users can size the various resources flexibly according to their business volume, without having to plan machine capacity. The product is cloud native by design.

In terms of the core computing engine, Alibaba Cloud has optimized many core functions compared with open source Apache Flink, and these optimizations have been hardened by Alibaba's internal workloads. The product currently supports real-time data services for nearly 100 business divisions of Alibaba Group. Through extensive business practice, it has been tuned for storage, scheduling, network transmission and other areas.

In terms of plug-ins, the product has dozens of enhanced connectors that connect to all mainstream open source data stores, including MySQL, HBase and HDFS, as well as Alibaba Cloud SLS and others. They are integrated natively and work out of the box. In terms of the development platform, it provides an enterprise-level, one-stop platform with built-in development and operations capabilities, which saves users from building their own and improves the overall experience for enterprise users.

Realtime Compute for Apache Flink supports SQL, Java, Python and other development languages, manages the full life cycle of development tasks, supports enterprise-level security mechanisms based on OIDC and RBAC, and offers full-link monitoring and alerting based on the Prometheus protocol. It also includes Autopilot, an intelligent tuning system that helps users tune the parameters of Flink tasks automatically, including resources and parallelism. The product can adapt to business traffic without manual tuning, and intelligent tuning is one of its core advantages.

How Realtime Compute for Apache Flink differs from open source Apache Flink

Compared with the open source version, Realtime Compute for Apache Flink has a number of advantages, which are compared below in terms of development, operation and maintenance, cost, and security.

For development, the product offers rich data connectivity and a one-stop, multi-language development environment with many built-in function libraries, which makes it convenient to debug code. It also supports multi-tenant development, task debugging and test simulation. For operation and maintenance, it supports full-link monitoring and alerting, and raises alerts automatically when data is delayed, data is abnormal or the service is interrupted.

Intelligent operation and maintenance supports automatic diagnosis and tuning. It can automatically tune performance, jobs, parameters and resources according to business traffic, and it can diagnose and optimize problems. At the resource level, the product goes beyond open source with finer-grained resource allocation: each job and each operator can be configured at CPU and memory granularity. This greatly improves resource utilization, helps users save costs, improves service stability and reduces the probability of out-of-memory (OOM) errors. Operation and maintenance by the original vendor, a 99.9% SLA guarantee, end-to-end fault tolerance and assured system stability address users' concerns.

At the cost level, cloud cost optimization reduces users' overall TCO while improving performance, which is also where the core performance advantage shows.

In the Nexmark-based standard benchmark for stream computing, Realtime Compute for Apache Flink performs about 3 times as well as open source Flink. Drawing on the optimizations that Alibaba Group's R&D team has accumulated in internal core business scenarios, the product delivers these core advantages while lowering users' base costs.

Realtime Compute for Apache Flink also has cloud-native elastic scaling, which helps users save resources and improve resource utilization. The product supports both subscription billing (annual or monthly) and pay-as-you-go billing, to suit different needs.

At the security level, containerized task isolation supports requirements such as tenant isolation, security isolation and VPC isolation. Because the product connects directly to the Alibaba Cloud account system, users get seamless security control across products based on their Alibaba Cloud account. It also supports role-based access control and open identity authentication protocols such as OIDC, which greatly improves business security.

Overall, the enterprise version offers more features and greater stability than the open source version. Besides its operation and maintenance advantages, it works out of the box, which makes it more convenient for users.

Product solutions

As a streaming engine for real-time computing, Flink can process a wide range of real-time data, from online service logs on ECS to sensor data in IoT scenarios. It can also subscribe to binlog updates from relational cloud databases such as RDS and PolarDB. This real-time data is ingested through the DataHub data bus, the SLS log service, open source Kafka message queues and similar products, and fed into the real-time computing product for analysis and processing. Finally, the results are written to different data services, such as MaxCompute, MaxCompute-Hologres interactive analysis, PAI machine learning and Elasticsearch. Users choose the data service that best fits their business needs to improve data utilization.

Flink's main application scenario is to subscribe to data from various real-time sources, process and analyze it in real time, and write the results to other online storage so that users can use them directly in production. The whole system is fast, accurate, cloud native and intelligent, which makes it a very competitive enterprise-level product. The product runs on Alibaba Cloud Container Service, ECS and other IaaS systems and integrates natively with other Alibaba Cloud systems, so customers can apply it to more scenarios.

Product application scenarios

Four application scenarios have been summarized for Realtime Compute for Apache Flink, to help users build real-time computing solutions for their own business according to their needs.

1. Real-time data warehouse

A real-time data warehouse is mainly used for statistics such as website PV/UV, commodity sales and transaction data. By subscribing to the business's real-time data sources, the system analyzes the information with second-level latency and presents it on a dashboard for decision makers, who can then judge the state of the business and the effect of promotions. Decisions are based on real-time operating data, achieving true data intelligence. In these scenarios fresh data is especially important: in fast-changing business interactions, decisions must be based on data from the last minute or even the last second. Real-time computing is the best choice for this kind of scenario.
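As a rough illustration of the PV/UV statistics mentioned above, the following plain-Python sketch groups page-view events into fixed (tumbling) time windows and counts page views and distinct users per window. It is a simplified model and not Flink code. The 60-second window size, the function name pv_uv_per_window and the sample events are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

WINDOW_SECONDS = 60  # assumed window size, chosen only for this example


def pv_uv_per_window(events: Iterable[Tuple[int, str]]) -> Dict[int, Tuple[int, int]]:
    """Return {window_start: (pv, uv)} for (timestamp_seconds, user_id) events."""
    views: Dict[int, int] = defaultdict(int)
    users: Dict[int, Set[str]] = defaultdict(set)
    for ts, user_id in events:
        # Align each event to the start of the window it falls into.
        window_start = ts - ts % WINDOW_SECONDS
        views[window_start] += 1          # PV: every view counts
        users[window_start].add(user_id)  # UV: each user counts once per window
    return {w: (views[w], len(users[w])) for w in sorted(views)}


# Hypothetical page-view events: (timestamp in seconds, user id).
sample_events = [(0, "u1"), (15, "u2"), (59, "u1"), (60, "u3"), (119, "u3")]
result = pv_uv_per_window(sample_events)
print(result)
```

A streaming engine would emit each window's result as soon as the window closes instead of returning everything at the end, but the per-window grouping is the same.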

2. Real-time recommendation

Real-time recommendation mainly means personalized recommendation based on user preferences, or recommendation based on AI technology, and is a mainstream product form. It is common in short video, e-commerce shopping, content and news scenarios, where user preferences are inferred in real time from the user's recent clicks in order to make targeted recommendations and increase user stickiness. These scenarios have strict latency requirements, and Flink combined with AI technology can be used to implement real-time recommendation.

3. Real-time ETL

Real-time ETL is common in data synchronization, where computation is performed on the data while it is being synchronized. Examples include synchronizing and transforming different tables in a database, synchronizing different databases, and pre-aggregating data. The results are finally written into a data warehouse or data lake for archiving and accumulation, in preparation for subsequent in-depth analysis such as log analysis. Flink is very efficient for this kind of real-time data synchronization and processing across the whole pipeline.
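The extract, transform and load steps described above can be sketched in plain Python as a chain of generators that handle one record at a time. This is only an illustration and not Flink code. The field names (order_id, amount_cents, city), the in-memory list that stands in for the warehouse and the sample records are all hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse raw JSON lines and skip anything that is not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Drop incomplete records and normalize the remaining fields."""
    for record in records:
        if "order_id" not in record or "amount_cents" not in record:
            continue
        yield {
            "order_id": str(record["order_id"]),
            "amount": record["amount_cents"] / 100,
            "city": record.get("city", "unknown").lower(),
        }


def load(rows: Iterable[dict], sink: List[dict]) -> Dict[str, float]:
    """Append rows to the sink and pre-aggregate the amount per city."""
    totals: Dict[str, float] = {}
    for row in rows:
        sink.append(row)
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return totals


# Hypothetical input; one line is malformed and one record is incomplete.
raw_lines = [
    '{"order_id": 1, "amount_cents": 1250, "city": "Hangzhou"}',
    "not json",
    '{"order_id": 2, "amount_cents": 750, "city": "hangzhou"}',
    '{"order_id": 3, "city": "Beijing"}',
    '{"order_id": 4, "amount_cents": 300}',
]
warehouse: List[dict] = []  # stands in for a data warehouse or data lake table
totals = load(transform(extract(raw_lines)), warehouse)
print(totals)
```

Because each stage is a generator, records flow through the pipeline one by one, which loosely mirrors how a streaming job cleans and pre-aggregates data while it is being synchronized.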

4. Real-time monitoring

Real-time monitoring is common in financial and trading scenarios. Because of the nature of these industries, anti-fraud supervision is required: based on behavior within a short period of time, the system must determine whether a user is fraudulent and stop losses in time. This scenario has strict timeliness requirements. By detecting abnormal data, anomalies can be found in real time and stop-loss actions taken. Scenarios such as collecting system metrics, computing log statistics, and observing and monitoring metrics in real time can also be handled by Realtime Compute for Apache Flink.
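One simple way to picture this kind of short-window behavior check is a per-user sliding window that raises an alert when a user produces too many events in a short time. The plain-Python sketch below is not Flink code and not a real anti-fraud rule. The window length, the threshold, the function name detect_bursts and the sample events are assumptions made for this example, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

WINDOW_SECONDS = 10  # assumed length of the sliding window
MAX_EVENTS = 3       # assumed threshold: more events than this is suspicious


def detect_bursts(events: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str, int]]:
    """Yield (timestamp, user, count) when a user exceeds MAX_EVENTS in the window.

    Events are (timestamp_seconds, user_id) pairs in timestamp order.
    """
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    for ts, user in events:
        window = recent[user]
        window.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_EVENTS:
            yield ts, user, len(window)


# Hypothetical transaction events: (timestamp in seconds, user id).
events = [
    (1, "alice"), (2, "bob"), (3, "bob"), (4, "bob"),
    (5, "bob"), (6, "alice"), (20, "bob"),
]
alerts = list(detect_bursts(events))
print(alerts)
```

The alert is produced at the moment the fourth event arrives, which is what allows a stop-loss action to be taken immediately instead of after a later batch run.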

This article is original content from Alibaba Cloud and may not be reproduced without permission.