Why has Apache Flink become a new-generation big data computing engine?

Apache Flink (hereinafter Flink) originated in Europe, and its founding team donated it to the Apache Software Foundation in 2014. Like many young projects, it is fresh, it is open source, and it adapts to the speed and flexibility that a fast-moving world values.

The era of big data poses new challenges to our ability to manage data. Flink gives enterprise users the room and the potential to compute faster and more accurately. Flink is widely recognized as a new-generation big data computing engine. Why has it become the first choice for building stream computing platforms at well-known companies around the world, such as Alibaba, Tencent, Didi, Meituan, ByteDance, Netflix and Lyft?

Hear it from Flink’s core contributors! At Flink Forward Asia 2019, on November 28-30, Apache Flink core contributors and veteran industry experts will show what makes Flink’s technology distinctive.

An Ask Me Anything session will take your technical questions about Flink SQL, the runtime, Hive integration, and other Flink topics.

Stephan, the “father of Flink”, will also be there. Raise your hand if you have ever wondered why Flink’s logo is a squirrel.

Xingcan Cui, Apache Flink Committer, Postdoctoral fellow, York University

Apache Flink is a next-generation stream processing engine that is widely used in real-time job scenarios. After several iterations, we have found that it has the potential to become an integrated data processing platform: one that handles both dynamic and static data, supports both distributed and centralized computing, and serves both job-based and interactive tasks.

In this presentation, we will show some exploratory attempts to use Apache Flink as an all-in-one back-end platform for a common data processing pipeline. We will first introduce this general pipeline and briefly describe the characteristics of each stage. We will then explain in detail how Flink can be “shaped” to meet diverse data processing needs without touching its core. Along the way, we will also explain some of how Flink works internally. Finally, we will look at what is still needed to make Flink a truly integrated data processing platform.

Zhang Shaoquan, Senior Engineer, Tencent

Drift Computing SuperSQL is a high-performance SQL engine developed by Tencent Big Data that works across data centers, clusters, and data sources. It supports federated analysis and ad hoc queries over different types of data sources located in different data centers and clusters. Its goals are to break down data silos in big data, lower the barrier to using data, improve the efficiency of data use, and maximize the value of data.

In this talk, we will cover the details of the Drift Computing SuperSQL project, including:

Background and positioning of Drift Computing
The main technical challenges of Drift Computing
The overall architecture of Drift Computing
Technical details of Drift Computing
Performance of Drift Computing
Future plans

Qin Jiangjie, Apache Flink PMC member, Apache Kafka PMC member, Senior Technical Expert, Alibaba

Flink already has a rich connector ecosystem, but building a production-ready connector for Flink still requires attention to a number of issues, including coordination across parallel instances, consistency semantics, the threading model, and fault tolerance. Sources are more complicated than sinks. To make it easier for users to implement high-quality connectors, the Flink community has introduced a new Source API in FLIP-27 that aims to help users solve these complex problems and write a high-quality connector quickly. This presentation will describe the design of the new Source API and show how to use it to create a production-ready Flink source connector quickly.

Chong Wu, Apache Flink Committer, Technical Expert, Alibaba

Jinsong Li, Apache Beam Committer, Technical Expert, Alibaba

As a core module of Apache Flink, Flink SQL is attracting more and more users. With its easy-to-use API and high-performance SQL engine, it plays an increasingly important role in production.

This talk will focus on the technical details and tuning experience of the core functions of Flink SQL from the perspective of stream processing and batch processing respectively. The audience will gain a deeper understanding of Flink SQL and learn how to tune Flink SQL jobs.

(November 28, afternoon)

(November 29, morning)

The conference’s organizing committee has also prepared training sessions for developers who want to study Flink in depth. Flink experts from Alibaba and Ververica will lead developers through a day and a half of in-depth training.

Led by Apache Flink PMC members, the instructor lineup brings together senior technical experts from Alibaba and from Flink’s founding team, who have designed a comprehensive curriculum for the developer training courses.
The courses serve both beginners and advanced users. Developers can choose courses that match their current level to build up their technical and practical skills.

The main outline of the course is as follows:

Intermediate Course 1: Apache Flink developer training

This course is a hands-on introduction to Apache Flink for Java and Scala developers who want to learn how to build streaming applications. The training focuses on core concepts such as distributed dataflows, event time, and state. The exercises give you a chance to see how these concepts are represented in the API and how they can be combined to solve real-world problems.

Introduction to stream computing and Apache Flink
The basics of the DataStream API
Preparation for Flink development (including exercises)
Stateful stream processing (including exercises)
Time, timers, and ProcessFunction (including exercises)
Connecting multiple streams (including exercises)
Testing (including exercises)

Note: No prior knowledge of Apache Flink is required.
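
To make two of this course's concepts, event time and state, a little more concrete, here is a minimal sketch in plain Python. It is only an illustration and does not use the Flink API. The function name `tumbling_window_counts`, the sample events, and the window size of 10 are all assumptions made for this example. Each event carries its own timestamp, so it is assigned to a window by that timestamp and not by the order in which it arrives, and the per-key counters play the role of keyed state.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (key, window start) using each event's own timestamp.

    events: iterable of (key, event_time) pairs, possibly out of order.
    window_size: window length, in the same unit as event_time.
    """
    # Hypothetical stand-in for keyed state: one counter per (key, window start).
    counts = defaultdict(int)
    for key, event_time in events:
        # Event time decides the window, regardless of arrival order.
        window_start = event_time - (event_time % window_size)
        counts[(key, window_start)] += 1
    return dict(counts)


# Sample events (assumed data); ("a", 4) arrives after ("a", 12).
events = [("a", 1), ("b", 2), ("a", 12), ("a", 4), ("b", 11)]
result = tumbling_window_counts(events, window_size=10)
print(result)
```

The late-arriving event `("a", 4)` is still counted in the window starting at 0. A real streaming engine also has to decide when a window is complete, which is where watermarks and timers come in; this sketch leaves that out.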

Intermediate Course 2: Apache Flink operation and maintenance training

This course is a practical introduction to deploying and operating Apache Flink applications. The target audience includes developers and operations staff responsible for deploying Flink applications and maintaining Flink clusters. The demonstrations highlight the core concepts involved in running Flink, as well as the main tools for deploying, upgrading, and monitoring Flink applications.

Introduction to stream computing and Apache Flink
Flink in the data center
Introduction to the distributed architecture
Containerized deployment (including hands-on practice)
State backends and fault tolerance (including hands-on practice)
Upgrades and state migration (including hands-on practice)
Metrics (including hands-on practice)
Capacity planning

Note: Prior knowledge of Apache Flink is not required.

Intermediate Course 3: Training for SQL developers

Apache Flink supports SQL as a unified API for stream processing and batch processing. SQL can be used in a wide variety of scenarios, and SQL jobs are much easier to build and maintain than jobs written against Flink’s lower-level APIs. In this training, you will learn how to get the most out of SQL when writing Apache Flink jobs. We will look at different examples of streaming SQL, including joining streaming data, dimension table joins, window aggregation, maintaining materialized views, and pattern matching with the MATCH_RECOGNIZE clause (introduced in the SQL:2016 standard).

Introduction to SQL on Flink
Using SQL to query dynamic tables
Joining dynamic tables
Pattern matching with MATCH_RECOGNIZE
Ecosystem and writing to external tables

Note: No prior knowledge of Apache Flink is required, but basic SQL knowledge is required.

Advanced: Apache Flink tuning and troubleshooting

Over the years, we have worked with many Flink users and learned about the most common challenges in moving stream processing jobs from the early PoC phase to production. In this training, we will introduce these challenges and help you overcome them. We will provide a useful set of troubleshooting tools and introduce best practices and tips in areas such as monitoring, watermarks, serialization, and state backends. During the hands-on sessions, participants will have the opportunity to apply what they have just learned to solve problems presented by misbehaving Flink jobs. We will also summarize common reasons why jobs fail to make progress, why throughput falls short of expectations, and why jobs show high latency.

Time and watermarks
State handling and state backends
Flink’s fault tolerance mechanism
Checkpoints and savepoints
DataStream API and ProcessFunction

The training courses are taught in small classes and seats are limited. Registration closes once a course is full, so developers who need this training should book as soon as possible. Details:

A VIP package is required to attend the training.
VIP package 1 covers all intermediate courses; VIP package 2 covers all courses, both intermediate and advanced.

If you are curious about Flink’s main directions for future exploration, how to use Flink to push big data and computing power to the limit, and what new scenarios, plans, and best practices Flink has to offer, come and join us on site! We believe this group of technical experts from the front line will refresh your understanding of Apache Flink.
