Introduction

Flink 1.13 was released in April. I attended the Flink 1.13 Meetup and learned a lot. At a high level, the release improves and optimizes Flink SQL, optimizes resource scheduling and management, and optimizes the runtime and the DataStream API for unified stream and batch execution. It also optimizes the State Backend module. This article combines the notes I took at the meetup with additions drawn from the official website.

One of Flink's main goals is to make stream processing applications as easy and natural to use as ordinary applications. The new reactive scaling in Flink 1.13 makes scaling a streaming job as simple as scaling any other application: users only need to change the parallelism.

This release also includes a number of important changes that give users a better understanding of how streaming jobs perform. When a streaming job does not perform as expected, these changes help users analyze the cause. They include load and backpressure visualization for identifying bottleneck nodes, CPU flame graphs for analyzing hot code paths in operators, and state access performance metrics for analyzing the State Backend.

Read Flink SQL 1.13 in depth

In version 1.13, Flink SQL brings a number of new features and enhancements in five areas: Window TVF, time zone support, DataStream and Table API interaction, Hive compatibility, and SQL Client improvements.

FLIP-145: Window TVF
- Complete relational algebra expression
- The input is a relation and the output is a relation
- Each relation corresponds to a data set
- Cumulate window, e.g. UV statistics updated every 10 minutes; the results are accurate, with no jumps
- Window performance optimization
  - Memory, slicing, operators, late data
  - About 2x improvement in benchmark tests
- Multi-dimensional data analysis: GROUPING SETS, ROLLUP, CUBE, etc.
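
To make the cumulate window item concrete, here is a minimal sketch in plain Python (not Flink code) of how a row is assigned to cumulating windows: every window in a cycle shares the same start, and the end grows by one step until the maximum size is reached. The function name `cumulate_windows` and the epoch-aligned `origin` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def cumulate_windows(ts, step, max_size, origin=datetime(1970, 1, 1)):
    """Return the (window_start, window_end) pairs that a row at `ts` falls into."""
    if max_size % step != timedelta(0):
        raise ValueError("max_size must be an integral multiple of step")
    # All cumulating windows of one cycle share the same start.
    window_start = origin + ((ts - origin) // max_size) * max_size
    max_end = window_start + max_size
    # Windows are [start, end), so the first match ends at the next step boundary.
    end = window_start + ((ts - window_start) // step + 1) * step
    windows = []
    while end <= max_end:
        windows.append((window_start, end))
        end += step
    return windows


# A row at 00:25 with a 10-minute step and a 1-hour maximum size.
for start, end in cumulate_windows(
    datetime(2021, 5, 1, 0, 25), timedelta(minutes=10), timedelta(hours=1)
):
    print(start.time(), "->", end.time())
```

Because each result covers everything since the shared window start, a UV count emitted at every step only ever grows within a cycle, which is why the curve has no jumps.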
FLIP-162: Time zone handling
- Time zone issues: PROCTIME() did not consider time zones, TIMESTAMP does not consider time zones, and the various time functions (CURRENT_TIME, NOW, and so on) did not consider time zones
- Time functions: CURRENT_TIMESTAMP previously returned the UTC+0 time
- Support for TIMESTAMP_LTZ: TIMESTAMP vs TIMESTAMP_LTZ
- Correct the PROCTIME() function
- Daylight saving time support, likewise through TIMESTAMP_LTZ
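
The difference between the two types can be sketched in plain Python (not Flink code): a TIMESTAMP_LTZ value is an absolute instant that is rendered in the session time zone, while a TIMESTAMP is just wall-clock text. The helper `render_timestamp_ltz` and the fixed-offset session zones are assumptions for illustration; fixed offsets cannot show daylight saving transitions.

```python
from datetime import datetime, timedelta, timezone


def render_timestamp_ltz(epoch_millis, session_offset_hours):
    """Render an absolute instant as wall-clock text in the session time zone."""
    session_tz = timezone(timedelta(hours=session_offset_hours))
    instant = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return instant.astimezone(session_tz).strftime("%Y-%m-%d %H:%M:%S")


# One instant: 2021-05-01 00:00:00 UTC, in epoch milliseconds.
EPOCH_MILLIS = 1_619_827_200_000

# The same instant, seen from two session time zones.
print(render_timestamp_ltz(EPOCH_MILLIS, 0))  # session time zone UTC+0
print(render_timestamp_ltz(EPOCH_MILLIS, 8))  # session time zone UTC+8
```

The stored instant never changes; only its rendering depends on the session, which is the behavior the corrected time functions rely on.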
FLIP-163: Improved SQL Client and Hive compatibility
- More practical configuration options are supported
- Statement sets are supported
FLIP-136: Enhanced DataStream and Table conversion
- Event time and watermarks are supported in DataStream and Table conversions
- Changelog streams can be converted between Table and DataStream
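
What a changelog stream carries can be sketched in plain Python (not Flink code). Flink marks each row with a kind, shown as +I (insert), -U (update before), +U (update after) and -D (delete); folding those records in order gives the current table. The function `apply_changelog` and the `(kind, key, value)` record layout are assumptions for illustration.

```python
def apply_changelog(changes):
    """Fold (kind, key, value) changelog records into the current table state."""
    table = {}
    for kind, key, value in changes:
        if kind in ("+I", "+U"):
            table[key] = value       # insert a row or write its new version
        elif kind in ("-U", "-D"):
            table.pop(key, None)     # retract the old version or delete the row
        else:
            raise ValueError(f"unknown row kind: {kind}")
    return table


changes = [
    ("+I", "alice", 1),
    ("+I", "bob", 1),
    ("-U", "alice", 1),  # retract alice's old count
    ("+U", "alice", 2),  # emit alice's new count
    ("-D", "bob", 1),
]
print(apply_changelog(changes))
```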

Flink 1.13: Towards Scalable Cloud Native Applications

Flink 1.13 adds a reactive mode and an adaptive scheduler, which give it flexible scaling ability. Combined with cloud-native auto-scaling technology, Flink can make better use of elastic computing resources in the cloud, which is another important milestone in Flink's embrace of the cloud-native ecosystem. This section focuses on reactive mode, adaptive scheduling, custom Pod templates, and other new features in Flink 1.13. I think this scaling capability is a particularly important feature of Flink 1.13.

Flink in the cloud-native era: K8s, declarative API, elastic scaling
K8s high availability (ZooKeeper or K8s, optional)
Rescaling: reactive mode and adaptive mode; autoscaling mode is TBD (not yet supported) ci.apache.org/projects/fl…
FLIP-158: generalized incremental checkpoints make checkpoints faster
Pod Template: custom Pod templates are supported
Fine-grained resource management: feature expected around 1.14
Scale resources vertically and horizontally: TM CPU → K8s, MEM → no
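
As a rough sketch of the reactive mode idea in plain Python (not Flink code): the job uses every slot the cluster currently offers, bounded by the maximum parallelism. The function `reactive_parallelism` is hypothetical, and the model assumes a single slot sharing group so that one slot equals one unit of parallelism.

```python
def reactive_parallelism(task_managers, slots_per_tm, max_parallelism):
    """Parallelism a job would run with, given the currently available slots."""
    available_slots = task_managers * slots_per_tm
    if available_slots < 1:
        raise ValueError("at least one slot is needed to run the job")
    # Use all available slots, but never exceed the max parallelism.
    return min(available_slots, max_parallelism)


# Adding TaskManagers scales the job up; removing them scales it down.
for tms in (2, 4, 100):
    print(tms, "TaskManagers ->", reactive_parallelism(tms, 2, 128))
```

In this model an external autoscaler only has to add or remove TaskManagers; the job follows the available resources.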

Flink runtime and DataStream API optimized for unified stream and batch processing

In 1.13, working toward unified stream and batch processing, Flink optimized large-scale job scheduling and the network shuffle in batch execution mode, further improving the execution performance of both streaming and batch jobs. In the DataStream API, Flink is also refining the termination semantics of bounded streaming jobs to further improve the consistency of semantics and results across execution modes.

API shuffle architecture implementation

Bounded and unbounded jobs behave as expected
Optimize consumerVertexGroup and partitionGroup for large-scale jobs
Consistency when a bounded streaming job finishes, 2PC 😁😁
Stream and batch: data flows back
Pipelined and Blocking: caching is mainly used for offline (batch) processing

State Backend optimization and production practices in Flink 1.13

Unified savepoint format: the state backend can be switched, e.g. to RocksDB
State Backend memory control
Checkpoint and savepoint: zhuanlan.zhihu.com/p/79526638
Faster checkpoint and failover

Flink 1.14 outlook

Remove the legacy planner
Improve the window TVF
Improve schema handling
Enhance CDC

References

For more, see the Flink website: ci.apache.org/projects/fl…

Reference: tw511.com/a/01/34869…
