In 2020, Flink was one of the hottest projects in the whole big data field. Most of the features of Alibaba's Blink were merged into the community version of Flink, which put Flink ahead in stream computing and left other real-time computing frameworks trying to catch up. At present, Flink has no real competitor in real-time computing. The new version of Flink also integrates well with Hive, so Flink is catching up quickly in offline (batch) computing and moving toward unified batch and stream processing. Many people even call 2020 the first year of Flink.

How to approach learning the framework

  • Learn the latest Flink version, 1.12, and its newest features.

  • Develop in Java, with Scala covered as well.

  • Combine theory with practice and read the source code in depth, so that you can not only write the programs but also know why they work.

  • Be able to solve practical problems at work, based on problems encountered in production.

  • Work through interview questions to nail the interview.

I. What is Flink

II. Characteristics of Flink

  • Unified batch and stream processing

  • High-throughput, low-latency, high-performance stream processing

  • Window operations based on event time

  • Stateful exactly-once semantics (each record is processed, and processed only once)

  • Highly flexible window operations based on time, count, and sessions (Spark's windowing is somewhat weaker)

  • A continuous streaming model with backpressure

  • Fault tolerance based on lightweight distributed snapshots (fast to take, safe, and quick to recover from when exceptions occur)

  • Iterative computation

  • Its own memory management inside the JVM

  • Automatic program optimization: costly operations such as shuffles and sorts are avoided in certain cases, and intermediate results are cached
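
To make the event-time window item in the list above more concrete, here is a minimal sketch in plain Java. It does not use the Flink API; the class and method names are hypothetical and exist only for illustration. It assigns each event to a tumbling window by the timestamp the event carries, not by the order in which it arrives.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration; none of these names are Flink classes.
class EventTimeWindowSketch {

    // Count events per tumbling window, keyed by window start time.
    // Each event is assigned by the timestamp it carries (event time),
    // not by the order in which it arrives.
    static Map<Long, Integer> countPerWindow(long[] eventTimesMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimesMs) {
            long windowStart = t - (t % windowSizeMs); // assumes t >= 0
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // The event stamped 4000 arrives after the one stamped 12000,
        // yet it is still counted in the window starting at 0.
        long[] arrivals = {1_000L, 12_000L, 4_000L, 9_999L};
        System.out.println(countPerWindow(arrivals, 10_000L)); // prints {0=3, 10000=1}
    }
}
```

A real streaming engine also has to decide when a window is complete and can be emitted. This sketch sidesteps that question by seeing the whole input at once.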

Experience summary:

The version used at a company is usually not the latest one, so you also need to learn the older APIs to be comfortable at work.

In general, when learning a framework, write your applications in the language the framework itself is developed in. That makes it easier to study the source code in depth as you learn.

Spark Streaming is considerably weaker at state management and is not a complete solution. We usually put intermediate results in Redis or MySQL, which is troublesome: the job has to interact with an external storage system just to save the intermediate results and make sure they are not lost. A very important feature of Flink is stateful computation with built-in state management, which suits more complex computations. Some source data lands in message queues and some in databases; Flink reads it and processes it, and during processing it can take snapshots that store the intermediate state in a distributed way, so the job is fault tolerant. Processing is also event-driven.
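
The snapshot idea described above can be sketched in plain Java. This is not Flink's checkpointing API; every name here is hypothetical, and the "log" is just an in-memory list standing in for a replayable source such as a message queue. The point it illustrates is that the snapshot stores the state together with the input position, so that after a failure the state is rolled back and the input is replayed from that position.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration; none of these names are Flink classes.
class SnapshotSketch {

    // A consistent snapshot: the state plus the input position it corresponds to.
    static final class Snapshot {
        final Map<String, Long> counts;
        final int nextOffset;

        Snapshot(Map<String, Long> counts, int nextOffset) {
            this.counts = new HashMap<>(counts); // defensive copy
            this.nextOffset = nextOffset;
        }
    }

    private final Map<String, Long> counts = new HashMap<>();
    private int nextOffset = 0;

    // Consume records from the replayable log up to the given position.
    void processUpTo(List<String> log, int endExclusive) {
        while (nextOffset < endExclusive) {
            counts.merge(log.get(nextOffset), 1L, Long::sum);
            nextOffset++;
        }
    }

    Snapshot snapshot() {
        return new Snapshot(counts, nextOffset);
    }

    // Roll back both the state and the read position to the snapshot.
    void restore(Snapshot s) {
        counts.clear();
        counts.putAll(s.counts);
        nextOffset = s.nextOffset;
    }

    long count(String key) {
        return counts.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "a", "a");
        SnapshotSketch op = new SnapshotSketch();
        op.processUpTo(log, 2);
        Snapshot checkpoint = op.snapshot();
        op.processUpTo(log, 3);  // progress made after the snapshot...
        op.restore(checkpoint);  // ...is discarded by a simulated failure
        op.processUpTo(log, 4);  // replay from the saved offset
        System.out.println(op.count("a") + " " + op.count("b")); // prints 3 1
    }
}
```

Because the state and the offset are saved together, each record affects the counts exactly once even though the third record was read twice.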

Spark Streaming does offer ways to ensure data consistency in streaming scenarios, but exactly-once processing with it is more complicated to use. For example, in Spark Streaming we compute the results, collect them together with the offsets on the driver, and write both to a database that supports transactions, within the same transaction. This approach has a limitation: the computation must be an aggregation. Otherwise, collecting the data on the driver may cause out-of-memory errors or data loss. If the computation is not an aggregation, the sink must also support overwrites, that is, it must be an idempotent store such as HBase. That is still limited, and Spark Streaming's support for windows and for state is not as good. Flink solves all of these problems and also offers a layered API, flexible deployment, savepoints, low latency, high throughput, and in-memory computation. So far, in the real-time space, Flink has done the best.
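
The "results and offsets in the same transaction" pattern described above can be sketched as follows. This is plain Java with an in-memory object standing in for a transactional database; it uses neither the Spark nor the Flink API, and all names are hypothetical. The assumption is that each batch has already been aggregated and carries the offset at which it ends.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration; an in-memory stand-in for a transactional database.
class TransactionalSinkSketch {

    private final Map<String, Long> totals = new HashMap<>();
    private long committedOffset = -1L;

    // Apply a batch's aggregates and its end offset as one atomic step.
    // Returns false and changes nothing if the batch was already committed.
    synchronized boolean commit(Map<String, Long> batchAggregates, long batchEndOffset) {
        if (batchEndOffset <= committedOffset) {
            return false; // replayed batch: skip it
        }
        for (Map.Entry<String, Long> e : batchAggregates.entrySet()) {
            totals.merge(e.getKey(), e.getValue(), Long::sum);
        }
        committedOffset = batchEndOffset;
        return true;
    }

    synchronized long total(String key) {
        return totals.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        TransactionalSinkSketch sink = new TransactionalSinkSketch();
        sink.commit(Collections.singletonMap("a", 2L), 9L);  // first attempt
        sink.commit(Collections.singletonMap("a", 2L), 9L);  // replay after a failure, ignored
        sink.commit(Collections.singletonMap("a", 1L), 19L); // next batch
        System.out.println(sink.total("a")); // prints 3
    }
}
```

The sketch only stays small because each batch is a compact aggregate. With non-aggregated output, the same driver-side collection would have to hold every record, which is the memory risk the paragraph above points out.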

Flink performs stateful computation over bounded and unbounded data streams, with efficient state management.