Development history of Flink

In 2008, a team at the Technical University of Berlin launched the Stratosphere project with the goal of building the next generation of big data analysis engines

On April 16, 2014, Stratosphere became an Apache Incubator project. From Stratosphere 0.6, it was officially renamed Flink and written in Java language

Flink 0.7.0, released on November 04, 2014, introduces the most important feature: Streaming API

2016-03-08, Flink 1.0.0, Scala support

In early 2019, Alibaba acquired Ververica, the company that owns Flink products, and created its own product Blink (a branch of Flink).

What is the Flink

The definition of Flink

Apache Flink is a framework and distributed processing engine for state calculation over unbounded and bounded data streams.

By definition, Apache Flink is a framework, a processing engine, that performs calculations on data streams.

We also learned that it supports two types of data flows, unbounded and bounded, and supports stateful calculations of data flows.

So what is computation with states, what is unbounded data flow, and what is bounded data flow?

What is data flow

Any type of data can form a data flow. Credit card transactions, sensor measurements, machine logs, records of user interactions on websites or mobile apps, all of these can form data streams.

What is bounded data flow

A bounded data flow defines both the beginning and end of a flow. Bounded data streams can be calculated after all the data has been ingested. The data in a bounded data stream can be sorted, so no sequential ingestion is required. Bounded data flow processing is often referred to as bounded data flow

What is unbounded data flow

An unbounded data flow defines the beginning of a flow, but not the end of the flow. This data stream produces data endlessly (at least in theory). Because the input is infinite, we can’t wait for all the data to arrive before processing, we have to continuously process incoming data. Processing unbounded data flows often requires that data be ingested in a particular order, such as the order in which events occur

Apache Flink specializes in processing streams, and batches are regarded as bounded data streams by Apache Flink, so Flink is called a stream-batch all-in-one data computing engine

What is state computing

What is stateless computation

Each time data calculation only considers the current data, not the calculation results of the previous data

What is stateful computation

Each time data calculation is performed, the calculation is based on the previous data calculation result (state), and each calculation result is saved to the storage medium

In practical applications, most scenarios need to record states, such as statistics of a user’s number of visits in a period of time, the maximum speed of a vehicle in a period of time, etc. Therefore, whether a flow processing framework supports calculation with state is a very important feature

Distributed computing

The support of bounded and unbounded data flow and state calculation is only the feature of Flink framework, while distributed computing is the core of Flink big data framework which is different from ordinary data computing framework.

By deploying instances on multiple machines to form a computing cluster, Flink can divide large tasks that cannot be handled by one computer into multiple small tasks and distribute them to other instances in the cluster for distributed computing. (Details will be covered in a later chapter)

Flink definition summary

Apache Flink is the framework and processing engine
Apache Flink supports distributed computing
Apache Flink supports both unbounded and bounded data streams
Apache Flink supports stateful calculations of data streams

In addition to the main features mentioned above, Flink has many other features, such as

High throughput, low latency, high performance
Independent memory management based on the JVM
supportEvent Time semanticsAnd combined with theWatermarkMechanism that can handle out-of-order data
Supports highly flexible Window operations such as time, count and session
Fault tolerance based on the lightweight Distributed Snapshot (CheckPoint) implementation, ensuring exactly-once semantics
Save Points
Support for storing state in multiple media (memory, file, RocksDB)

We’ll talk about that later

Flink learning materials

Flink website: flink.apache.org/zh/
Flink the latest stable version of the official document: ci.apache.org/projects/fl…
Flink Chinese Community Video courses: ververica.cn/developers/…
Flink source: github.com/apache/flin…

Thanks to alibaba’s promotion, Flink has a large number of Chinese materials. In fact, the best way to learn Flink is to follow the official documents step by step.

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Flink profile

Development history of Flink

What is the Flink

The definition of Flink

What is data flow

What is bounded data flow

What is unbounded data flow

What is state computing

What is stateless computation

What is stateful computation

Distributed computing

Flink definition summary

Flink learning materials

Flink profile

Development history of Flink

What is the Flink

The definition of Flink

What is data flow

What is bounded data flow

What is unbounded data flow

What is state computing

What is stateless computation

What is stateful computation

Distributed computing

Flink definition summary

Flink learning materials

Related Posts

Destination unreachable (Host Administratively Prohibited) Prohibited

Interviewer soul blast: How to guarantee 100% successful message delivery? How can message idempotency be guaranteed?

ShardingSphere （1） JDBC