Flink profile


Apache Flink is a distributed big data processing engine capable of stateful computation for both finite and infinite data streams. Can be deployed in a variety of cluster environments for rapid calculations of data sizes.

  • Processing flow

  • Core components:

Flink features


  1. Support batch processing and data stream processing
  2. Supports both high throughput and low latency
  3. Flexible Windows (time, slide, roll, session, custom trigger) supported with different time semantics (time, ingest, processing)
  4. Fault tolerant guarantees that are handled only once
  5. Automatic backpressure mechanism
  6. Support CEP, such as anti-fraud and risk control rule engine
  7. Compatibility with MapReduce/Storm interfaces, allowing reuse of its code
  8. Integrate YARN, HDFS, HBase, and other Hadoop ecological components

Flink compares with other frameworks


Record ACK Micro-batching Transactional updates Distributed snapshots
Typical representative Apache Storm Storm Trident ,Spark Streaming Google Cloud Dataflow Apache Flink
Semantic guarantee At least once Exactly once Exactly once Exactly once
delay low high Lower (things delay) low
throughput low high Higher (depending on the amount of food stored) high
Calculation model flow The number of flow flow
Fault tolerant overhead high low Low (depends on the throughput of things stored) low
Flow control poor poor good good
Business flexibility (separation of business and fault tolerance) Part of the Tightly coupled The separation of The separation of

Deployment way


  • Standalone
  1. Flink runs on a cluster consisting of a master node and one or more worker nodes.

  • Flink on YARN
  • Separation mode (A long-running Flink cluster on YARN)
    1. A permanent process is required in Yarn. The resource size is determined during Yarn startup. After Yarn is started, an Application is resident in Yarn, which is the permanent process of Flink JobManager in Yarn.
    2. JobManager can be fault-tolerant. If JobManager fails, Yarn automatically restarts a JobManager without affecting job submission.
    3. Different jobs are affected by the size of resources. Resources are allocated at startup. If a task uses up the given resources, other tasks can be executed only after the resources are released.
  • Client mode (A single Flink job on YARN)
    1. Flink serves only as a client and does not need to reside in the Yarn process.
    2. The JobManager starts only after the Job is submitted.
    3. Resources of each Job are isolated. Resources of different jobs are independent, and resource allocation of jobs is affected only by the Yarn resource policy.

Fault-tolerant processing


  • Checkpoint — Checkpoint makes the state of Fink very fault tolerant. Flink can restore the state and calculation position of the job through Checkpoint, so Flink job has a high fault tolerant execution semantics. A Checkpoint is to create a snapshot of the status of all tasks at a certain time and store it to State Backend. — Lightweight fault tolerance — Checkpoint to ensure Exactly once semantics — automatic recovery of internal failures without human intervention

  • Savepoint

    • Savepoints is a self-testing Savepoints feature stored in an external file system that provides the ability to stop-restore or upgrade Flink programs. It uses Flink’s checkpoint mechanism to create a full (non-incremental) state snapshot of a stream job and writes checkpoint data and metadata to an external file system. And it doesn’t expire, it doesn’t get overwritten unless you manually delete it.
    • The historical version of the state in the flow process
    • Has the function of replay
    • External recovery (application restart and upgrade)
    • Two trigger modes
      • Cancel with Savepoint
      • Manual active trigger
  • State Backend

  • After the Checkpoint function is enabled, the status is persisted at each Checkpoint to prevent data loss and restore consistency. How the internal state is presented, and how and where the checkpoint-based persistent state is stored, depends on the state backend chosen.
  • Out of the box, Flink integrates the following state Backend systems:
    • MemoryStateBackend(default) : Stores data as objects in the Java Heap. Based on HeapKeyedStateBackend, MemoryStateBackend may cause OOM.
    • FsStateBackend: Saves data to File systems, such as HDFS and Local File. HeapKeyedStateBackend may cause OOM.
  • RocksDBStateBackend : Using RocksDB to store state, RocksDB overcomes the memory limitation of HeapKeyedStateBackend and persists to a remote file system. It maintains state in the local file system. KeyedStateBackend writes state directly to the local RocksDB. When configuring a remote FileSystem(such as HDFS), the local data is directly copied to the FileSystem during Checkpoint operation. Restore from FileSystem to local during a fail over.