Deploy using installation packages

Download address

https://archive.apache.org/dist/flink/flink-1.7.2/flink-1.7.2-bin-hadoop27-scala_2.11.tgz


Download and unzip
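The commands below are a minimal sketch of this step, assuming you fetch the archive from the URL above and work in the directory you downloaded it to:

# Download the Flink 1.7.2 binary (Hadoop 2.7 / Scala 2.11 build)
wget https://archive.apache.org/dist/flink/flink-1.7.2/flink-1.7.2-bin-hadoop27-scala_2.11.tgz

# Unpack the archive and enter the Flink home directory
tar -xzf flink-1.7.2-bin-hadoop27-scala_2.11.tgz
cd flink-1.7.2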


Start Flink
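Assuming the standalone scripts shipped with the distribution, a local cluster is started with a single command from the Flink home directory:

# Start a local standalone cluster (one JobManager plus one TaskManager)
./bin/start-cluster.sh

# To stop it later:
# ./bin/stop-cluster.sh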



Access the Flink Web UI

http://localhost:8081


Run the Flink Demo program

  • Start a service listening on port 7777
nc -lk 7777

  • Run the Flink streaming WordCount example JAR, passing in port 7777 as a parameter
bin/flink run examples/streaming/SocketWindowWordCount.jar --port 7777


  • View the startup status on the Web UI

You can see the task in action
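If you prefer the command line, the same information can be checked with the Flink CLI (a quick sketch, run from the Flink home directory):

# List running and scheduled jobs on the cluster
./bin/flink list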


  • Type some content into the nc service on port 7777
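Any whitespace-separated words will do; for example, type a few lines like these into the nc session (sample input for illustration):

hello flink
hello world
bye flink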

  • Watch the word counts appear in the TaskManager output log
tail -f log/flink*.out


  • Shut down the port 7777 service (for example, Ctrl+C in the nc session)

You can see on the Web UI that the task has finished


Summary of the above process


For details, see Task execution


Having used Flink briefly, let's now analyze how it works.

How Flink works

Flink runtime components


Flink is implemented in Java and Scala, so all of its components run on the Java Virtual Machine (JVM)

  • JobManager

    • The main process that controls the execution of an application; each application is controlled by its own JobManager

    • The JobManager first receives the application to execute, which includes the JobGraph (the logical dataflow graph) and a JAR file that packages all the required classes, libraries, and other resources

    • The JobManager turns the JobGraph into a physical dataflow graph called the ExecutionGraph, which contains all the tasks that can be executed concurrently

    • The JobManager requests from the ResourceManager the resources necessary to execute the tasks, namely slots on the TaskManagers. Once it has obtained enough resources, the JobManager distributes the ExecutionGraph to the TaskManagers that actually run it. During execution, the JobManager is responsible for any operation that requires central coordination, such as coordinating checkpoints

  • TaskManager

    • TaskManagers are the worker processes in Flink. There are usually multiple TaskManagers running in a Flink setup, each containing a certain number of slots; the number of slots limits the number of tasks a TaskManager can execute (slot usage can be inspected through the REST interface, see the sketch after this list)

    • During execution, a TaskManager can exchange data with other TaskManagers running in the same application

  • ResourceManager

    • Mainly responsible for managing TaskManager slots; a slot is the unit of processing resources defined in Flink

    • Flink provides different ResourceManager implementations for different environments and resource-management tools, such as YARN, Mesos, Kubernetes, and standalone deployment

    • When the JobManager requests slot resources, the ResourceManager allocates TaskManagers with idle slots to the JobManager. If the ResourceManager does not have enough slots to satisfy the JobManager's request, it can also ask the resource-provider platform to provide containers in which to start additional TaskManager processes. The ResourceManager is also responsible for terminating idle TaskManagers to release compute resources

  • Dispatcher

    • It runs across jobs and provides a REST interface for submitting applications

    • When an application is submitted for execution, the Dispatcher starts up and hands the application over to a JobManager

    • The Dispatcher also launches a Web UI to conveniently display and monitor information about job execution

    • The Dispatcher is not required in every architecture; whether it is used depends on how applications are submitted and run
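To make these components a bit more concrete, the REST interface served alongside the Web UI can be queried for the registered TaskManagers and their slots. A small sketch, assuming the local cluster started earlier and the default REST port 8081 (response fields vary by Flink version):

# Cluster overview: number of TaskManagers, total and available slots, running jobs
curl http://localhost:8081/overview

# Details of each registered TaskManager, including its slot counts
curl http://localhost:8081/taskmanagers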

Task Submission Process


Depending on the cluster environment in which Flink is deployed (e.g. YARN, Mesos, Kubernetes, standalone), some of these steps can be omitted, or some components may run in the same JVM process

  • Deploy the Flink cluster to YARN

  • A Flink task is submitted (a client-side command sketch follows this list)

  • The client uploads Flink's JAR package and configuration to HDFS

  • It then submits the task to the YARN ResourceManager

  • The YARN ResourceManager allocates container resources and notifies the NodeManager to start the ApplicationMaster

  • The ApplicationMaster starts, loads Flink's JAR package and configuration, and sets up the environment

  • It then starts the JobManager

  • The ApplicationMaster then applies to the ResourceManager for resources to start TaskManagers

  • The ResourceManager allocates the container resources

  • The ApplicationMaster starts the TaskManagers through the NodeManagers on the nodes where the allocated resources are located

  • The NodeManager loads Flink's JAR package and configuration, sets up the environment, and starts the TaskManager

  • After the TaskManager starts, it sends heartbeats to the JobManager and waits for the JobManager to assign tasks to it
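From the client's point of view, the whole sequence above is triggered by a single submission command. A hedged sketch, assuming a Flink 1.7 client whose HADOOP_CONF_DIR points at the YARN cluster configuration (option names differ in later Flink versions):

# Per-job mode: the client uploads the JAR and configuration, YARN starts an
# ApplicationMaster/JobManager for this job, and TaskManagers are allocated on demand
./bin/flink run -m yarn-cluster examples/streaming/WordCount.jar

# Session mode: start a long-running YARN session (2 TaskManagers, 2 slots each)...
./bin/yarn-session.sh -n 2 -s 2 -jm 1024m -tm 2048m

# ...and then submit jobs into that session
./bin/flink run examples/streaming/WordCount.jar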