Flink On Yarn architecture

Main startup process

1. Startup command

To start a Flink on YARN session cluster, run bin/yarn-session.sh -n 3 -jm 1024 -tm 1024 -st, where -n sets the number of TaskManager containers, -jm and -tm set the JobManager and TaskManager memory in MB, and -st starts the session in streaming mode.

This generates five processes in total:

  • **1 FlinkYarnSessionCli -> Yarn Client**

  • **1 YarnApplicationMasterRunner -> AM + JobManager**

  • **3 YarnTaskManager -> TaskManager**

That is, one client process plus four containers: one container runs the AM (together with the JobManager) and three containers run the TaskManagers.
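To sanity-check that these five processes actually exist, jps can be run on the client host and on the NodeManager hosts. This is a minimal sketch; the exact process names reported can vary between Flink versions.

```sh
# Run on the client host and on each NodeManager host.
jps
# Illustrative output on the client host:
#   12345 FlinkYarnSessionCli
# Illustrative output on the NodeManager hosts:
#   23456 YarnApplicationMasterRunner
#   34567 YarnTaskManager   (one per TaskManager container)
```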

2. Startup flow

  • 1. On startup, FlinkYarnSessionCli checks whether YARN has enough resources to start the required containers. If so, it uploads the Flink JARs and configuration files to HDFS; these are the dependencies needed to launch the AM and TaskManager processes (see the HDFS listing sketch after this list).

  • 2. The YARN client then requests a container from the ResourceManager (RM) for the ApplicationMaster (the YarnApplicationMasterRunner process), and the RM tells one of the NodeManagers (NM) to launch it. Once assigned, that NM downloads the JARs and configuration files uploaded in step 1 from HDFS to the local machine and starts the AM. The JobManager is started during this step: because the JobManager and the AM live in the same process, the AM writes the JobManager address to a file on HDFS, and each TaskManager downloads that file during its own startup to find the JobManager and communicate with it. The AM is also responsible for the Flink web UI; Flink serves it on a random port, which lets users run multiple YARN sessions at once (a sketch for locating the session and its web UI follows this list).

  • 3. After the AM is up, it requests containers from YARN to start the TaskManagers. During TaskManager startup, the JARs containing the TaskManager main class (YarnTaskManager in YARN mode) and the configuration files it depends on, such as the JobManager address file, are downloaded from HDFS, and the TaskManager is then launched with java -cp. Once this is done, the TaskManagers can accept tasks. This mirrors Spark on YARN in yarn-cluster mode: both split the work into two phases, first preparing the workers and the environment (starting the SparkContext in Spark, initializing the execution environment in Flink), and then assigning the workers concrete work (in both cases triggered by a concrete action). A job-submission sketch follows below.
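To see what step 1 uploaded, the staging directory on HDFS can be listed. The path below, /user/&lt;user&gt;/.flink/&lt;application-id&gt;, is an assumption based on where older Flink versions stage their files; adjust it for your setup.

```sh
# List the JARs and configuration files staged by FlinkYarnSessionCli
# (the path is an assumption for older Flink versions; substitute your
# user name and application id).
hdfs dfs -ls /user/<user>/.flink/application_1400000000000_0001
# Typically contains flink-conf.yaml, the Flink dist JAR, and the log
# configuration files.
```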
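Because the AM serves the Flink web UI on a random port (step 2), the reliable way to find it is to ask YARN for the session's tracking URL. A minimal sketch using the standard YARN CLI:

```sh
# List running YARN applications; the Flink session appears here once
# the AM is up. Its Tracking-URL column points at the Flink web UI
# (the JobManager web frontend) on whatever random port was chosen.
yarn application -list -appStates RUNNING
```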
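Once the TaskManagers are ready to accept tasks (step 3), jobs can be submitted to the running session. A minimal sketch, assuming an older Flink version in which yarn-session.sh records the JobManager address in a local properties file (e.g. /tmp/.yarn-properties-&lt;user&gt;) that bin/flink picks up automatically; the example JAR path is illustrative.

```sh
# Submit a job to the running YARN session. No JobManager address is
# passed: bin/flink reads it from the properties file written by
# yarn-session.sh (behavior of older Flink versions).
bin/flink run ./examples/streaming/WordCount.jar
```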

---

Like Spark, Flink has three deployment modes: Local, Standalone Cluster, and Yarn Cluster.

This document focuses on the Yarn Cluster mode and describes how Flink tasks are executed and how resources are allocated when Flink runs on YARN.
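For reference, a sketch of how each of the three modes is typically started with the scripts shipped in older Flink distributions; the script names (in particular start-local.sh) have changed between versions, so treat these as assumptions to be checked against your release.

```sh
# Local mode: everything in a single JVM on one machine
# (start-local.sh shipped with older Flink releases).
bin/start-local.sh

# Standalone cluster: JobManager plus the TaskManagers listed in
# conf/slaves.
bin/start-cluster.sh

# Yarn Cluster: the long-running session used throughout this document.
bin/yarn-session.sh -n 3 -jm 1024 -tm 1024
```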

Yarn Cluster mode

The relationship between Flink and Yarn is the same as that between MapReduce and Yarn: Flink implements its App Master through the Yarn interfaces. When Flink is deployed on Yarn, Yarn starts Flink's JobManager (the App Master) and TaskManagers in its own containers.
