Flink scheduling

Flink- Perform logic

The article directories

    • I. Role role
    • Ii. Task submission process
      • 1.1 Standlone
      • 1.2 Yarn
    • TaskManager and slots
      • 3.1 Functions and relationships
      • 3.2 Sharing Mechanism

I. Role role

Client

Client is the Client that submits jobs. It can run on any machine (connected to the JobManager environment). After the Job is submitted, the Client can end the process (the Streaming task) or wait for the result to return.

JobManager

JobManager has a number of responsibilities related to coordinating the distributed execution of Flink applications: it decides when to schedule the next task (or group of tasks), reacts to completed tasks or execution failures, coordinates checkpoint, and coordinates recovery from failures, and so on. This process consists of three different components:

  • ResourceManager provides, reclaims, and allocates resources in the Flink cluster. ResourceManager manages Task slots, which is the unit of resource scheduling in the Flink cluster. Flink implements the corresponding ResourceManager for different environments and resource providers such as YARN, Mesos, Kubernetes, and standalone deployments. In the standalone setting, ResourceManager can only allocate slots for available TaskManagers and cannot start a new TaskManager itself.
  • The Dispatcher provides a REST interface to submit Flink application execution and start a new JobMaster for each submitted job. It also runs the Flink WebUI to provide job execution information.
  • The JobMaster is responsible for managing the execution of the individual JobGraph. Multiple jobs can run simultaneously in a Flink cluster, each with its own JobMaster.

Always have at least one JobManager. There may be multiple JobManagers in a high availability (HA) setting, one of which is always the Leader and the others are standby.

TaskManager

TaskManager starts with a number of slots. Each Slot can start a Task, which is a thread. The system receives tasks to be deployed from JobManager. After the deployment starts, the system establishes a connection with the upstream to receive data and process it.

Slot

Flink cluster is composed of JobManager (JM) and TaskManager (TM), and each JM/TM runs in an independent JVM process. A JM is the Master and the management node of a cluster, while a TM is the Worker and the working node of a cluster. Each TM has at least one Slot. Slot is the minimum resource allocation unit for Flink to execute a Job, and a specific Task runs in Slot.

Ii. Task submission process

1.1 Standlone

Pictures ([address] (www.jianshu.com/p/f1b16b74a…

  1. APP applications are submitted to Dispatcher through a RestFul interface (the interface is cross-platform and can pass through the firewall directly, no interception is considered).
  2. The Dispatcher starts the JobManager process and delivers the application to JobManager.
  3. After JobManager obtains the application, the JobManager applies for resources (slots) from ResourceManager. ResourceManager starts the Corresponding TaskManager process. The idle slots of TaskManager are registered with ResourceManager.
  4. ResourceManager sends directives to TaskManager based on the number of resources requested by JobManager. (These slots are provided to JobManager by you.)
  5. TaskManager can now communicate directly with JobManager (there are heartbeat packets between them). TaskManager provides Slots to JobManager, and JobManager assigns tasks to TaskManager to perform within slots.
  6. Finally, there is an exchange of data between different TaskManagers during the execution of a task.

1.2 Yarn

  1. Before submitting the App, upload the Flink Jar package and configuration to HDFS so that JobManager and TaskManager can share HDFS data.
  2. The client submits a Job to ResourceManager. After receiving the request, ResouceManager allocates container resources and notifies NodeManager to start ApplicationMaster.
  3. ApplicationMaster loads the HDFS configuration and starts the corresponding JobManager, which then analyzes the current job graph and converts it into an execution graph (containing all tasks that can be executed concurrently) to know the specific resources required.
  4. Then, JobManager applies for resources from ResourceManager. After receiving the request, ResouceManager allocates container resources. It then tells ApplictaionMaster to start more TaskManagers (allocate container resources before starting TaskManager). When Container starts TaskManager, it also loads data from HDFS.
  5. Finally, when TaskManager starts, it sends heartbeat packets to JobManager. The JobManager assigns tasks to the TaskManager.

TaskManager and slots

Each worker (TaskManager) is a JVM process that can execute one or more subtasks in a separate thread. To control how many tasks are accepted in a TaskManager, there is something called Task slots (at least one).

3.1 Functions and relationships

  1. Each TaskManager in Flink is a JVM process that may execute one or more subtasks on separate threads.
  2. In order to control how many tasks a TaskManager can receive, the TaskManager controls this through task Slot (each TaskManager has at least one slot).
  3. Each Task slot represents a fixed-size subset of resources owned by the TaskManager. If a TaskManager has three slots, it triples the memory it manages into each slot(note: CPU isolation is not involved here; slot is only used to isolate task managed memory).
  4. You can customize the isolation mode between subtasks by adjusting the number of task slots. When a TaskManager is a slot, each Task group runs in a separate JVM. When a TaskManager has multiple slots, multiple subTasks can share a SINGLE JVM, and tasks within the same JVM process will share TCP connections and heartbeat messages, and possibly data sets and data structures, reducing the load on each task.

3.2 Sharing Mechanism

  1. By default, Flink allows subtasks to share slots, even if they are subtasks of different tasks (provided they come from the same job). The result is that a single slot can hold the entire pipe of a job.
  2. Task Slot is a static concept, refers to the concurrent execution ability, TaskManager has can pass parameters TaskManager. NumberOfTaskSlots configured; Parallelism is a dynamic concept, that is, the actual concurrency used by TaskManager to run programs. It can be configured using the parameter Parallelism. Default.
  3. For example, if there are 3 TaskManagers, and each TaskManager has 3 TaskSlot assigned, each TaskManager can receive 3 tasks, so we can receive 9 tasksoTs in total. However, if we set parallelism. Default =1, then only one of the nine taskslots will be running when the program is running, and all eight of them will be idle.

Slot refers to the concurrent execution capability of the TaskManager

Assign 3 taskslots to each taskManager for a total of 9 taskslots.

Parallelism refers to the concurrency capabilities that TaskManager actually uses

parallelism.default:1

The default parallelism of the running program is 1, using only 1 of 9 Taskslots and 8 of them free. Setting the right degree of parallelism can improve efficiency.

Parallelism is configurable and specifiable

The degree of parallelism set by each operator is 2.

The degree of parallelism set by each operator is 9.

Parallelism is configurable and designable, with Sink introduced

Except that the parallelism set by sink is 1, the parallelism set by other operators is 9.

Note: If you set parallelism to more than the maximum number of slots that Task Manager can provide, the program will throw an exception message.

reference

Ci.apache.org/projects/fl…

www.slideshare.net/robertmetzg…

The public,