“This is the 16th day of my participation in the November Gwen Challenge. For details of the event, see: The Last Gwen Challenge 2021.”

YARN Resource Scheduler

Think about:

1) How to manage cluster resources?

2) How to allocate resources to tasks properly?

YARN is a resource scheduling platform that provides server computing resources (memory, CPU, etc.) to computing programs. It is similar to a distributed operating system platform, while MapReduce and other computing programs are like applications running on top of that operating system.

1. YARN basic architecture

YARN consists of ResourceManager, NodeManager, ApplicationMaster, and Container.

2. YARN working mechanism

(1) The MR program is submitted to the node where the client runs.

(2) YarnRunner applies to the ResourceManager for an Application.

(3) RM returns the application's resource submission path to YarnRunner.

(4) The program submits the resources required for running (JAR, splits, configuration) to HDFS.

(5) After the resources are submitted, the client applies to RM to run MRAppMaster.

(6) RM initializes the user's request into a Task and places it in a scheduling queue.

(7) One of the NodeManagers picks up the Task.

(8) That NodeManager creates a Container and starts MRAppMaster in it.

(9) The Container copies the job resources from HDFS to the local node.

(10) MRAppMaster applies to RM for resources to run the MapTasks.

(11) RM assigns the MapTasks to two other NodeManagers; each of them picks up a task and creates a Container.

(12) MRAppMaster sends the program startup script to the two NodeManagers that received the tasks, and each NodeManager starts a MapTask, which partitions and sorts its data.

(13) MRAppMaster waits for all MapTasks to finish, then applies to RM for Containers to run the ReduceTasks.

(14) Each ReduceTask fetches the data of its corresponding partition from the MapTasks.

(15) After the program finishes, MRAppMaster applies to RM to unregister itself.
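On the client side, this entire flow is triggered by a single driver program. The following is a minimal WordCount-style sketch for illustration only (class names and paths are examples, not part of the original article); the call to job.waitForCompletion(true) is what kicks off the submission described above:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Example mapper: emits (word, 1) for every word in a line.
    public static class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                if (!w.isEmpty()) {
                    word.set(w);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Example reducer: sums the counts for each word.
    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);               // this JAR is what gets uploaded to HDFS in step (4)
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // example input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // example output path; must not exist yet
        // This single call hands the job to YarnRunner, which contacts the ResourceManager,
        // uploads the job resources, and requests MRAppMaster, i.e. steps (1) to (5) above.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}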

3. Job submission process

3.1 Relationship among HDFS, YARN, and MapReduce

3.2 YARN in the job submission process

3.3 HDFS & MapReduce in the job submission process

3.4 Detailed walkthrough of the job submission process

(1) Job submission

Step 1: The Client invokes the job.waitForCompletion() method to submit a MapReduce job to the cluster.

Step 2: The Client applies to RM for a job ID.

Step 3: RM returns the job's resource submission path and job ID to the Client.

Step 4: The Client submits the JAR package, input split information, and configuration files to the specified resource submission path.

Step 5: After the resources are submitted, the Client applies to RM to run MRAppMaster.

(2) Job initialization

Step 6: After receiving the request from the Client, RM adds the job to the capacity scheduler.

Step 7: An idle NM receives the Job.

Step 8: The NM creates a Container and starts MRAppMaster in it.

Step 9: MRAppMaster downloads the resources submitted by the Client to the local node.

(3) Task allocation

Step 10: MRAppMaster applies to RM for resources to run multiple MapTasks.

Step 11: RM assigns the MapTasks to two other NodeManagers; each of them picks up a task and creates a Container.

(4) Task execution

Step 12: MRAppMaster sends the program startup script to the two NodeManagers that received the tasks. Each NodeManager starts a MapTask, which partitions and sorts its data.

Step 13: MRAppMaster waits for all MapTasks to complete, then applies to RM for Containers to run the ReduceTasks.

Step 14: Each ReduceTask fetches the data of its corresponding partition from the MapTasks.

Step 15: After the program finishes, MRAppMaster applies to RM to unregister itself.

(5) Progress and status updates

In YARN, task progress and status (including counters) are returned to the ApplicationMaster. The client polls the ApplicationMaster every second (the interval is set by mapreduce.client.progressmonitor.pollinterval) to request progress updates, which are displayed to the user.

(6) Job completion

In addition to requesting job progress from the ApplicationMaster, the client calls waitForCompletion() to check every five seconds whether the job has completed. This polling interval is set by mapreduce.client.completion.pollinterval. After the job completes, the ApplicationMaster and the Containers clean up their working state, and the job information is stored by the job history server for later inspection by users.
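Both polling intervals mentioned above can be tuned on the client side. A minimal sketch follows; the values shown are arbitrary examples, both properties are in milliseconds, and they can also be set in mapred-site.xml instead:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PollingIntervalExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // How often (ms) the client asks the ApplicationMaster for progress updates (default 1000).
        conf.setLong("mapreduce.client.progressmonitor.pollinterval", 2000);
        // How often (ms) waitForCompletion() checks whether the job has finished (default 5000).
        conf.setLong("mapreduce.client.completion.pollinterval", 10000);
        Job job = Job.getInstance(conf, "polling-interval-demo");
        // ... configure mapper/reducer/paths as usual, then submit:
        // job.waitForCompletion(true);
    }
}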

2. YARN schedulers and scheduling algorithms

Currently, there are three main Hadoop job schedulers: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler. The default resource scheduler in Apache Hadoop 3.1.3 is the Capacity Scheduler.

The default scheduler in the CDH framework is the Fair Scheduler.

For details, see yarn-default.xml:

<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

1. FIFO Scheduler

FIFO Scheduler (First In, First Out): a single queue; jobs are served in the order in which they were submitted, first come, first served.

Advantages: simple and easy to understand.

Disadvantages: does not support multiple queues; rarely used in production environments.

2. Capacity Scheduler

The Capacity Scheduler is a multi-user scheduler developed by Yahoo. It organizes resources into multiple queues, each configured with a share of the cluster's capacity, and jobs are submitted to a specific queue.

Capacity scheduler resource allocation algorithm
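As a small client-side illustration, a job can be directed to a specific Capacity Scheduler queue through the mapreduce.job.queuename property. A minimal sketch follows; the queue name "hive" is only an example and must already be defined in the scheduler configuration (capacity-scheduler.xml):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueueSelectionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Submit this job to a named Capacity Scheduler queue instead of "default".
        // "hive" is a hypothetical queue name used here for illustration.
        conf.set("mapreduce.job.queuename", "hive");
        Job job = Job.getInstance(conf, "queue-selection-demo");
        // ... configure mapper/reducer/paths as usual, then:
        // job.waitForCompletion(true);
    }
}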

3. Fair Scheduler

The Fair Scheduler is a multi-user scheduler developed by Facebook. A toy sketch of the fair-share idea follows the list below.

  1. Fair scheduler features

  2. Fair scheduler deficit (the gap between a job's fair share and the resources it actually holds)

  3. Fair scheduler queue resource allocation

  4. Fair scheduler resource allocation algorithm

  5. Fair scheduler queue resource allocation

  6. Fair scheduler queue resource allocation
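As a rough illustration of the fair-share and deficit idea (a simplified toy, not the actual Fair Scheduler implementation), the sketch below assumes a single queue, equal job weights, and memory as the only resource. It computes each job's fair share and deficit and orders the jobs so that the one with the largest deficit is served first:

import java.util.ArrayList;
import java.util.List;

public class FairShareToy {

    // A hypothetical job: its name and the memory (in MB) it currently holds.
    static class RunningJob {
        final String name;
        final long allocatedMb;

        RunningJob(String name, long allocatedMb) {
            this.name = name;
            this.allocatedMb = allocatedMb;
        }
    }

    public static void main(String[] args) {
        final long queueMemoryMb = 12288;   // assume this queue controls 12 GB of memory
        List<RunningJob> jobs = new ArrayList<>();
        jobs.add(new RunningJob("jobA", 6000));
        jobs.add(new RunningJob("jobB", 1000));
        jobs.add(new RunningJob("jobC", 2000));

        // Simplified fair share: an equal split among the jobs in the queue.
        // (The real Fair Scheduler also considers weights, min/max resources,
        // and multiple resource types.)
        final long fairShareMb = queueMemoryMb / jobs.size();

        // Deficit = fair share minus what the job actually holds.
        // Sort so that the job with the largest deficit comes first,
        // i.e. it is the next one to receive freed-up resources.
        jobs.sort((a, b) -> Long.compare(fairShareMb - b.allocatedMb, fairShareMb - a.allocatedMb));

        for (RunningJob j : jobs) {
            System.out.printf("%s: allocated=%d MB, fair share=%d MB, deficit=%d MB%n",
                    j.name, j.allocatedMb, fairShareMb, fairShareMb - j.allocatedMb);
        }
    }
}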

3. Related links

Big data Hadoop-MapReduce learning journey chapter 6

Big data Hadoop-MapReduce learning journey chapter 5

Big data Hadoop-MapReduce learning journey chapter 4

Big data Hadoop-MapReduce learning journey chapter 3

Big data Hadoop-MapReduce learning journey chapter 2

Big data Hadoop-MapReduce learning journey chapter 1