By comparing the architectures of Hadoop 1.0 and 2.0, this article introduces the role of YARN as a resource scheduler and manager.

1. Background: why YARN was created

YARN evolved from MRv1 and overcomes its limitations. Before introducing YARN, we need to understand the limitations of MRv1, which can be summarized as follows:

  • Poor scalability: In MRv1, the JobTracker combines two functions, resource management and job control. This made it the biggest bottleneck in the system and seriously restricted the scalability of a Hadoop cluster.
  • Poor reliability: MRv1 uses a master/slave structure, and the master node (the JobTracker) is a single point of failure. If the master node fails, the entire cluster becomes unavailable.
  • Low resource utilization: MRv1 uses a slot-based resource allocation model. A slot is a coarse-grained resource unit; a task usually cannot use up all the resources in its slot, yet other tasks cannot use the remainder. In addition, Hadoop divides slots into map slots and reduce slots and does not allow them to be shared, so one kind of slot may be scarce while the other sits idle (for example, right after a job is submitted, only map tasks are running and all reduce slots are idle).
  • No support for multiple computing frameworks: MRv1 is tied to the MapReduce programming model and cannot host other frameworks.
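The slot-utilization problem above can be illustrated with a toy simulation. The slot counts and task mix below are invented for illustration, not measured values:

```python
# Toy model of MRv1's static slot partitioning (illustrative only).
# Assume a node configured with 4 map slots and 4 reduce slots.
MAP_SLOTS, REDUCE_SLOTS = 4, 4

# Early in a job, only map tasks are runnable.
runnable_map_tasks, runnable_reduce_tasks = 10, 0

# Slots of one type can never serve tasks of the other type.
used_map = min(MAP_SLOTS, runnable_map_tasks)
used_reduce = min(REDUCE_SLOTS, runnable_reduce_tasks)

utilization = (used_map + used_reduce) / (MAP_SLOTS + REDUCE_SLOTS)
print(utilization)  # 0.5: reduce slots sit idle while map tasks queue
```

Half the node's capacity is wasted even though there is plenty of runnable work, simply because the idle reduce slots cannot run map tasks.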

2. What is YARN?

YARN is an elastic computing platform. It is not limited to a single MapReduce computing framework; instead, it manages multiple frameworks in a unified manner. The basic idea of YARN is to split the resource-management and job-scheduling/monitoring functions into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application can be a single job or a DAG of jobs.

3. What is YARN’s role?

3.1. Hadoop V1.0

In the framework of Hadoop 1.0, data processing and resource scheduling are mainly completed by MapReduce, as shown in the figure below:

The figure above shows the data-processing mode of Hadoop 1.0. For small-scale data processing, this approach poses few problems. However, in real-world scenarios where large amounts of data must be processed, it runs into the following issues:

  • A large number of data-processing jobs are submitted to the JobTracker, which may have to coordinate thousands of DataNodes, so the JobTracker easily becomes a bottleneck for the performance and availability of the entire system.
  • Resources cannot be allocated effectively, resulting in uneven utilization. For example, assume there are three DataNodes (DN1–DN3), each with 4 GB of memory, and a user submits six jobs, each requiring 1 GB of memory, with the data stored on DN2. Since DN2 has only 4 GB of memory, Jobs 1–4 run on DN2 while Jobs 5 and 6 queue up, even though DN1 and DN3 are both idle and cannot be put to use.
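The second issue can be sketched with a toy placement simulation using the numbers from the example above. The strictly data-local placement policy is a simplification of Hadoop 1.0's behavior, used here only to make the imbalance concrete:

```python
# Toy model of Hadoop 1.0's placement problem (numbers from the example
# above; strictly data-local placement is a simplifying assumption).
nodes = {"DN1": 4, "DN2": 4, "DN3": 4}   # free memory in GB per DataNode
jobs = [("job%d" % i, 1, "DN2") for i in range(1, 7)]  # (name, GB, data node)

placed, queued = {}, []
for name, mem, data_node in jobs:
    # Jobs run only where their data lives, so only DN2 is considered.
    if nodes[data_node] >= mem:
        nodes[data_node] -= mem
        placed[name] = data_node
    else:
        queued.append(name)

print(placed)   # job1..job4 all land on DN2
print(queued)   # ['job5', 'job6'] wait although DN1/DN3 are idle
print(nodes)    # {'DN1': 4, 'DN2': 0, 'DN3': 4}
```

DN2 is exhausted while two-thirds of the cluster's memory sits unused, which is exactly the imbalance YARN's global ResourceManager is designed to avoid.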

3.2. YARN mode

Based on the above problems, Hadoop introduced YARN (Yet Another Resource Negotiator) in version 2.0. The core idea of YARN is to separate resource management from job scheduling and monitoring. The following figure shows the YARN architecture.

The core components of YARN are divided into two parts:

Global components

  • ResourceManager (RM): manages, schedules, and allocates resources globally. The ResourceManager consists of a Scheduler (essentially a pluggable allocation policy) and an ApplicationsManager (ASM), which manages applications submitted by clients. The Scheduler allocates resources to applications based on node capacity and queue state. The ApplicationsManager accepts application submissions, starts the ApplicationMaster on a node, monitors the ApplicationMaster's status, and restarts it when necessary.
  • NodeManager (NM): on each node, a NodeManager acts as an agent that monitors the node's resource usage (CPU, memory, disk, network) and reports the node's status to the ResourceManager.

Per-application components

  • ApplicationMaster (AM): schedules the data-processing work of a single application. The ApplicationMaster negotiates with the ResourceManager to obtain resources for computation; after obtaining them, it communicates with the NodeManager on each assigned node, launches tasks in the assigned containers, and monitors task execution. Each time a client submits an application, a new ApplicationMaster is created; it applies to the ResourceManager for container resources, and once they are granted, the programs to be run are shipped to the containers and started, carrying out the distributed computation.
  • Container: an abstraction of resources that encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network. When the ApplicationMaster applies for resources, what the ResourceManager returns to it is a set of Containers.
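As a concrete illustration, the resources a NodeManager advertises to the ResourceManager are configured in `yarn-site.xml`. The property names below are standard YARN settings; the values are illustrative, not recommendations:

```xml
<!-- yarn-site.xml (illustrative values) -->
<configuration>
  <!-- Memory and vcores this NodeManager reports to the ResourceManager -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <!-- Upper bound on the size of any single container allocation -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>
```

Together these settings define the pool of resources the Scheduler can carve into Containers on that node.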

When YARN accepts jobs submitted by users, the working process is as follows:
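The submission flow can be sketched as a toy event trace. This is a simplification: real YARN involves additional steps such as heartbeats and security tokens, and each step here condenses a whole protocol exchange into one line:

```python
# Minimal sketch of YARN's application submission flow (simplified).
events = []

def submit_application(app_name):
    events.append("client: submit %s to ResourceManager" % app_name)
    # 1. The ResourceManager allocates a container and starts the AM in it.
    events.append("RM: launch ApplicationMaster in a container")
    # 2. The AM registers with the RM and requests containers for its tasks.
    events.append("AM: register with RM and request containers")
    # 3. The RM's Scheduler grants containers on suitable nodes.
    events.append("RM: grant containers on NodeManagers")
    # 4. The AM asks those NodeManagers to launch the tasks.
    events.append("AM: ask NodeManagers to launch tasks in containers")
    # 5. The AM monitors the tasks and unregisters when they finish.
    events.append("AM: unregister after all tasks finish")

submit_application("wordcount")
for e in events:
    print(e)
```

The key point the trace makes visible is that after step 1 the ResourceManager drops out of per-job control: everything job-specific is driven by the ApplicationMaster.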

YARN solves the preceding problems in the following ways.

  • It solves the JobTracker bottleneck through the ApplicationMaster. Each time a new job is submitted, the ResourceManager starts a new ApplicationMaster on an appropriate node, avoiding the situation in Hadoop 1.0 where a single JobTracker becomes the performance bottleneck.
  • It schedules resources more efficiently. The ResourceManager can allocate free resources anywhere in the cluster to an ApplicationMaster to help it complete its tasks.
  • It supports data-processing frameworks other than MapReduce, such as Spark.

4. Compare YARN with other resource managers

Even with the YARN design in Hadoop 2.0, one problem remains: after a large number of jobs are submitted and all computing resources are used up, new jobs may wait a long time before being processed, even when they are very important. In YARN, scheduler plugins (for example, the FIFO Scheduler, Fair Scheduler, and Capacity Scheduler) can be used to configure different resource-scheduling rules, mitigating this problem and prioritizing important jobs.
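For example, the Capacity Scheduler can reserve a share of the cluster for a high-priority queue in `capacity-scheduler.xml`. The queue names and percentages below are assumptions chosen for illustration:

```xml
<!-- capacity-scheduler.xml (illustrative queue layout) -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,important</value>
  </property>
  <!-- 70% of cluster capacity for ordinary jobs -->
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>70</value>
  </property>
  <!-- 30% reserved so important jobs need not wait behind a full cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.important.capacity</name>
    <value>30</value>
  </property>
</configuration>
```

Jobs submitted to the `important` queue then always have a guaranteed slice of the cluster, even when the `default` queue is saturated.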
