YARN

1 What is YARN

Apache YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management system. YARN was introduced in Hadoop 2 to improve the MapReduce implementation, but it is general enough to support other distributed computing paradigms as well.

YARN provides APIs for requesting and working with cluster resources, but user code rarely calls these APIs directly. Instead, users write against the higher-level APIs of a distributed computing framework, which are built on top of YARN and hide the resource management details. As the following figure shows, distributed computing frameworks such as MapReduce and Spark run as YARN applications on the cluster compute layer (YARN) and the cluster storage layer (HDFS and HBase).
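
For a sense of what the low-level API looks like, here is a minimal sketch that queries cluster state through YARN's YarnClient class; a framework would make calls like these on the user's behalf. The class name ClusterInfo is just for illustration:

    // Query the resource manager for the nodes that can run containers.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.api.records.NodeState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class ClusterInfo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new YarnConfiguration();
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(conf);
            yarnClient.start();
            // Print each running node and its total resource capability
            for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
                System.out.printf("%s: %s%n", node.getNodeId(), node.getCapability());
            }
            yarnClient.stop();
        }
    }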

There is also a further layer of applications built on top of these frameworks. Pig and Hive, for example, are processing frameworks that run on MapReduce, Spark, or Tez (or on all three), and do not interact with YARN directly.

2 How YARN works

YARN provides its core services through two types of long-running daemons: a resource manager (one per cluster), which manages the use of resources across the cluster, and node managers, which run on every node in the cluster and launch and monitor containers. A container executes an application-specific process within a set of resource limits (memory, CPU, and so on). Depending on how YARN is configured, a container may be a Unix process or a Linux cgroup. The following figure shows how YARN runs an application.

To run an application on YARN, a client first contacts the resource manager and asks it to run an application master process (step 1). The resource manager then finds a node manager that can launch the application master in a container (steps 2a and 2b). What the application master does once it is running depends on the application: it might simply run a computation in its own container and return the result to the client, or it might request more containers from the resource manager (step 3) and use them to run a distributed computation (steps 4a and 4b). The latter is what the MapReduce YARN application does.
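
The following is a minimal sketch of step 1 using YARN's client API: the client asks the resource manager to launch an application master. The application name, the placeholder command, and the resource sizes are illustrative assumptions, not a complete application:

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.records.*;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class SubmitApp {
        public static void main(String[] args) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();

            // Step 1: ask the resource manager for a new application
            YarnClientApplication app = yarnClient.createApplication();
            ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
            ctx.setApplicationName("demo-app"); // illustrative name

            // What the application master container should execute
            // (a real application master would be a long-running program)
            ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                null, null,
                Collections.singletonList("echo application master placeholder"),
                null, null, null);
            ctx.setAMContainerSpec(amContainer);
            // Resources for the application master container: 512 MB, 1 vcore
            ctx.setResource(Resource.newInstance(512, 1));

            // The resource manager then finds a node manager to launch it (2a, 2b)
            ApplicationId appId = yarnClient.submitApplication(ctx);
            System.out.println("Submitted application " + appId);
        }
    }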

Note that YARN itself does not provide any way for the parts of an application (client, master, and processes) to communicate with one another. Most nontrivial YARN applications use some form of remote communication (such as Hadoop's RPC layer) to pass status updates and results back to the client, but these mechanisms are specific to each application.

2.1 Resource Request

YARN has a flexible model for making resource requests. A request for a set of containers can specify the amount of computing resources (memory and CPU) each container needs, as well as locality constraints for the containers.

Locality is critical to ensuring that distributed data processing algorithms use cluster bandwidth efficiently, so YARN allows an application to specify locality constraints for the containers it requests. A constraint can ask for a container on a specific node, on a specific rack, or anywhere in the cluster (off-rack).

Sometimes a locality constraint cannot be met, in which case either no allocation is made or, optionally, the constraint is relaxed. For example, if a specific node was requested but it cannot start a new container because it is already running other containers, YARN will try to start the container on another node in the same rack, and failing that, on any node in the cluster.
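
As a sketch of how an application master might express such constraints and their relaxation through YARN's AMRMClient API (the node and rack names here are made up for illustration):

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class LocalityRequests {
        public static void main(String[] args) {
            Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore
            Priority priority = Priority.newInstance(0);

            // Prefer a node, fall back to its rack, then anywhere in the cluster
            ContainerRequest relaxed = new ContainerRequest(
                capability,
                new String[] {"node1.example.com"}, // preferred nodes
                new String[] {"/rack1"},            // preferred racks
                priority);                          // locality is relaxed by default

            // Insist on the named node only: pass relaxLocality = false
            ContainerRequest strict = new ContainerRequest(
                capability,
                new String[] {"node1.example.com"},
                null,
                priority,
                false);
        }
    }

These requests would be passed to AMRMClient.addContainerRequest by a running application master.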

In the common case of launching a container to process an HDFS block (to run a map task in MapReduce, say), the application requests a container on one of the nodes hosting the block's three replicas, or failing that, on a node in one of the racks hosting those replicas, or failing that, on any node in the cluster.

A YARN application can make resource requests at any point while it is running. For example, it can make all of its requests up front, or it can make requests dynamically as the application's needs change.

Spark takes the first approach, starting a fixed number of executors on the cluster. MapReduce, by contrast, requests its map task containers up front but requests its reduce task containers later; it also requests a new container to rerun any task that fails.
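
Here is a simplified sketch of the MapReduce-style pattern using AMRMClient: request all task containers up front, then request a replacement whenever a task fails. Registration details, container launch, and termination are elided, and the container count is an arbitrary illustration:

    import java.util.List;
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.ContainerStatus;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class RetryingMaster {
        public static void main(String[] args) throws Exception {
            AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
            rm.init(new YarnConfiguration());
            rm.start();
            rm.registerApplicationMaster("", 0, "");

            Resource capability = Resource.newInstance(1024, 1);
            Priority priority = Priority.newInstance(0);
            // Static, up-front requests: one container per task
            for (int i = 0; i < 10; i++) {
                rm.addContainerRequest(
                    new ContainerRequest(capability, null, null, priority));
            }

            while (true) { // heartbeat loop (termination elided)
                AllocateResponse response = rm.allocate(0.0f);
                List<ContainerStatus> done = response.getCompletedContainersStatuses();
                for (ContainerStatus status : done) {
                    if (status.getExitStatus() != 0) {
                        // Dynamic request: replace the container of a failed task
                        rm.addContainerRequest(
                            new ContainerRequest(capability, null, null, priority));
                    }
                }
                Thread.sleep(1000);
            }
        }
    }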

2.2 Application lifespan

The lifespan of a YARN application can vary widely, so rather than classifying applications by how long they run, it is more useful to classify them by how they map to the jobs that users run. The simplest model is one application per user job, which is the approach MapReduce takes.

The second model is one application per workflow or per user session of (possibly unrelated) jobs. This approach can be more efficient than the first, since containers can be reused between jobs, and intermediate data can be cached between jobs. Spark uses this model.

The third model is a long-running application shared by multiple users, often acting in some kind of coordination role. Because the overhead of starting a new application master is avoided, an always-on application master means users get very low-latency responses to their queries.

3 YARN resource scheduling

In an ideal world, the resource requests made by a YARN application would be granted immediately. In the real world, however, resources are limited, and on a busy cluster an application will often have to wait to have some of its requests fulfilled. It is the job of the YARN scheduler to allocate resources to applications according to some defined policy.

3.1 Scheduling Options

Three schedulers are available in YARN: the FIFO Scheduler, the Capacity Scheduler, and the Fair Scheduler. The FIFO Scheduler places applications in a queue and runs them in the order they were submitted (first in, first out): requests for the first application in the queue are satisfied first, then the next application in the queue is served, and so on.

The FIFO Scheduler has the merit of being simple to understand and needing no configuration, but it is not suitable for shared clusters: a large application will use all the resources in the cluster, so every other application must wait its turn. On a shared cluster the Capacity Scheduler or the Fair Scheduler is a better choice. Both allow long-running jobs to complete in a timely manner, while still letting users who run smaller ad hoc queries get results back in a reasonable time.

The following figure illustrates the differences between the schedulers. With the FIFO Scheduler (i), the small job is blocked until the large job completes. With the Capacity Scheduler (ii), a separate dedicated queue allows the small job to start as soon as it is submitted, although this comes at the cost of overall cluster utilization, since the queue's capacity is reserved for jobs in that queue; the large job therefore finishes later than it would under the FIFO Scheduler. With the Fair Scheduler (iii), there is no need to reserve a fixed amount of capacity, because the scheduler dynamically balances resources among all running jobs. When the first (large) job starts, it is the only job running, so it gets all the cluster's resources; when the second (small) job starts, it is allocated half of the cluster's resources, so that each job gets an equal share.

Note that there is a lag between the second job starting and it receiving its fair share, since it must wait for containers used by the first job to complete and free up resources. After the small job finishes and no longer needs resources, the large job goes back to using the full capacity of the cluster. The overall effect is high cluster utilization combined with timely completion of small jobs.
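
The scheduler in use, and the queue layout for the Capacity Scheduler, are controlled through configuration. As a minimal sketch, a capacity-scheduler.xml with a dedicated queue for small jobs might look like the following; the queue names and percentages are illustrative, not defaults:

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>prod,dev</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.prod.capacity</name>
        <value>40</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.dev.capacity</name>
        <value>60</value>
      </property>
      <property>
        <!-- Cap the dev queue so it cannot take over the whole cluster -->
        <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
        <value>75</value>
      </property>
    </configuration>

Here jobs submitted to the dev queue can start immediately in its reserved share of the cluster, while maximum-capacity lets the queue expand into idle capacity without permanently taking resources away from prod.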