Author: Unreal good

Source: Hang Seng LIGHT Cloud Community

Basic overview

Apache YARN (Yet Another Resource Negotiator) is the resource management and job scheduling system of Hadoop, introduced in Hadoop 2.x.

Users can deploy various service frameworks on YARN for unified management and resource allocation.

In Hadoop 1.x, MapReduce itself was responsible for allocating resources, so if the MapReduce computation layer failed, resource scheduling stopped as well. YARN was introduced to separate resource management from computation.

Core architecture

The YARN architecture consists of the ResourceManager, NodeManager, ApplicationMaster, and Container.

ResourceManager

  • The ResourceManager is usually deployed on a dedicated machine and runs as a daemon process. There is only one active ResourceManager in the cluster, and it is responsible for resource management and allocation across the entire system.
  • The ResourceManager is mainly composed of two components: the Scheduler and the Applications Manager (ASM). It makes allocation decisions based on application priorities, queue capacity, and data locality, and schedules cluster resources in a secure, shared, multi-tenant manner.
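The way the Scheduler weighs queue capacity and application priority can be illustrated with a toy model. This is a minimal sketch, not YARN's scheduler: the Queue class, the schedule_once function, and all the numbers are hypothetical, and data locality is left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    capacity_mb: int   # memory this (hypothetical) queue may use at most
    used_mb: int = 0
    pending: list = field(default_factory=list)  # (priority, app_id, request_mb)


def schedule_once(queues, free_mb):
    """Grant one request: least-used queue first, then highest priority."""
    candidates = [q for q in queues if q.pending]
    # Serve the queue that uses the smallest fraction of its capacity first.
    for q in sorted(candidates, key=lambda q: q.used_mb / q.capacity_mb):
        # Within a queue, a larger priority number wins.
        q.pending.sort(key=lambda r: -r[0])
        priority, app_id, request_mb = q.pending[0]
        fits_cluster = request_mb <= free_mb
        fits_queue = q.used_mb + request_mb <= q.capacity_mb
        if fits_cluster and fits_queue:
            q.pending.pop(0)
            q.used_mb += request_mb
            return app_id, q.name, request_mb
    return None  # nothing can be granted right now


queues = [
    Queue("prod", capacity_mb=6144, used_mb=4096,
          pending=[(1, "app-1", 1024)]),
    Queue("dev", capacity_mb=2048, used_mb=0,
          pending=[(1, "app-2", 512), (5, "app-3", 1024)]),
]
grant = schedule_once(queues, free_mb=4096)
print(grant)  # ('app-3', 'dev', 1024)
```

The "dev" queue is served first because it uses none of its capacity, and within it the request with the higher priority is granted.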

NodeManager

  • The NodeManager is the manager of each node in the YARN cluster. It is responsible for managing the life cycle of all containers on the node, monitoring resources, and tracking node health.
  • The NodeManager mainly handles commands from the ResourceManager and the ApplicationMaster.

When a node starts, its NodeManager registers with the ResourceManager and reports its available resources. While running, the NodeManager and ResourceManager work together to keep this information up to date and keep the cluster in its best possible state.
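The register-then-update pattern described above can be sketched as follows. This is only an illustrative model under simple assumptions: NodeReport and ToyResourceManager are hypothetical names, and the real protocol carries far more information than free memory and cores.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    node_id: str
    free_mb: int
    free_vcores: int
    healthy: bool = True


class ToyResourceManager:
    """Hypothetical stand-in that only tracks the latest report per node."""

    def __init__(self):
        self.nodes = {}

    def register(self, report):
        # A node announces itself and its resources when it starts.
        self.nodes[report.node_id] = report

    def heartbeat(self, report):
        # Later reports replace the stored state for a known node.
        if report.node_id not in self.nodes:
            raise KeyError(f"unregistered node: {report.node_id}")
        self.nodes[report.node_id] = report

    def schedulable_mb(self):
        # Only healthy nodes contribute memory that can be handed out.
        return sum(n.free_mb for n in self.nodes.values() if n.healthy)


rm = ToyResourceManager()
rm.register(NodeReport("node-1", free_mb=8192, free_vcores=8))
rm.register(NodeReport("node-2", free_mb=4096, free_vcores=4))
rm.heartbeat(NodeReport("node-1", free_mb=6144, free_vcores=6))
rm.heartbeat(NodeReport("node-2", free_mb=4096, free_vcores=4, healthy=False))
print(rm.schedulable_mb())  # 6144
```

After the second heartbeat, node-2 is marked unhealthy, so only node-1's remaining memory counts as schedulable.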

ApplicationMaster

  • When a user submits an application, YARN starts a lightweight process, the ApplicationMaster.
  • The ApplicationMaster is responsible for negotiating resources from the ResourceManager and, through the NodeManager, monitoring resource usage within containers. It is also responsible for task monitoring and fault tolerance.

ApplicationMaster can split data, dynamically match resource requirements based on application status, monitor and track task status and progress, and report application progress information.
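Splitting data and reporting progress can be pictured with two small helpers. This is a rough sketch: split_input, progress, and the state names are assumptions made for illustration, not part of any YARN API.

```python
def split_input(total_records, split_size):
    """Cut [0, total_records) into half-open ranges of at most split_size."""
    return [(start, min(start + split_size, total_records))
            for start in range(0, total_records, split_size)]


def progress(task_states):
    """Fraction of tasks that have finished successfully."""
    if not task_states:
        return 0.0
    done = sum(1 for state in task_states.values() if state == "SUCCEEDED")
    return done / len(task_states)


splits = split_input(total_records=250, split_size=100)
print(splits)  # [(0, 100), (100, 200), (200, 250)]

# One task per split; the first one has already finished.
states = {i: "RUNNING" for i in range(len(splits))}
states[0] = "SUCCEEDED"
print(progress(states))
```

Each split would be handed to one task, and the fraction returned by progress is the kind of figure an ApplicationMaster could report upward.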

Container

  • A Container is the resource abstraction in YARN. It encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network.
  • When the ApplicationMaster applies to the ResourceManager for resources, the resources that the ResourceManager returns to the ApplicationMaster are represented as Containers.
  • YARN allocates one Container to each task, and the task can use only the resources described in that Container.
  • The ApplicationMaster can run any type of task in a Container.
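The idea of a Container as a bundle of resources carved out of one node can be sketched like this. It is a simplified model that tracks only memory and virtual cores; Resource, Container, and allocate are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def fits_in(self, other):
        # Every dimension must fit, not just one of them.
        return (self.memory_mb <= other.memory_mb
                and self.vcores <= other.vcores)


@dataclass(frozen=True)
class Container:
    container_id: str
    node_id: str
    resource: Resource   # the limit the task inside must stay within


def allocate(request, node_id, node_free, next_id):
    """Return (container, remaining), or (None, node_free) if it does not fit."""
    if not request.fits_in(node_free):
        return None, node_free
    remaining = Resource(node_free.memory_mb - request.memory_mb,
                         node_free.vcores - request.vcores)
    return Container(f"container_{next_id}", node_id, request), remaining


free = Resource(memory_mb=4096, vcores=4)
c1, free = allocate(Resource(2048, 2), "node-1", free, next_id=1)
c2, free = allocate(Resource(3072, 1), "node-1", free, next_id=2)
print(c1)    # a container holding 2048 MB and 2 vcores on node-1
print(c2)    # None: 3072 MB no longer fits in the remaining 2048 MB
print(free)  # Resource(memory_mb=2048, vcores=2)
```

The second request is refused even though enough cores are free, because the memory dimension does not fit.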

Workflow

The complete workflow for submitting an application to YARN:

  • First, the client submits the application to the ResourceManager and requests an ApplicationMaster instance;
  • The ResourceManager selects a NodeManager that can run it and starts the ApplicationMaster instance in a Container on that node;
  • Once started, the ApplicationMaster registers itself with the ResourceManager and, after registering successfully, maintains heartbeat communication with the RM;
  • The ApplicationMaster sends requests to the ResourceManager to obtain the Container resources it needs;
  • The ApplicationMaster uses the Containers it obtained to perform distributed computing;
  • After the application finishes, the ApplicationMaster unregisters itself from the ResourceManager and allows the containers that belong to it to be reclaimed.
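The steps above can be traced with a toy simulation. This is a sketch under heavy simplification: ToyRM and ToyAM are hypothetical classes, containers are just a counter, tasks run in-process, and heartbeats are omitted.

```python
class ToyRM:
    """Hypothetical ResourceManager that counts free containers."""

    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.registered = set()
        self.log = []

    def submit(self, app_id):
        # The client submits; the RM starts the AM in one container.
        self.free_containers -= 1
        self.log.append("am_started")
        return ToyAM(app_id, self)

    def register(self, app_id):
        self.registered.add(app_id)
        self.log.append("am_registered")

    def allocate(self, app_id, count):
        # Grant as many containers as are free, up to the request.
        granted = min(count, self.free_containers)
        self.free_containers -= granted
        self.log.append(f"allocated_{granted}")
        return granted

    def unregister(self, app_id, held):
        # Reclaim the task containers plus the AM's own container.
        self.registered.discard(app_id)
        self.free_containers += held + 1
        self.log.append("am_unregistered")


class ToyAM:
    """Hypothetical ApplicationMaster driving one application."""

    def __init__(self, app_id, rm):
        self.app_id, self.rm, self.held = app_id, rm, 0

    def run(self, tasks):
        self.rm.register(self.app_id)
        self.held = self.rm.allocate(self.app_id, len(tasks))
        # Run one task per granted container.
        results = [task() for task in tasks[:self.held]]
        self.rm.unregister(self.app_id, self.held)
        return results


rm = ToyRM(free_containers=4)
am = rm.submit("app-1")
results = am.run([lambda: 1 + 1, lambda: 2 * 3])
print(results)             # [2, 6]
print(rm.log)              # ['am_started', 'am_registered', 'allocated_2', 'am_unregistered']
print(rm.free_containers)  # 4
```

When the run ends, the free container count is back to its starting value, which mirrors the last step: everything the application held is returned to the cluster.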

Conclusion

YARN schedules and allocates resources for the services running in the Hadoop system in order to maximize the utilization of machine resources.