The Hadoop master and slave run in separate Docker containers. NameNode and ResourceManager run in the hadoop-master container; DataNode and NodeManager run in the hadoop-slave container. NameNode and DataNode are components of HDFS, the Hadoop distributed file system, which stores the input and output data. ResourceManager and NodeManager are components of YARN, the Hadoop cluster resource management system, which schedules CPU and memory resources.

Let’s plan the cluster first:

Let’s explain some of the software above:

NameNode is the HDFS NameNode. To ensure high availability, it runs as an active/standby pair.

One NameNode is in the active state and the other is in the standby state. Failover between the two is handled by DFSZKFailoverController and ZooKeeper.
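To see which NameNode currently holds which role, the `hdfs haadmin -getServiceState` command can be queried for each NameNode ID. The following is a minimal sketch: the IDs `nn1` and `nn2` are hypothetical (the real ones come from `dfs.ha.namenodes.<nameservice>` in hdfs-site.xml), and the binary path is assumed from the commands used later in this article.

```shell
# Path assumed from the commands in this article; override HDFS_BIN if yours differs.
HDFS_BIN="${HDFS_BIN:-/usr/local/hadoop/bin/hdfs}"

# Print "<id> <state>" for each NameNode ID passed as an argument.
# A NameNode that cannot be queried is reported as "unreachable".
namenode_states() {
  for id in "$@"; do
    state="$("$HDFS_BIN" haadmin -getServiceState "$id" 2>/dev/null)" || state="unreachable"
    echo "$id $state"
  done
}

# Usage (inside a container with the Hadoop client configured);
# nn1 and nn2 are placeholder IDs:
#   namenode_states nn1 nn2
```

In a healthy HA pair, one ID should report active and the other standby.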

In addition, to keep the edit log on the master highly available, three JournalNodes are created.
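Why three? Each edit must be acknowledged by a majority of the JournalNodes, so a group of N nodes keeps working through (N - 1) / 2 failures. The arithmetic can be sketched as follows; the function names are only illustrative.

```shell
# Number of JournalNode failures a group of the given size can tolerate.
# Writes need a majority, so N nodes survive (N - 1) / 2 failures.
tolerated_failures() {
  local nodes="$1"
  echo $(( (nodes - 1) / 2 ))
}

# Smallest number of JournalNodes that must acknowledge each edit.
majority() {
  local nodes="$1"
  echo $(( nodes / 2 + 1 ))
}

echo "3 JournalNodes: majority=$(majority 3), tolerated failures=$(tolerated_failures 3)"
# prints: 3 JournalNodes: majority=2, tolerated failures=1
```

A fourth JournalNode would not help: four nodes still tolerate only one failure, which is why odd group sizes are the usual choice.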

Next, YARN is the resource management system that centrally manages and schedules the cluster's resources.

That covers the introduction. Next, we build the high-availability cluster.

Step 1: go to the Hadoop directory and run docker-compose up -d

Step 2: run ./start-all.sh

After completing the above steps, we can check the console output to see the result.
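Besides reading the console output, one quick sanity check is to list the Java processes in each container with `jps` and confirm the expected daemons are present. Here is a minimal sketch; the helper name `missing_daemons` is made up for illustration, and the container name in the usage comment is an assumption based on the description at the top of this article (check `docker-compose ps` for the real names).

```shell
# Print every expected daemon name that does not appear in `jps` output read from stdin.
# jps prints "<pid> <main class>", so the class name is matched as the whole second field
# (this keeps "NameNode" from matching "SecondaryNameNode").
missing_daemons() {
  local jps_output
  jps_output="$(cat)"
  for daemon in "$@"; do
    if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Usage (container name is an assumption):
#   docker exec hadoop-master jps | missing_daemons NameNode ResourceManager DFSZKFailoverController
```

An empty result means every listed daemon is running; any name that is printed still needs to be started.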

Here are some cluster validation operations:

1. Verify that HDFS works properly and that NameNode HA fails over

First, upload a file to HDFS:

/usr/local/hadoop/bin/hadoop fs -put /usr/local/hadoop/README.txt /

Manually stop the active NameNode on the active node

/usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode

Check over HTTP on port 50070 whether the standby NameNode has become active

Manually restart the NameNode stopped in the previous step
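The same check can be scripted instead of done in a browser: the NameNode web server exposes a JMX endpoint on the same port, and its NameNodeStatus bean carries a State field. The sketch below only parses that JSON; the host name `node01` in the usage comment is hypothetical, and the helper name is made up for illustration.

```shell
# Extract the HA state from the NameNodeStatus JMX bean (JSON on stdin).
# Only the first "State" field is reported.
namenode_state_from_jmx() {
  sed -n 's/.*"State"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage, with node01 as a placeholder host name:
#   curl -s 'http://node01:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus' \
#     | namenode_state_from_jmx
```

After the active NameNode is stopped, the former standby should report active within a short time.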

/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode

2. Verify that YARN works properly and that ResourceManager HA fails over

Run the WordCount program from the examples bundled with Hadoop:

/usr/local/hadoop/bin/hadoop fs -mkdir /wordcount

/usr/local/hadoop/bin/hadoop fs -mkdir /wordcount/input

/usr/local/hadoop/bin/hadoop fs -mv /README.txt /wordcount/input

/usr/local/hadoop/bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar wordcount /wordcount/input /wordcount/output
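Once the job finishes, its result can be checked from the command line. A successful MapReduce job normally leaves a _SUCCESS marker in the output directory, and the reducer output lands in part-r-* files. The helper names below are made up for illustration, and the binary path is the one used throughout this article.

```shell
# Path taken from the commands in this article; override HADOOP_BIN if yours differs.
HADOOP_BIN="${HADOOP_BIN:-/usr/local/hadoop/bin/hadoop}"

# Succeeds if the job wrote its _SUCCESS marker into the given output directory.
job_succeeded() {
  "$HADOOP_BIN" fs -test -e "$1/_SUCCESS"
}

# Print the first few word counts (default 10 lines) from the reducer output files.
show_counts() {
  "$HADOOP_BIN" fs -cat "$1/part-r-*" | head -n "${2:-10}"
}

# Usage:
#   job_succeeded /wordcount/output && show_counts /wordcount/output 5
```

Note that the job refuses to start if /wordcount/output already exists, so remove that directory before running the job a second time.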

Verify ResourceManager HA

Manually stop the ResourceManager on Node02

/usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager

Open the ResourceManager on Node01 over HTTP on port 8088 to check its status
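As an alternative to the web page, `yarn rmadmin -getServiceState` reports the HA state of a ResourceManager, and a small loop can wait for the failover to complete. This is a sketch under assumptions: `rm1` is a hypothetical ID (the real ones come from `yarn.resourcemanager.ha.rm-ids` in yarn-site.xml), and the helper name is made up.

```shell
# Path assumed from the Hadoop install location used in this article.
YARN_BIN="${YARN_BIN:-/usr/local/hadoop/bin/yarn}"

# Poll a ResourceManager until it reports the wanted HA state or attempts run out.
# Arguments: <rm-id> <wanted-state> [attempts] [seconds-between-attempts]
wait_for_rm_state() {
  local rm_id="$1" wanted="$2" attempts="${3:-10}" delay="${4:-3}"
  local i=0 state
  while [ "$i" -lt "$attempts" ]; do
    state="$("$YARN_BIN" rmadmin -getServiceState "$rm_id" 2>/dev/null)" || state=""
    if [ "$state" = "$wanted" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage, with rm1 as a placeholder ID for the ResourceManager on Node01:
#   wait_for_rm_state rm1 active && echo "rm1 took over"
```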

Manually start the ResourceManager on Node02 again

/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager

Original write-up: https://github.com/zhuanxuhit/distributed-system/issues/4 (follows are welcome)

Your encouragement motivates me to keep writing, and I look forward to making progress together.