This article is based on author Atsushi Minutsu's sharing on GitChat; because of its length it is split into three parts.


Previously shared:

The Most Detailed Hadoop Environment Setup ever (Part 1)

The Most Detailed Hadoop Environment Setup ever (Part 2)

Part five: Hadoop HA Installation

When the currently working machine goes down, the system handles the failure automatically and transfers the work seamlessly to a standby machine, ensuring high availability of the service.

HA is the most common installation and deployment mode in the production environment. Hadoop HA is a new feature added to Hadoop 2.x, including the NameNode HA and ResourceManager HA.

DataNodes and NodeManagers are themselves designed to tolerate the failure of individual nodes, so they need no special high-availability handling.

Step 9 Time server setup

Hadoop is demanding about time synchronization across the machines in a cluster: the system clocks must not differ too much, otherwise many problems can occur.

Each machine in the cluster could synchronize its time with an Internet time server, but in most production environments the cluster servers cannot reach the Internet. In that case you can build your own NTP time server and have every machine in the cluster synchronize with it.

33. Configure the NTP server

We select the third machine (bigdata-senior03.chybinmy.com) as the NTP server; all other machines synchronize time with it.

  1. Check whether the NTP service is installed

The NTP packages are installed: ntpdate-4.2.6p5-1.el6.centos.x86_64 is used to synchronize time with a server, and ntp-4.2.6p5-1.el6.centos.x86_64 provides the time synchronization service.
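A minimal check on the CentOS 6 machines used here:

rpm -qa | grep ntp            # should list the ntp and ntpdate packages
# if missing: yum install -y ntp ntpdate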

2. Modify the ntp.conf configuration file

Enable the restrict setting and change the network segment

Uncomment the line "restrict 192.168.100.0 mask 255.255.255.0 nomodify notrap" and set the network segment to the cluster's segment; in this case the cluster is on the 192.168.100 segment.

Comment out the server domain name configuration

These lines are the domain names of Internet time servers; the cluster does not connect to the Internet, so comment them out.

Add the local clock as the time source:

server 127.127.1.0
fudge 127.127.1.0 stratum 10
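Putting the three edits together, the relevant part of /etc/ntp.conf on bigdata-senior03 would look roughly like this (the exact upstream server lines depend on the distribution's default file):

# Allow machines on the cluster's network segment to synchronize with this server
restrict 192.168.100.0 mask 255.255.255.0 nomodify notrap

# Comment out the Internet time servers, for example:
# server 0.centos.pool.ntp.org iburst
# server 1.centos.pool.ntp.org iburst

# Use the local clock as the time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10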

3. Modify the NTPD configuration file

Add a line of configuration to the ntpd configuration file (/etc/sysconfig/ntpd on CentOS 6): SYNC_CLOCK=yes

4. Start the NTP service
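A sketch of the commands on CentOS 6 (run as root); chkconfig is what makes the service start on every boot, as described below:

service ntpd start        # start the NTP daemon now
chkconfig ntpd on         # enable automatic start at boot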

In this way, the NTP service automatically starts every time the machine starts.

34. Configure synchronization on the other machines

Switch to the root user and set up periodic synchronization with crontab:
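A minimal sketch of the root crontab entry on bigdata-senior01 and bigdata-senior02 (/usr/sbin/ntpdate is the usual path on CentOS; the 10-minute interval is an assumption that matches the test below):

# crontab -e  (as root)
*/10 * * * * /usr/sbin/ntpdate bigdata-senior03.chybinmy.com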

35. Test whether synchronization is effective

  1. Check the time of the current three machines

2. Change the time on Bigdata-senior01

Change the time to a previous time:
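For example (the timestamp itself is arbitrary, just clearly in the past; run as root):

date -s "2017-01-01 11:11:11"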

Wait about 10 minutes and check whether the time is synchronized automatically, i.e. whether the time on bigdata-senior01 changes back to match bigdata-senior03.

3. Check whether time is automatically synchronized

You can see that the time on Bigdata-Senior01 has been automatically synchronized.

Step 10 Deploy Zookeeper on distributed machines

ZooKeeper

The role of Zookeeper in Hadoop clusters.

Zookeeper is a distributed management collaboration framework. The Zookeeper cluster is used to ensure high availability of the Hadoop cluster. (High availability means that even if some servers in the cluster are down, services can be provided properly.)

Zookeeper ensures high availability.

The Zookeeper cluster ensures high availability of the NameNode service as follows: there are two NameNode services in the Hadoop cluster, and both NameNodes periodically send heartbeats to Zookeeper to report that they are still alive and able to serve. Only one of them is in the Active state at any given time. When Zookeeper stops receiving heartbeats from the Active NameNode, it switches to the Standby NameNode and sets it to the Active state. As a result there is always one available NameNode in the cluster, which achieves NameNode high availability.

Zookeeper election mechanism.

The Zookeeper cluster itself is also highly available. Each machine in the Zookeeper cluster has one of two roles: Leader or Follower. When data is written to the Zookeeper cluster, the Leader writes it first and then notifies the Followers to write it. When a client reads data, it can read from any machine, because the data is the same everywhere.

In this design the Leader role is a potential single point of failure, and high availability means removing that single point of failure.

Zookeeper solves the Leader single point of failure with an election mechanism: the Leader is elected rather than fixed to one machine.

The election works like this: when any machine in the cluster notices that there is no Leader, it nominates itself as Leader and asks the other machines to agree. Once more than half of the machines agree, the election ends. This is why the number of machines in a Zookeeper cluster must be odd.

In this way, even if the Leader machine is down, a new Leader is quickly elected to ensure the high availability of the Zookeeper cluster.

Write high availability.

In a cluster, write operations are notified to the Leader, who then notifies the Follower to write. In fact, if more than half of the machines write successfully, the write is considered successful. Therefore, even if some machines break down, the write is still successful.

Read high availability.

When the Zookeeper client reads data, it can read from any machine in the cluster, so the outage of some machines does not affect reads.

The number of ZooKeeper servers must be odd because of the election mechanism. Machines in a ZooKeeper cluster take the roles of leader, follower, and observer to keep data consistent across the cluster.

Install ZooKeeper

Here we install zooKeeper cluster on BigData01, BigData02 and BigData03 machines.

1. Decompress the installation package

Decompress the ZooKeeper installation package on BigData01.

2. Modify the configuration

Copy the conf/zoo_sample.cfg file and rename it zoo.cfg; zoo.cfg is the ZooKeeper configuration file:

dataDir sets the directory where ZooKeeper stores its data files:

dataDir=/opt/modules/zookeeper-3.4.8/data/zData

Specify the information about each machine in the ZooKeeper cluster:

The number after server ranges from 1 to 255, so a ZooKeeper cluster can have up to 255 machines.
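A sketch of the relevant zoo.cfg lines for this three-machine cluster (2888 and 3888 are the conventional follower and election ports; adjust them if your setup differs):

dataDir=/opt/modules/zookeeper-3.4.8/data/zData
server.1=bigdata-senior01.chybinmy.com:2888:3888
server.2=bigdata-senior02.chybinmy.com:2888:3888
server.3=bigdata-senior03.chybinmy.com:2888:3888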

3. Create the myid file

In the directory specified by dataDir, create a file named myid containing the number that follows "server." for that machine.
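For example, on BigData01 (server.1 in the sketch above):

mkdir -p /opt/modules/zookeeper-3.4.8/data/zData
echo 1 > /opt/modules/zookeeper-3.4.8/data/zData/myid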

4. Distribute to other machines

5. Modify the myID file on other machines

6. Start the zookeeper

Zookeeper needs to be started on each machine.
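A sketch of the start and status commands, run from the ZooKeeper root directory on each machine:

bin/zkServer.sh start    # start the ZooKeeper server
bin/zkServer.sh status   # shows whether this node is the leader or a follower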

Zookeeper command

Enter the zookeeper Shell

Run bin/zkCli.sh in the ZooKeeper root directory to enter the ZK shell mode.

Zookeeper is like a small file system. / is the root directory, and all the nodes below are called ZNodes.

After entering the ZK shell, type any unrecognized character to list all ZooKeeper commands.

Query data on a znode: get /zookeeper

Create a znode: create /znode1 "demodata"

List all child znodes: ls /

Delete a znode: rmr /znode1

Exit the shell mode: quit

Step 11 Hadoop 2.x HDFS HA deployment

39. HDFS HA principle

A single NameNode is a single point of failure: if the NameNode becomes unavailable, the entire HDFS file system becomes unavailable.

Therefore a highly available HDFS (HDFS HA) is needed to remove the NameNode single point of failure. The solution is to configure multiple NameNode nodes in the HDFS cluster.

But once multiple NameNodes are introduced, several issues must be resolved.

  • HDFS HA must meet the following requirements:

    • Ensure consistency of metadata data in NameNode memory and security of edit log files.

    • How do multiple NameNodes collaborate?

    • How the client can correctly access the available NameNode.

    • How to ensure that only one NameNode is in service at any time?

  • The solution

    • To ensure NameNode metadata consistency and edit log security, Zookeeper is used to store edit log files.

    • One NameNode is in Active state and the other is in Standby state. Only one NameNode in Active state can provide services at a point in time. Metadata stored on the two Namenodes is synchronized in real time. When the Active NameNode is faulty, Zookeeper switches to the Standby NameNode in real time and changes the Standby state to Active.

    • The client connects to a Zookeeper agent to determine which NameNode is in service at the time.

40. HDFS HA architecture diagram

  • There are two NameNode nodes in the HDFS HA architecture. One is Active to provide services for clients, and the other is Standby.

  • The metadata consists of two kinds of files, fsimage and edits, which together back up the metadata. JournalNodes are used to copy the edits files from the Active NameNode in real time; there are three JournalNodes for high availability.

  • The Standby NameNode does not serve metadata requests. It copies the fsimage file from the Active NameNode and the edits files from the JournalNodes, and merges fsimage and edits, playing a role equivalent to the SecondaryNameNode.

    The final goal is to ensure that the metadata information on the Standby NameNode is consistent with that on the Active NameNode to implement hot backup.

  • Zookeeper ensures that the Standby NameNode is changed to Active when the Active NameNode fails.

  • ZKFC (ZKFailoverController, failure detection and control) is a Zookeeper client in Hadoop. A ZKFC process runs on each NameNode node, monitors the status of that NameNode, and reports the status to the Zookeeper cluster. Concretely, a Znode is created on Zookeeper and the NameNode's status information is stored in that node.

    When a NameNode fails, ZKFC detects it and reports to Zookeeper, which deletes the corresponding Znode. When the Standby side's ZKFC finds that there is no Active NameNode, it uses a shell command to switch the NameNode it monitors to the Active state and updates the data on the Znode. The Znode is an ephemeral node; an ephemeral node is deleted when its client disconnects, so when a ZKFC process itself fails, its NameNode is also switched.

  • DataNodes send heartbeat and block report information to both NameNodes at the same time, but they only accept file read/write commands from the Active NameNode.

41. Build HDFS HA environment

1. Plan server roles

Create directory /opt/modules/hadoopha/ on bigData01, BigData02, and BigData03 respectively to store Hadoop HA environment.

2. Create the HDFS HA Hadoop program directory

3. Decompress Hadoop 2.5.0

4. Configure the Hadoop JDK path

5. Configure hdfs-site.xml

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <property>
        <!-- Define a nameservice name for the NameNode cluster -->
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <!-- The NameNodes included in nameservice ns1 -->
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <!-- RPC address and port of NameNode nn1; RPC is used to communicate with the DataNodes -->
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>bigdata-senior01.chybinmy.com:8020</value>
    </property>
    <property>
        <!-- RPC address and port of NameNode nn2 -->
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>bigdata-senior02.chybinmy.com:8020</value>
    </property>
    <property>
        <!-- HTTP address and port of NameNode nn1, used by the web client -->
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>bigdata-senior01.chybinmy.com:50070</value>
    </property>
    <property>
        <!-- HTTP address and port of NameNode nn2, used by the web client -->
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>bigdata-senior02.chybinmy.com:50070</value>
    </property>
    <property>
        <!-- JournalNode list used to share edit logs between the NameNodes -->
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://bigdata-senior01.chybinmy.com:8485;bigdata-senior02.chybinmy.com:8485;bigdata-senior03.chybinmy.com:8485/ns1</value>
    </property>
    <property>
        <!-- Directory where the JournalNode stores the edits logs -->
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/modules/hadoopha/hadoop-2.5.0/tmp/data/dfs/jn</value>
    </property>
    <property>
        <!-- Proxy class the client uses to connect to the available NameNode -->
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
</configuration>

6. Configure core-site.xml

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <property>
        <!-- The default file system points to the nameservice ns1 -->
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoopha/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>

hadoop.tmp.dir sets the Hadoop temporary directory. By default, NameNode and DataNode data files are stored under this directory.

7. Configure slaves file

8. Distribute the data to other nodes

Delete the share/doc directory before distributing: it only contains documentation and is large, so removing it speeds up the copy.
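A sketch of the distribution from BigData01, using the directory layout planned above:

cd /opt/modules/hadoopha/hadoop-2.5.0
rm -rf share/doc     # drop the documentation to speed up the copy
cd /opt/modules/hadoopha
scp -r hadoop-2.5.0/ bigdata-senior02.chybinmy.com:/opt/modules/hadoopha/
scp -r hadoop-2.5.0/ bigdata-senior03.chybinmy.com:/opt/modules/hadoopha/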

9. Start the HDFS HA cluster

Journalnode is started separately on all three machines.
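A sketch of the command, run from the Hadoop directory on each of the three machines:

sbin/hadoop-daemon.sh start journalnode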

Use the jps command to check whether the JournalNode processes have started.

10. Start the Zookeeper

Start Zookeeper on the three nodes:

11. Format NameNode

Format the NameNode on the first machine:
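A sketch, run from the Hadoop directory on bigdata-senior01 (the JournalNodes should already be running):

bin/hdfs namenode -format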

On the second NameNode:
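A sketch, run on bigdata-senior02; bootstrapStandby copies the formatted metadata from the first NameNode, which generally needs to be running and reachable:

bin/hdfs namenode -bootstrapStandby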

12. Start the NameNode

Start NameNode on the first and second machines:
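A sketch, run from the Hadoop directory on each of the two machines:

sbin/hadoop-daemon.sh start namenode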

On the HDFS Web page, both Namenodes are standby.

Switch the first NameNode to the Active state:
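A sketch, assuming nn1 is the logical name configured for the first NameNode in hdfs-site.xml:

bin/hdfs haadmin -transitionToActive nn1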

The forcemanual parameter can be added to force a NameNode to become active.

On the web page, you can see that the first NameNode is now in the Active state.

13. Configure automatic failover

The ZooKeeper cluster is used to implement failover. Before configuring failover, shut down the cluster and do not configure failover when the HDFS is running.

Stop NameNode, DataNode, JournalNode, and ZooKeeper

Modify hdfs-site.xml
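The key change is enabling automatic failover; a sketch of the property to add to hdfs-site.xml:

<property>
    <!-- Enable automatic failover for the HA nameservice -->
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>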

Modify core-site.xml
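core-site.xml needs the address of the ZooKeeper cluster; a sketch of the property to add (2181 is ZooKeeper's default client port):

<property>
    <!-- ZooKeeper quorum used by ZKFC for automatic failover -->
    <name>ha.zookeeper.quorum</name>
    <value>bigdata-senior01.chybinmy.com:2181,bigdata-senior02.chybinmy.com:2181,bigdata-senior03.chybinmy.com:2181</value>
</property>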

Distribute hdfs-site.xml and core-site.xml to the other machines

Start the zookeeper

Zookeeper is started on three machines

Create a zNode

Create a node on Zookeeper to store namenode-related data.
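A sketch of the command that creates the HA znode in ZooKeeper, run once from the Hadoop directory on one of the NameNode machines:

bin/hdfs zkfc -formatZK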

14. Start HDFS, JournalNode, and ZKFC

Start NameNode, DataNode, JournalNode, and ZKFC
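A sketch; with automatic failover configured, start-dfs.sh brings up the NameNodes, DataNodes, JournalNodes, and ZKFC processes together:

sbin/start-dfs.sh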

ZKFC only listens for NameNode.

Test HDFS HA

1. Test whether failover and data are shared

Upload files on NN1
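A sketch of the upload; the local file path is only a placeholder:

bin/hdfs dfs -put /opt/data/wc.input /    # /opt/data/wc.input is a hypothetical local file
bin/hdfs dfs -ls /                        # confirm the file is listed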

Currently, NameNode on Bigdata-Senior01 is Active.

Kill the NameNode process on NN1

[hadoop@bigdata-senior01 hadoop-2.5.0]$ kill -9 3364


The namenode on NN1 is no longer accessible.

Check whether NN2 is in Active state
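A sketch of the check, assuming nn2 is the logical name of the second NameNode as configured in hdfs-site.xml:

bin/hdfs haadmin -getServiceState nn2    # should print "active" after the failover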

Check whether the file can be seen on NN2

After this verification, file synchronization and automatic failover between NN1 and NN2 are both working.

Step 12 YARN HA deployment

43. YARN HA Principle

Before Hadoop 2.4, the ResourceManager was also a single point of failure. HA must be implemented to ensure high availability of the ResourceManager.

The ResourceManager records the resource allocation and job running status of the current cluster. YARN HA stores this information in a shared storage medium such as Zookeeper to achieve high availability, and uses Zookeeper to implement automatic ResourceManager failover.

  • MasterHADaemon: Controls the start and stop of RM Master. It runs in the same process as RM and can receive external RPC commands.

  • Shared storage: The Active Master writes information to the shared storage, and the Standby Master reads information from the shared storage to synchronize with the Active Master.

  • ZKFailoverController: the Zookeeper-based switch controller, composed of ActiveStandbyElector and HealthMonitor. ActiveStandbyElector interacts with Zookeeper and determines whether the managed Master is Active or Standby; HealthMonitor monitors the health of the Master.

  • Zookeeper: Maintains a global lock and controls only one Active ResourceManager in the cluster.

44. Set up the YARN HA environment

1. Plan server roles

2. Modify the configuration file yarn-site.xml

<?xml version="1.0" encoding="utf-8"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
    </property>
    <property>
        <!-- Enable the ResourceManager HA function -->
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <!-- An id for the ResourceManager HA cluster -->
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <property>
        <!-- Logical names of the ResourceManager HA nodes -->
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm12,rm13</value>
    </property>
    <property>
        <!-- Machine of the first ResourceManager node -->
        <name>yarn.resourcemanager.hostname.rm12</name>
        <value>bigdata-senior02.chybinmy.com</value>
    </property>
    <property>
        <!-- Machine of the second ResourceManager node -->
        <name>yarn.resourcemanager.hostname.rm13</name>
        <value>bigdata-senior03.chybinmy.com</value>
    </property>
    <property>
        <!-- ZooKeeper nodes used by ResourceManager HA -->
        <name>yarn.resourcemanager.zk-address</name>
        <value>bigdata-senior01.chybinmy.com:2181,bigdata-senior02.chybinmy.com:2181,bigdata-senior03.chybinmy.com:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>

3. Distribute to other machines

4. Start

Start YARN on bigdata-senior01:

Start Resourcemanager on Bigdata-senior02 and Bigdata-senior03.
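A sketch of the start commands, run from the Hadoop directory (which script starts what depends on where the ResourceManagers are configured; here they live on the second and third machines):

# On bigdata-senior01: start YARN (this launches the NodeManagers listed in the slaves file)
sbin/start-yarn.sh

# On bigdata-senior02 and bigdata-senior03: start the two ResourceManagers
sbin/yarn-daemon.sh start resourcemanager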

Check the processes started on each node.

The web UI of the ResourceManager on BigData02 can be accessed normally, and its state is active.

http://bigdata-senior02.chybinmy.com:8088/cluster

If you access the other ResourceManager, the request automatically redirects to the active ResourceManager, because that one is in standby state.

http://bigdata-senior03.chybinmy.com:8088/cluster

45. Test YARN HA

5. Run a MapReduce job
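For example, a wordcount from the examples jar shipped with Hadoop 2.5.0 (the /input and /output HDFS paths are only placeholders):

bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input /output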

6. When the job is running, kill the Resourcemanager process in the Active state.

7. Check whether the other Resourcemanager can take over automatically.

The ResourceManager web UI on BigData02 can no longer be accessed, and the ResourceManager on BigData03 automatically becomes active.

8. Check whether the job can be successfully completed.

The MapReduce job completes successfully; it is not affected by the ResourceManager failure.

After the preceding tests, YARN HA is successfully set up.

Step 13 Deploy the HDFS Federation architecture

Reasons for using HDFS Federation

1. Limitations of a single NameNode node

Namespace limitations.

The NameNode keeps the metadata of all files in HDFS in memory and runs on a single machine, so that machine's hardware limits the number of files the NameNode can manage, which in turn limits data growth.

Data isolation issues.

All files in HDFS are managed by a single NameNode, so one application can affect every other program running on HDFS, and permission control becomes complicated.

Performance bottlenecks.

The throughput of the entire HDFS is limited by the throughput of the single NameNode. Because the NameNode is a JVM process, its performance degrades significantly when the JVM uses a large amount of memory.

2. HDFS Federation

HDFS Federation allows a Hadoop cluster to have multiple NameNodes, but unlike the NameNodes in HA, the Federation NameNodes serve different data. You can think of it as one NameNode split into several, each responsible for managing only part of the namespace. The NameNodes in an HDFS Federation share all the DataNodes.

Structure diagram of HDFS Federation

HDFS Federation setup

1. Plan server roles

2. Create HDFS Federation version Hadoop program directory

On bigdata01, create the directory /opt/modules/hadoopfederation/ to store the Hadoop Federation environment.

3. Decompress Hadoop 2.5.0

4. Configure the Hadoop JDK path

Change JDK paths in hadoop-env.sh, mapred-env.sh, and yarn-env.sh files.

export JAVA_HOME="/opt/modules/jdk1.7.0_67"

5. Configure hdfs-site.xml

<configuration>
    <property>
        <!-- Names of the three nameservices -->
        <name>dfs.nameservices</name>
        <value>ns1,ns2,ns3</value>
    </property>
    <property>
        <!-- Machine name and RPC port of the first NameNode -->
        <name>dfs.namenode.rpc-address.ns1</name>
        <value>bigdata-senior01.chybinmy.com:8020</value>
    </property>
    <property>
        <!-- Alternate service RPC port of the first NameNode -->
        <name>dfs.namenode.servicerpc-address.ns1</name>
        <value>bigdata-senior01.chybinmy.com:8022</value>
    </property>
    <property>
        <!-- HTTP address of the first NameNode -->
        <name>dfs.namenode.http-address.ns1</name>
        <value>bigdata-senior01.chybinmy.com:50070</value>
    </property>
    <property>
        <!-- HTTPS address of the first NameNode -->
        <name>dfs.namenode.https-address.ns1</name>
        <value>bigdata-senior01.chybinmy.com:50470</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns2</name>
        <value>bigdata-senior02.chybinmy.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.ns2</name>
        <value>bigdata-senior02.chybinmy.com:8022</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns2</name>
        <value>bigdata-senior02.chybinmy.com:50070</value>
    </property>
    <property>
        <name>dfs.namenode.https-address.ns2</name>
        <value>bigdata-senior02.chybinmy.com:50470</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns3</name>
        <value>bigdata-senior03.chybinmy.com:8020</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.ns3</name>
        <value>bigdata-senior03.chybinmy.com:8022</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns3</name>
        <value>bigdata-senior03.chybinmy.com:50070</value>
    </property>
    <property>
        <name>dfs.namenode.https-address.ns3</name>
        <value>bigdata-senior03.chybinmy.com:50470</value>
    </property>
</configuration>

6. Configure core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoopha/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>

hadoop.tmp.dir sets the Hadoop temporary directory. By default, NameNode and DataNode data files are stored under this directory.

7. Configure slaves file

8. Configure yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata-senior02.chybinmy.com</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
    </property>
</configuration>

9. Distribute the data to other nodes

Delete the share/doc directory before distributing: it only contains documentation and is large, so removing it speeds up the copy.

10. Format NameNode

Format the NameNode on the first machine.

A clusterId must be specified so that all NameNodes share the same clusterId, because the three NameNodes belong to the same cluster. Here the clusterId is hadoop-federation-clusterid.

Then on the second NameNode.

And on the third NameNode.
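A sketch of the format command; the same command, with the same clusterId, is run on each of the three NameNode machines:

bin/hdfs namenode -format -clusterId hadoop-federation-clusterid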

11. Start the NameNode

Start NameNode on the first, second, and third machines:

After the startup, run the JPS command to check whether the startup is successful.

On the HDFS Web page, the three Namenodes are in standby state.

12. Start the DataNode
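A sketch, run from the Hadoop directory on each of the three machines:

sbin/hadoop-daemon.sh start datanode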

After the DataNode process is started, run the JPS command to verify that the DataNode process is successfully started.

Test HDFS Federation

1. Modify core-site.xml

On bigdata-senior01, modify the core-site. XML file to specify that the connected NameNode is the first NameNode.

[hadoop@bigdata-senior01 hadoop-2.5.0]$ vim etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://bigdata-senior01.chybinmy.com:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoopfederation/hadoop-2.5.0/data/tmp</value>
    </property>
</configuration>

2. Upload a file to HDFS from bigdata-senior01

3. View the HDFS file

The uploaded file can be seen on the first NameNode (bigdata-senior01), the one specified in core-site.xml, but not under the other NameNodes.

In this way, the HDFS client can specify which NameNode to upload to, achieving the goal of dividing the work among the NameNodes.

Afterword.

The procedure in this article is not the standard operating procedure used at work; if you installed hundreds of machines this way, you would be exhausted. I hope that by following the installation step by step, readers come to understand Hadoop's components and how they cooperate, which is of great help for using Hadoop in depth.
