CentOS 8 is used in this environment.

| Hostname  | Roles                                                                                        | IP              | Description                  |
|-----------|----------------------------------------------------------------------------------------------|-----------------|------------------------------|
| hadoop301 | HDFS NameNode, HDFS DataNode, YARN ResourceManager, YARN NodeManager, JournalNode, ZooKeeper | 192.168.142.101 | YARN and HDFS master, worker |
| hadoop302 | HDFS NameNode, HDFS DataNode, YARN ResourceManager, YARN NodeManager, JournalNode, ZooKeeper | 192.168.142.102 | YARN and HDFS master, worker |
| hadoop303 | HDFS NameNode, HDFS DataNode, YARN NodeManager, JournalNode, ZooKeeper                       | 192.168.142.103 | HDFS master, worker          |

# Preconditions

  • Complete the general Linux setup first
  • Configure passwordless SSH login between the three machines (a minimal sketch follows this list)
  • Download the hadoop-3.2.1 and zookeeper-3.5.6 packages
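A minimal sketch of the passwordless-login step, assuming the root user (consistent with the rest of this guide) and that the three hostnames resolve (see 1.4 below); adjust to your environment:

# Run on each of the three machines
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for h in hadoop301 hadoop302 hadoop303; do ssh-copy-id root@$h; done
# Verify: this should log in without a password prompt
ssh root@hadoop302 hostname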

# Installation steps


1 Execute on all machines

1.1 Install the JDK

yum install -y java-1.8.0-openjdk-devel.x86_64 java-1.8.0-openjdk.x86_64

1.2 Edit /etc/profile and add the following at the end

export HADOOP_HOME=/opt/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0
export ZOOKEEPER_HOME=/opt/zookeeper-3.5.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin
1.3 Create folders. These paths correspond to the configuration files used below.
mkdir -p /tmp/hadoop/tmpdir
mkdir -p /tmp/hadoop/journalnode/data
mkdir -p /tmp/hadoop/hdfs/namenode
mkdir -p /tmp/hadoop/hdfs/datanode
mkdir -p /tmp/zookeeper
echo 1 > /tmp/zookeeper/myid    # on hadoop301
echo 2 > /tmp/zookeeper/myid    # on hadoop302
echo 3 > /tmp/zookeeper/myid    # on hadoop303
1.4 Add the following entries to /etc/hosts
192.168.142.101 hadoop301
192.168.142.102 hadoop302
192.168.142.103 hadoop303
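A quick sanity check of the settings in this section (illustrative; exact output varies):

source /etc/profile
java -version               # should report openjdk version 1.8.0
ping -c 1 hadoop302         # the hosts entries resolve
cat /tmp/zookeeper/myid     # 1 on hadoop301, 2 on hadoop302, 3 on hadoop303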

2 Execute on hadoop301

2.1 Install ZooKeeper

2.1.1 Extract ZooKeeper

tar -zxf zookeeper-3.5.6.tar.gz

2.1.2 Configure zoo.cfg

cd zookeeper-3.5.6/conf
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg

zoo.cfg should read as follows

# The number of milliseconds of each tick; the values below are integer multiples of it.
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take 
# Number of ticks the initial leader/follower synchronization may take after a leader election. If there are many followers, or follower data lags far behind, synchronization takes longer, so increase this value accordingly. It is also the maximum wait time (setSoTimeout) for followers and observers when they start synchronizing the leader's data.
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
# Number of ticks; easily confused with initLimit above. This is also a maximum wait time for followers and observers interacting with the leader, but it applies to normal request forwarding and ping exchanges after synchronization with the leader is complete.
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
# dataLogDir is not set here, so transaction logs are also written to dataDir.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=hadoop301:2888:3888
server.2=hadoop302:2888:3888
server.3=hadoop303:2888:3888

2.1.3 Synchronize to hadoop302 and hadoop303

yum install -y rsync
rsync -auvp /opt/zookeeper-3.5.6 root@hadoop302:/opt
rsync -auvp /opt/zookeeper-3.5.6 root@hadoop303:/opt
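Each host's /tmp/zookeeper/myid must match its server.N line in zoo.cfg, otherwise the ensemble will not form. A minimal check from hadoop301, assuming the passwordless SSH set up earlier:

for i in 1 2 3; do echo -n "server.$i -> myid "; ssh root@hadoop30$i cat /tmp/zookeeper/myid; done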

2.2 Install Hadoop

2.2.1 Upload hadoop-3.2.1.tar.gz to the /opt directory and extract it

2.2.2 Edit /opt/hadoop-3.2.1/etc/hadoop/core-site.xml

<?xml version="1.0" encoding="UTF-8"? >
<?xml-stylesheet type="text/xsl" href="configuration.xsl"? >
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/tmpdir</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop301:2181,hadoop302:2181,hadoop303:2181</value>
  </property>
</configuration>

2.2.3 Edit /opt/hadoop-3.2.1/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <!-- HDFS HA configuration -->
  <!-- All defaults are documented at https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml (replace "stable" with a concrete version such as r3.2.1 if needed) -->
  
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- The logical nameservice name; must match the authority in fs.defaultFS (core-site.xml) -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The NameNode IDs within nameservice mycluster: nn1, nn2, nn3 -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <!-- RPC address of each NameNode, one property per ID defined in dfs.ha.namenodes.mycluster -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop301:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop302:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>hadoop303:8020</value>
  </property>
  <!-- HTTP address of each NameNode, one property per ID defined in dfs.ha.namenodes.mycluster -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop301:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop302:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>hadoop303:9870</value>
  </property>
  <!-- The JournalNode quorum where NameNode edit logs are stored -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop301:8485;hadoop302:8485;hadoop303:8485/mycluster</value>
  </property>
  <!-- Where JournalNode stores its data on the local disk -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/tmp/hadoop/journalnode/data</value>
  </property>
  <!-- Proxy provider used by clients to locate the active NameNode during automatic failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing methods; to configure multiple mechanisms, put one per line -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- sshfence requires passwordless SSH login -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- sshfence connection timeout (ms) -->
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>dfs.journalnode.http-address</name>
    <value>0.0.0.0:8480</value>
  </property>
  <property>
    <name>dfs.journalnode.rpc-address</name>
    <value>0.0.0.0:8485</value>
  </property>
  <!-- HDFS HA configuration end -->

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/tmp/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/tmp/hadoop/hdfs/datanode</value>
  </property>
  <!-- Enable the WebHDFS interface -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- Disable permission checks so that Hive can connect directly -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
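Once core-site.xml and hdfs-site.xml are in place, the HA wiring can be spot-checked with hdfs getconf (this relies on the PATH and JAVA_HOME exports from section 1.2); expected values are shown as comments:

hdfs getconf -confKey fs.defaultFS                  # hdfs://mycluster
hdfs getconf -confKey dfs.ha.namenodes.mycluster    # nn1,nn2,nn3
hdfs getconf -namenodes                             # hadoop301 hadoop302 hadoop303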

2.2.4 Edit /opt/hadoop-3.2.1/etc/hadoop/yarn-site.xml

<?xml version="1.0"? >
<configuration>

  <!-- YARN HA configuration -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Define the cluster ID -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <!-- The ID of this machine in the HA cluster; must be one of the values defined in yarn.resourcemanager.ha.rm-ids. Delete this property on machines that are not ResourceManagers. -->
  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>
  <!-- Define the list of ResourceManager IDs in the HA cluster -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Define the machines in the highly available ResourceManager cluster -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop301</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop302</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>hadoop301:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>hadoop302:8088</value>
  </property>
  <property>
    <name>hadoop.zk.address</name>
    <value>hadoop301:2181,hadoop302:2181,hadoop303:2181</value>
  </property>

  <! -- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
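For later reference: once the ResourceManagers are running (section 5), the rm1/rm2 IDs configured above can be queried with yarn rmadmin; exactly one should report active:

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2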

2.2.5 Edit /opt/hadoop-3.2.1/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/opt/hadoop-3.2.1/share/hadoop/common/*,/opt/hadoop-3.2.1/share/hadoop/common/lib/*,/opt/hadoop-3.2.1/share/hadoop/hdfs/*,/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*,/opt/hadoop-3.2.1/share/hadoop/mapreduce/*,/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*,/opt/hadoop-3.2.1/share/hadoop/yarn/*,/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*</value>
  </property>

</configuration>
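The long mapreduce.application.classpath value above is just the standard Hadoop classpath spelled out; it can be cross-checked against the output of the bundled helper:

/opt/hadoop-3.2.1/bin/hadoop classpath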

2.2.6 Edit /opt/hadoop-3.2.1/etc/hadoop/hadoop-env.sh

# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0

# Some parts of the shell code may do special things dependent upon
# the operating system. We have to set this here. See the next
# section as to why....
export HADOOP_OS_TYPE=${HADOOP_OS_TYPE:-$(uname -s)}
export HADOOP_PID_DIR=/opt/hadoop-3.2.1/pid
export HADOOP_LOG_DIR=/var/log/hadoop
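Because HADOOP_PID_DIR and HADOOP_LOG_DIR point to non-default locations, it is safest to create them up front on every machine (an extra precaution; the daemons can usually create them if permissions allow):

mkdir -p /opt/hadoop-3.2.1/pid /var/log/hadoop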

2.2.7 Edit /opt/hadoop-3.2.1/etc/hadoop/yarn-env.sh

# Specify the max heapsize for the ResourceManager. If no units are
# given, it will be assumed to be in MB.
# This value will be overridden by an Xmx setting specified in either
# HADOOP_OPTS and/or YARN_RESOURCEMANAGER_OPTS.
# Default is the same as HADOOP_HEAPSIZE_MAX
#export YARN_RESOURCEMANAGER_HEAPSIZE=
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0

2.2.8 Edit /opt/hadoop-3.2.1/sbin/start-dfs.sh and /opt/hadoop-3.2.1/sbin/stop-dfs.sh, adding the following at the beginning of each script

HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_JOURNALNODE_USER=root
HDFS_ZKFC_USER=root

2.2.9 Edit /opt/hadoop-3.2.1/sbin/start-yarn.sh and /opt/hadoop-3.2.1/sbin/stop-yarn.sh, adding the following at the beginning of each script

YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

2.2.10 Edit /opt/hadoop-3.2.1/etc/hadoop/workers as follows

hadoop301
hadoop302
hadoop303

2.2.11 Copy hadoop-3.2.1 to hadoop302 and hadoop303

rsync -auvp /opt/hadoop-3.2.1 root@hadoop302:/opt
rsync -auvp /opt/hadoop-3.2.1 root@hadoop303:/opt

3 Execute on hadoop302

Modify yarn.resourcemanager.ha.id in yarn-site.xml to the following

  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
  </property>

4 Execute on hadoop303

Delete the following property from yarn-site.xml, since hadoop303 does not run a ResourceManager

  <property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm1</value>
  </property>

5 Start the cluster

Start order: ZooKeeper > JournalNode > format NameNode > create the ZKFC namespace in ZooKeeper > NameNode > DataNode > ResourceManager > NodeManager

5.1 Start ZooKeeper

Execute on all machines, in the order hadoop301, hadoop302, hadoop303

# Note: switch back to bash if you use zsh
# chsh -s /usr/bin/bash
# Under zsh, the emulate builtin can be used instead:
# emulate sh -c '/opt/zookeeper-3.5.6/bin/zkServer.sh start'
/opt/zookeeper-3.5.6/bin/zkServer.sh start
/opt/zookeeper-3.5.6/bin/zkServer.sh status
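To verify the ensemble, zkServer.sh status should report leader on exactly one machine and follower on the other two. The bundled CLI gives a further check (illustrative output in the comment):

/opt/zookeeper-3.5.6/bin/zkCli.sh -server hadoop301:2181 ls /
# [zookeeper]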

5.2 Start JournalNode

Execute on all machines, in the order hadoop301, hadoop302, hadoop303

# Note: switch back to bash if you use zsh
# chsh -s /usr/bin/bash
/opt/hadoop-3.2.1/sbin/hadoop-daemon.sh start journalnode
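jps on each machine should now list a JournalNode process (alongside QuorumPeerMain from ZooKeeper):

jps
# ... QuorumPeerMain
# ... JournalNode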

5.3 Format the NameNode

Execute on hadoop301

# Note: switch back to bash if you use zsh
# chsh -s /usr/bin/bash
/opt/hadoop-3.2.1/bin/hadoop namenode -format
# Synchronize the formatted metadata to the other NameNodes, otherwise they may not start
rsync -auvp /tmp/hadoop/hdfs/namenode/current root@hadoop302:/tmp/hadoop/hdfs/namenode
rsync -auvp /tmp/hadoop/hdfs/namenode/current root@hadoop303:/tmp/hadoop/hdfs/namenode
# Format the HA state znode in ZooKeeper
hdfs zkfc -formatZK
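If hdfs zkfc -formatZK succeeded, the HA state znode should now exist in ZooKeeper; a quick check (illustrative output in the comment):

/opt/zookeeper-3.5.6/bin/zkCli.sh -server hadoop301:2181 ls /hadoop-ha
# [mycluster]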

5.4 Stop JournalNode

Execute on all machines

/opt/hadoop-3.2.1/sbin/hadoop-daemon.sh stop journalnode

5.5 Start Hadoop

Execute on hadoop301

# Must be executed under bash, not under zsh in compatibility mode
start-dfs.sh
start-yarn.sh
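A final smoke test, assuming everything came up; the example jar ships with the Hadoop distribution, and the comments show roughly what to expect:

jps                                 # NameNode, DataNode, DFSZKFailoverController, JournalNode, ResourceManager, NodeManager, QuorumPeerMain
hdfs haadmin -getServiceState nn1   # exactly one of nn1/nn2/nn3 should be active
hdfs dfs -mkdir -p /smoke && hdfs dfs -ls /
yarn jar /opt/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar pi 2 10

The HDFS web UI should also respond at hadoop301:9870 and the YARN UI at hadoop301:8088.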
