Author : Ali0th

Date : 2019-4-22

After deploying standalone Hadoop in the previous article, I moved on to deploying a Hadoop cluster. A cluster needs at least three machines, because the default HDFS replication factor is three. Here we build with four machines.

This article is exhaustive: it covers every step, has a visible table of contents, and includes fixes for the bugs I ran into, so you can see every pitfall I stepped into along the way.

0.1. Table of Contents

[TOC]

0.2. Environment

CentOS release 6.4
openjdk version "1.8.0_201"

0.3. Cluster Devices

Prepare four VMs: one master and three slaves. The master runs the NameNode, DataNode, ResourceManager, and NodeManager; each slave runs a DataNode and a NodeManager.

master: 192.168.192.164
slave1: 192.168.192.165
slave2: 192.168.192.167
slave3: 192.168.192.166

0.4. Hostname configuration

Because the nodes of a Hadoop cluster sometimes communicate with each other by hostname, every machine needs its own distinct hostname and must be able to resolve the names of the others.

vim /etc/hosts   # do this on all four machines

192.168.192.164 master
192.168.192.165 slave1
192.168.192.167 slave2
192.168.192.166 slave3

reboot     # restart

hostname   # check the host name
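Note that /etc/hosts only maps names to addresses; the machine's own hostname is set separately. A minimal sketch for CentOS 6 (run the equivalent on each node with its own name, e.g. slave1 on the first slave):

# Take effect immediately (lost on reboot)
hostname master

# Persist the name across reboots on CentOS 6
sed -i 's/^HOSTNAME=.*/HOSTNAME=master/' /etc/sysconfig/network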

The difference between /etc/hosts and /etc/sysconfig/network is explained in the appendix (section 0.12.1).

0.5. Configure environment variables

vim /etc/profile

#java
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el6_10.x86_64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
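After saving /etc/profile, reload it and check that the variables resolve; a quick sanity check might look like this (using the paths assumed above):

source /etc/profile
echo $JAVA_HOME          # should print the JDK path
java -version            # should report 1.8.0_201
hadoop version           # should print the Hadoop build information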

0.6. SSH Password-free login

All four machines perform the following operations

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Then, on each machine, copy its public key to the other machines:

ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2
ssh-copy-id -i ~/.ssh/id_rsa.pub slave3

Test that the configuration works:

[mt@slave3 ~]$ ssh master
Last login: Tue Apr 16 17:51:47 2019 from slave2
[mt@master ~]$ ssh slave3
Last login: Tue Apr 16 17:32:12 2019 from 192.168.192.1
[mt@slave3 ~]$ ssh slave2
Last login: Tue Apr 16 17:51:42 2019 from master
[mt@slave2 ~]$ ssh slave3
Last login: Tue Apr 16 17:53:08 2019 from master
[mt@slave3 ~]$
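Instead of hopping between machines by hand, a small loop can verify password-free login from the current node to all four hosts (a sketch using the hostnames configured earlier; BatchMode makes ssh fail instead of prompting for a password):

for h in master slave1 slave2 slave3; do
    ssh -o BatchMode=yes "$h" hostname
done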

0.7. Configure time synchronization

For details, see Hadoop Cluster Time Synchronization

Here we sync against the Aliyun NTP server:

ntpdate ntp1.aliyun.com
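ntpdate only adjusts the clock once, so the nodes will drift apart again over time. One way to keep them in step is a periodic cron entry on every node (a sketch; the 30-minute interval is arbitrary):

# Sync against the Aliyun NTP server every 30 minutes
(crontab -l 2>/dev/null; echo "*/30 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1") | crontab -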

0.8. Hadoop configuration files

Six files in $HADOOP_HOME/etc/hadoop/ need to be configured:

hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, workers

cd $HADOOP_HOME
cd etc/hadoop

0.8.1. Create the data storage directories

  1. NameNode data storage directory: /usr/local/data/hadoop/name
  2. SecondaryNameNode data storage directory: /usr/local/data/hadoop/secondary
  3. DataNode data storage directory: /usr/local/data/hadoop/data
  4. Temporary data storage directory: /usr/local/data/hadoop/tmp
  5. HADOOP_MAPRED_HOME :
sudo mkdir -p /usr/local/data/hadoop/name
sudo mkdir -p /usr/local/data/hadoop/secondary
sudo mkdir -p /usr/local/data/hadoop/data
sudo mkdir -p /usr/local/data/hadoop/tmp
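Because these directories are created with sudo they end up owned by root. If the Hadoop daemons run as a non-root user (the jps output later in this article shows a hadoop user), hand the tree over to that user — a sketch, assuming the user and group are both named hadoop:

sudo chown -R hadoop:hadoop /usr/local/data/hadoop   # hadoop user/group is an assumption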

0.8.2. Configure hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el6_10.x86_64
export HDFS_NAMENODE_USER="root"
export HDFS_DATANODE_USER="root"
export HDFS_SECONDARYNAMENODE_USER="root"
export YARN_RESOURCEMANAGER_USER="root"
export YARN_NODEMANAGER_USER="root"

0.8.3. core-site.xml and hdfs-site.xml

Then edit the core-site.xml and hdfs-site.xml configuration files.

vim core-site.xml   # Add the following content

Hadoop configuration file details (1) -core-site.xml

You need to configure fs.default.name and hadoop.tmp.dir.

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:8020</value>
        <description>Specify the default access address and port number</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/data/</value>
        <description>Parent directory for other temporary directories</description>
    </property>
    <property>
         <name>io.file.buffer.size</name>
         <value>131072</value>
        <description>Buffer size used for sequence files</description>
    </property>
</configuration>

vim hdfs-site.xml   # Add the following content

You need to configure the replication factor, the NameNode and DataNode data directories, and the web address.

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>4</value>
        <description>Replication factor: the number of copies of each block stored in HDFS</description>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/data/hadoop/name</value>
        <description>Directory where the NameNode stores its metadata</description>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/data/hadoop/data</value>
        <description>Directory where the DataNode stores data blocks</description>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:50070</value> 
        <description>HDFS web address</description>
    </property>
</configuration>

0.8.4. Configure yarn-site.xml

vim yarn-site.xml   # Add the following content

You need to configure the shuffle service, the ResourceManager address, external access to the YARN web UI, the environment variables containers may inherit, and disable virtual-memory checking (needed on VMs).

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>The NodeManager fetches data via the MapReduce shuffle service</description>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
        <description>Specify the address of Yarn's ResourceManager</description>
    </property>

    <property>
          <name>yarn.resourcemanager.webapp.address</name>
          <value>192.168.192.164:8088</value>
        <description>External access address for the YARN web UI (external IP address:port)</description>
    </property>

    <property>
        <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
        <description>Environment variables that containers are allowed to inherit instead of using the NodeManager defaults</description>
    </property>

    <property>
       <name>yarn.nodemanager.vmem-check-enabled</name>
       <value>false</value>
        <description>Disable virtual-memory checking; on VMs it can trigger spurious errors</description>
    </property>

</configuration>

0.8.5. Copy and edit the MapReduce configuration file

cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml   # Add the following content

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Tell Hadoop that MR(Map/Reduce) runs on YARN</description>
    </property>
    
   <property>
        <name>mapreduce.admin.user.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
        <description>Set HADOOP_MAPRED_HOME for MapReduce tasks; if it is not configured, MapReduce jobs may fail</description>
   </property>

   <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
        <description>Set HADOOP_MAPRED_HOME for the MR ApplicationMaster; if it is not configured, MapReduce jobs may fail</description>
   </property>
</configuration>

0.8.6. Configure the slave node host names

Finally, configure the host names of the slave nodes (if host names are not configured, use IP addresses):

vim workers   # in Hadoop 3.0 this file was renamed from slaves to workers
slave1
slave2
slave3

0.8.7. Distribute the configuration

To distribute Hadoop together with its configuration:

rsync -av /usr/local/hadoop/ slave1:/usr/local/hadoop/
rsync -av /usr/local/hadoop/ slave2:/usr/local/hadoop/
rsync -av /usr/local/hadoop/ slave3:/usr/local/hadoop/
rsync -av ~/.bash_profile slave1:~/.bash_profile
rsync -av ~/.bash_profile slave2:~/.bash_profile
rsync -av ~/.bash_profile slave3:~/.bash_profile

Here I have the same Hadoop installed on each machine, so just distribute the configuration file:

rsync -av /usr/local/hadoop/etc/hadoop/* slave1:/usr/local/hadoop/etc/hadoop
rsync -av /usr/local/hadoop/etc/hadoop/* slave2:/usr/local/hadoop/etc/hadoop
rsync -av /usr/local/hadoop/etc/hadoop/* slave3:/usr/local/hadoop/etc/hadoop
rsync -av ~/.bash_profile slave1:~/.bash_profile
rsync -av ~/.bash_profile slave2:~/.bash_profile
rsync -av ~/.bash_profile slave3:~/.bash_profile
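The same distribution can be written as a loop over the slaves, which is easier to repeat whenever the configuration changes (a sketch using the hostnames above):

for h in slave1 slave2 slave3; do
    rsync -av /usr/local/hadoop/etc/hadoop/ "$h":/usr/local/hadoop/etc/hadoop/
    rsync -av ~/.bash_profile "$h":~/.bash_profile
done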

Then, on each slave node, execute:

source ~/.bash_profile

0.9. Format NameNode

This only needs to be done on the master:

hdfs namenode -format

0.10. Start the cluster

# Start everything at once
start-all.sh

# Or start component by component
start-dfs.sh
start-yarn.sh
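Once the daemons are up, a quick end-to-end check is to run one of the bundled example jobs through YARN; a sketch (the exact jar name depends on your Hadoop version, hence the wildcard):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10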

0.11. View processes and ports

Check the process

[hadoop@master hadoop]$ jps
13794 NodeManager
13667 ResourceManager
14100 Jps
13143 NameNode
[hadoop@master hadoop]$ ps -ef | grep java
hadoop 13143     1  7 02:10 ?     00:00:03 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el6_10.x86_64/bin/java -Dproc_namenode -Djava.library.path=/lib -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/usr/local/hadoop/logs -Dyarn.log.file=hadoop-hadoop-namenode-master.log -Dyarn.home.dir=/usr/local/hadoop -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/usr/local/hadoop/logs -Dhadoop.log.file=hadoop-hadoop-namenode-master.log -Dhadoop.home.dir=/usr/local/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.NameNode
hadoop 13667     1 18 02:10 pts/0 00:00:05 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el6_10.x86_64/bin/java -Dproc_resourcemanager -Djava.library.path=/usr/local/hadoop/lib -Dservice.libdir=/usr/local/hadoop/share/hadoop/yarn,/usr/local/hadoop/share/hadoop/yarn/lib,/usr/local/hadoop/share/hadoop/hdfs,/usr/local/hadoop/share/hadoop/hdfs/lib,/usr/local/hadoop/share/hadoop/common,/usr/local/hadoop/share/hadoop/common/lib -Dyarn.log.dir=/usr/local/hadoop/logs -Dyarn.log.file=hadoop-hadoop-resourcemanager-master.log -Dyarn.home.dir=/usr/local/hadoop -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/usr/local/hadoop/logs -Dhadoop.log.file=hadoop-hadoop-resourcemanager-master.log -Dhadoop.home.dir=/usr/local/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.resourcemanager.ResourceManager
hadoop 13794     1 17 02:10 ?     00:00:04 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.201.b09-2.el6_10.x86_64/bin/java -Dproc_nodemanager -Djava.library.path=/lib -Dyarn.log.dir=/usr/local/hadoop/logs -Dyarn.log.file=hadoop-hadoop-nodemanager-master.log -Dyarn.home.dir=/usr/local/hadoop -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/usr/local/hadoop/logs -Dhadoop.log.file=hadoop-hadoop-nodemanager-master.log -Dhadoop.home.dir=/usr/local/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager
hadoop    14116  12870  0 02:11 pts/0    00:00:00 grep java

Check the port

netstat -tnlp
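To look only at the ports this setup uses (8020 and 50070 from core-site.xml/hdfs-site.xml, 8088 for the YARN web UI), the output can be filtered — a sketch:

netstat -tnlp | grep -E '8020|50070|8088'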

0.12. Appendix

0.12.1. The difference between /etc/hosts and /etc/sysconfig/network

Before sending a domain name resolution request to the DNS server, Linux first consults the /etc/hosts file and uses a matching record if one exists. The /etc/hosts file usually contains an entry mapping 127.0.0.1 to localhost.

The network file, at /etc/sysconfig/network, is specific to the machine: it stores the computer's name, an identifier for that machine.

The /etc/rc.d/rc.sysinit startup script sets the system hostname to the corresponding name according to the mapping between the eth0 IP address and /etc/hosts.

        # In theory there should be no more than one network interface active
        # this early in the boot process -- the one we're booting from.
        # Use the network address to set the hostname of the client. This
        # must be done even if we have local storage.
        ipaddr=
        if [ "$HOSTNAME" = "localhost" -o "$HOSTNAME" = "localhost.localdomain" ]; thenIpaddr = $(IP addr show to 0.0.0.0/0 scope global | awk'/[[:space:]]inet / { print gensub("/.*","","g",$2) }')
                for ip in $ipaddr ; do
                        HOSTNAME=
                        eval $(ipcalc -h $ip 2>/dev/null)
                        [ -n "$HOSTNAME" ] && { hostname ${HOSTNAME} ; break; }
                done
        fi

# code ...

    # Reread in network configuration data.
    if [ -f /etc/sysconfig/network ]; then
        . /etc/sysconfig/network

        # Reset the hostname.
        action $"Resetting hostname ${HOSTNAME}:" hostname ${HOSTNAME}
    fi

0.12.2. Hadoop 3.0 vs. 2.0

Official introduction: hadoop.apache.org/docs/r3.0.0…

1 Port changes

NameNode ports:     50470 -> 9871, 50070 -> 9870, 8020 -> 9820
Secondary NN ports: 50091 -> 9869, 50090 -> 9868
DataNode ports:     50020 -> 9867, 50010 -> 9866, 50475 -> 9865, 50075 -> 9864

2 The slave-node configuration file is renamed from slaves to workers

3 Hadoop 3.0 requires at least Java 8

0.13. Problems and solutions

Question:

Hadoop appears to start, but jps does not show the Hadoop processes.

Solution:

When hdfs namenode -format is run to format the NameNode, a current/VERSION file recording the clusterID is written to the NameNode data directory. The current/VERSION file on each DataNode stores the clusterID saved after the first format. Formatting again generates a new clusterID in the NameNode's current/VERSION file, so the DataNode and NameNode clusterIDs no longer match, which leads to the result above.

Change the clusterID value in the NameNode's VERSION file. NameNode path: /path/hadoop/tmp/dfs/name/current; DataNode path: /path/hadoop/tmp/dfs/data/current. Running more VERSION shows that the NameNode's clusterID differs from that of the DataNodes. Edit VERSION with vim, change the clusterID to the same value as the DataNodes, and start the cluster again.
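A quick way to compare the two IDs before editing, using the data directories configured earlier in this article (a sketch):

grep clusterID /usr/local/data/hadoop/name/current/VERSION   # on the master (NameNode)
grep clusterID /usr/local/data/hadoop/data/current/VERSION   # on each DataNode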

Question:

Couldn’t find datanode to write file. Forbidden

Solution:

If HDFS has been formatted several times (the "Reformat Y or N" prompt appears, but formatting still fails even after entering Y), the NameNode may fail to start. In that case, clear the tmp and logs directories under the hadoop.tmp.dir path and format HDFS again.

# rm -rf <folder path>/*
rm -rf /usr/local/hadoop/data/*
rm -rf /usr/local/data/hadoop/name/*
rm -rf /usr/local/data/hadoop/data/*
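Every DataNode keeps its own copy of these directories, so the cleanup has to happen on all nodes before reformatting. A sketch that clears them remotely over SSH (destructive — only do this on a test cluster):

for h in master slave1 slave2 slave3; do
    ssh "$h" "rm -rf /usr/local/hadoop/data/* /usr/local/data/hadoop/name/* /usr/local/data/hadoop/data/*"
done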

After the restart, the problem persists.

Run hdfs dfsadmin -report to check:

[root@master hadoop]# hdfs dfsadmin -report
Configured Capacity: 0 (0 B)
Present Capacity: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used: 0 (0 B)
DFS Used%: 0.00%
Replicated Blocks:
	Under replicated blocks: 0
	Blocks with corrupt replicas: 0
	Missing blocks: 0
	Missing blocks (with replication factor 1): 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0
Erasure Coded Block Groups: 
	Low redundancy block groups: 0
	Block groups with corrupt internal blocks: 0
	Missing block groups: 0
	Low redundancy blocks with highest priority to recover: 0
	Pending deletion blocks: 0

Run the tail -n 10 logs/hadoop-hadoop-datanode-slave1.log command to view slave1 logs.

[root@slave1 hadoop]# tail -n 10 logs/hadoop-hadoop-datanode-slave1.log
2019-04-18 02:18:15,895 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost.localdomain/127.0.0.1:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-04-18 02:18:16,896 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost.localdomain/127.0.0.1:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-04-18 02:18:17,900 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost.localdomain/127.0.0.1:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-04-18 02:18:18,904 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost.localdomain/127.0.0.1:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-04-18 02:18:19,906 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost.localdomain/127.0.0.1:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-04-18 02:18:20,788 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2019-04-18 02:18:20,792 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at slave1/192.168.192.165
************************************************************/

The localhost.localdomain address comes from slave1's own configuration file, where fs.defaultFS is still set to hdfs://localhost.localdomain:9000:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost.localdomain:9000</value>
        <description>HDFS internal communication access address</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/data/hadoop/tmp</value>
    </property>
</configuration>

The configuration had not been successfully distributed to this node; distributing it again fixes the problem.

Question:

Permission denied: user=dr.who, access=WRITE, inode=”/”:root:supergroup:drwxr-xr-x

Solution:

hadoop fs -chmod 777 /
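chmod 777 on the HDFS root works, but it opens the whole filesystem. A narrower sketch is to open only the directory you actually upload into (the /user/hadoop path and owner below are assumptions, not part of the original setup):

hadoop fs -mkdir -p /user/hadoop                   # hypothetical upload directory
hadoop fs -chown hadoop:supergroup /user/hadoop
hadoop fs -chmod 775 /user/hadoop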

Question:

The file "From AI to tensorflow.pdf" could not be uploaded when accessing the cluster from another machine.

Solution:

The machine accessing the cluster had no hostname resolution configured for the cluster hosts.
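Concretely, the machine running the browser or HDFS client also needs to resolve the cluster hostnames, so the same entries go into its hosts file (/etc/hosts on Linux, C:\Windows\System32\drivers\etc\hosts on Windows) — a sketch:

192.168.192.164 master
192.168.192.165 slave1
192.168.192.167 slave2
192.168.192.166 slave3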

0.14. References

Hadoop Learning Road (4) Hadoop cluster building and simple application

Hadoop distributed cluster environment construction

Hadoop – 3.0.0 experience