The cluster is set up on three hosts: the primary node is master, and the child nodes are node1 and node2.
1. Change the host name
As the root user, edit the following file on each host; the hostname must be different on each host:
vi /etc/sysconfig/network
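For example, on the master host the file would contain the following (this assumes a CentOS/RHEL 6-style /etc/sysconfig/network, consistent with the other commands in this guide; node1 and node2 get their own hostnames):
NETWORKING=yes
HOSTNAME=master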



reboot
After the restart, run the hostname command; if it prints the new hostname, the change was successful

2. Configure the mapping between the host name and IP address
Configure the following for each host:
vi /etc/hosts
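Add one line per host mapping the hostname to its IP address. The addresses below are illustrative only; use the real addresses of your three hosts:
192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2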



3. Create the hadoop user group and account
Run the following command for each host as user root:


Create a Hadoop user group
groupadd hadoop


Create the hadoop user
useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop


Set a password for the hadoop user; enter the same password twice. A "BAD PASSWORD" warning, if any, can be ignored
passwd hadoop

4. Configure passwordless SSH login
In the hadoop user's home directory, run the following command to generate the SSH key pair. A .ssh directory is created under the home directory:
ssh-keygen -t dsa (press Enter at every prompt), or run ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa to generate the DSA key pair non-interactively



Go to the .ssh directory and use cat to generate the authorized_keys file
cd .ssh


cat id_dsa.pub >> authorized_keys


Change the permissions of the authorized_keys file. This step is mandatory; otherwise, passwordless SSH login fails
chmod 755 authorized_keys


Then check whether passwordless SSH login to localhost works. When you run ssh localhost for the first time, you need to enter yes
ssh localhost (if the ssh command is not available, install it first: yum -y install openssh-clients)


Exit the SSH connection
exit


5. Configure passwordless SSH login from the master node to the child nodes
Log in to each child node as the hadoop user and, in the child node's .ssh directory, run scp to copy the master's public key file to the child node
scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub


Append the primary node’s public key to the child node’s authorized_keys file
cat master_dsa.pub >> authorized_keys


On the master node, run ssh to test whether passwordless login works. As when connecting to localhost, you need to enter yes the first time.
If all went well, the master node can now connect to its child nodes
ssh node2


Exit the SSH connection
exit

Repeat the same steps to configure passwordless login to the other child node


6. Install JDK (as root)
Download the JDK package (for example, jdk-7u79-linux-x64.tar.gz) and upload it to the /root directory on the server


Create the Java installation directory
mkdir -p /opt/app/


Go to the root home directory and decompress the JDK package to the installation directory
cd /root


tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/app/


To configure the Java environment variables, open /etc/profile and add the lines shown below:
vi /etc/profile
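A minimal example, assuming the JDK was extracted to /opt/app/jdk1.7.0_79 (the default directory name for this package); adjust the path to match your installation:
export JAVA_HOME=/opt/app/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar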

Run the source command to make the environment variables take effect
source /etc/profile


Run java -version to check; if the JDK version is printed, the environment variables have taken effect and the installation is complete



7. Modify the Hadoop configuration file
Download the hadoop package (version 2.5.2) and upload it to master using SFTP:



Decompress the Hadoop package to the Hadoop user’s home directory
tar -zxvf hadoop-2.5.2.tar.gz



Go to the Hadoop configuration directory
cd hadoop-2.5.2/etc/hadoop


Configure the core-site.xml file
vi core-site.xml

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop/tmp</value>
<description> Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>


Configure the hdfs-site.xml file
vi hdfs-site.xml

<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Configure the mapred-site.xml file. In version 2.5.2, only the mapred-site.xml.template file exists, so copy it first:
cp mapred-site.xml.template mapred-site.xml


vi mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>


Configure the yarn-site.xml file
vi yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>


Configure the slaves file with the hostnames of the child nodes, as shown below
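A minimal example, assuming the two child nodes defined earlier (the file lists one hostname per line):
vi slaves
node1
node2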



Configure the hadoop-env.sh file


Set JAVA_HOME to the JDK installation directory
vi hadoop-env.sh
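Assuming the JDK was installed to /opt/app/jdk1.7.0_79 as in step 6, change the JAVA_HOME line to:
export JAVA_HOME=/opt/app/jdk1.7.0_79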



8. Copy the Hadoop package to each child node and start Hadoop
In the hadoop user home directory on the primary node, run the following command to copy the Hadoop package to each child node:
scp -r ./hadoop-2.5.2 node1:~
scp -r ./hadoop-2.5.2 node2:~


Format the file system
cd hadoop-2.5.2


bin/hdfs namenode -format


Disable the firewall on the master node and the child nodes. Note: the firewall will be enabled again after a host restart (alternatively, you can open only the ports required by Hadoop, but that is more troublesome).
service iptables stop
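Optionally (not part of the original steps), the firewall can also be removed from the boot sequence on CentOS/RHEL 6 so it stays disabled across restarts:
chkconfig iptables off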


Start Hadoop
cd sbin


./start-all.sh
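Note: in Hadoop 2.x, start-all.sh is marked as deprecated; starting HDFS and YARN separately achieves the same result:
./start-dfs.sh
./start-yarn.sh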



9. Check the Hadoop process
Run the jps command on each host; if the expected processes are running, the cluster has started normally
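With the configuration above, the output should look roughly like this (process IDs will differ); the SecondaryNameNode runs on master because dfs.namenode.secondary.http-address points at master:
On master: NameNode, SecondaryNameNode, ResourceManager, Jps
On node1 and node2: DataNode, NodeManager, Jps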









Access the HDFS status page and YARN resource management page
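Assuming the default NameNode HTTP port (50070) for Hadoop 2.5 and the yarn.resourcemanager.webapp.address configured above, the pages should be reachable at:
http://master:50070 (HDFS status)
http://master:8088 (YARN resource manager)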




Upload a file to HDFS and check whether the upload succeeds, as shown below
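A quick test from the Hadoop installation directory on master (using /etc/profile as the test file; any small file will do):
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -put /etc/profile /test
bin/hdfs dfs -ls /test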



10. Configure hadoop environment variables
Add the Hadoop bin directory to the PATH variable in /etc/profile so that hadoop commands can be executed from anywhere, without changing into the Hadoop installation directory and running every command in the bin/... form. A sample configuration is shown below.
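A minimal sketch, assuming Hadoop was unpacked to /home/hadoop/hadoop-2.5.2; sbin is added as well (an optional extra) so the start/stop scripts are also on the PATH. Add to /etc/profile, then run source /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-2.5.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH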



11. Start the jobhistory
Hadoop JobHistory records the MapReduce jobs that have been run and stores the records in a specified HDFS directory. The JobHistory service is not started by default; you need to start it manually after completing the configuration.
Add the following configuration to mapred-site.xml (the examples below use the hostname hadoop000; replace it with your own master hostname):


<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop000:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>


<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop000:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>


<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/history/done</value>
</property>


<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/history/done_intermediate</value>
</property>


Add the following configuration to yarn-site.xml (YARN needs to be restarted):
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>


Run the command that starts the JobHistory server
mr-jobhistory-daemon.sh start historyserver (stop command: mr-jobhistory-daemon.sh stop historyserver)


After the JobHistory server is started, you can access its web UI at hadoop000:19888
Two directories are generated in HDFS
hadoop fs -ls /history


drwxrwx---   - spark supergroup          0 2014-10-11 15:11 /history/done
drwxrwxrwt   - spark supergroup          0 2014-10-11 15:16 /history/done_intermediate


Tracking UI cannot be accessed

The Tracking UI cannot be accessed because the machine that opens port 8088 cannot resolve the cluster's hostnames. Add the cluster's hostname-to-IP mappings to that machine's hosts file to solve the problem:
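The entries are the same mappings as in step 2; the addresses below are illustrative only and must be replaced with the cluster's real IP addresses (on Windows the file is C:\Windows\System32\drivers\etc\hosts, on Linux it is /etc/hosts):
192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2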