The cluster is set up on three hosts: the primary node is master, and the child nodes are node1 and node2.
1. Change the host name
As the root user, edit the following file on each host; the hostname must be different on each host:
vi /etc/sysconfig/network
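For example, on the master host the file would contain the following (this assumes a CentOS/RHEL 6-style /etc/sysconfig/network, consistent with the other commands in this guide; node1 and node2 get their own hostnames):
NETWORKING=yes
HOSTNAME=master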



reboot
After the restart, run the hostname command; if it prints the new hostname, the change was successful

2. Configure the mapping between the host name and IP address
Configure the following for each host:
vi /etc/hosts
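Add one line per host mapping the hostname to its IP address. The addresses below are illustrative only; use the real addresses of your three hosts:
192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2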



3. Create the hadoop user group and account
Run the following command for each host as user root:


Create a Hadoop user group
groupadd hadoop


Create the hadoop user
useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop


Set a password for the hadoop user; enter the same password twice. A "BAD PASSWORD" warning, if any, can be ignored
passwd hadoop

4. Configure passwordless SSH login
In the hadoop user's home directory, run the following command to generate the SSH key pair. A .ssh directory is created under the home directory:
ssh-keygen -t dsa (press Enter at every prompt), or run ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa to generate the DSA key pair non-interactively



Go to the .ssh directory and use cat to generate the authorized_keys file
cd .ssh


cat id_dsa.pub >> authorized_keys


Change the permissions of the authorized_keys file. This step is mandatory; otherwise, passwordless SSH login fails
chmod 755 authorized_keys


Then check whether passwordless SSH login to localhost works. When you run ssh localhost for the first time, you need to enter yes
ssh localhost (if the ssh command is not available, install it first: yum -y install openssh-clients)


Exit the SSH connection
exit


5. Configure passwordless SSH login from the master node to the child nodes
Log in to each child node as the hadoop user and, in the child node's .ssh directory, run scp to copy the master's public key file to the child node
scp hadoop@master:~/.ssh/id_dsa.pub ./master_dsa.pub


Append the primary node’s public key to the child node’s authorized_keys file
cat master_dsa.pub >> authorized_keys


On the master node, run ssh to test whether passwordless login works. As when connecting to localhost, you need to enter yes the first time.
If all went well, the master node can now connect to its child nodes
ssh node2


Exit the SSH connection
exit

Repeat the same steps to configure passwordless login to the other child node


6. Install JDK (as root)
Download the JDK package (for example, jdk-7u79-linux-x64.tar.gz) and upload it to the /root directory on the server


Create the Java installation directory
mkdir -p /opt/app/


Go to the root home directory and decompress the JDK package to the installation directory
cd /root


tar -zxvf jdk-7u79-linux-x64.tar.gz -C /opt/app/


To configure the Java environment variables, open /etc/profile and add the lines shown below:
vi /etc/profile
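A minimal example, assuming the JDK was extracted to /opt/app/jdk1.7.0_79 (the default directory name for this package); adjust the path to match your installation:
export JAVA_HOME=/opt/app/jdk1.7.0_79
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar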

Run the source command to make the environment variables take effect
source /etc/profile


Run java -version to check; if the JDK version is printed, the environment variables have taken effect and the installation is complete



7. Modify the Hadoop configuration file
Download the hadoop package (version 2.5.2) and upload it to master using SFTP:



Decompress the Hadoop package to the Hadoop user’s home directory
tar -zxvf hadoop-2.5.2.tar.gz



Go to the Hadoop configuration directory
cd hadoop-2.5.2/etc/hadoop


Configure the core-site.xml file
vi core-site.xml

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop/tmp</value>
<description> Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>


Configure the hdfs-site.xml file
vi hdfs-site.xml

<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Configure the mapred-site.xml file. In version 2.5.2, only the mapred-site.xml.template file exists, so copy it first:
cp mapred-site.xml.template mapred-site.xml


vi mapred-site.xml

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>master:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>


Configure the yarn-site.xml file
vi yarn-site.xml

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>


Configure the slaves file with the hostnames of the child nodes, as shown below
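A minimal example, assuming the two child nodes defined earlier (the file lists one hostname per line):
vi slaves
node1
node2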



Configure the hadoop-env.sh file


Set JAVA_HOME to the JDK installation directory
vi hadoop-env.sh
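Assuming the JDK was installed to /opt/app/jdk1.7.0_79 as in step 6, change the JAVA_HOME line to:
export JAVA_HOME=/opt/app/jdk1.7.0_79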



8. Copy the Hadoop package to each child node and start Hadoop
In the hadoop user home directory on the primary node, run the following command to copy the Hadoop package to each child node:
scp -r ./hadoop-2.5.2 node1:~
scp -r ./hadoop-2.5.2 node2:~


Format the file system
cd hadoop-2.5.2


bin/hdfs namenode -format


Disable the firewall on the master node and the child nodes. Note: the firewall will be enabled again after a host restart (alternatively, you can open only the ports required by Hadoop, but that is more troublesome).
service iptables stop
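Optionally (not part of the original steps), the firewall can also be removed from the boot sequence on CentOS/RHEL 6 so it stays disabled across restarts:
chkconfig iptables off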


Start Hadoop
cd sbin


./start-all.sh
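Note: in Hadoop 2.x, start-all.sh is marked as deprecated; starting HDFS and YARN separately achieves the same result:
./start-dfs.sh
./start-yarn.sh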



9. Check the Hadoop process
Run the jps command on each host; if the expected processes are running, the cluster has started normally
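With the configuration above, the output should look roughly like this (process IDs will differ); the SecondaryNameNode runs on master because dfs.namenode.secondary.http-address points at master:
On master: NameNode, SecondaryNameNode, ResourceManager, Jps
On node1 and node2: DataNode, NodeManager, Jps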









Access the HDFS status page and YARN resource management page
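Assuming the default NameNode HTTP port (50070) for Hadoop 2.5 and the yarn.resourcemanager.webapp.address configured above, the pages should be reachable at:
http://master:50070 (HDFS status)
http://master:8088 (YARN resource manager)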




Upload a file to HDFS and check whether the upload succeeds, as shown below
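A quick test from the Hadoop installation directory on master (using /etc/profile as the test file; any small file will do):
bin/hdfs dfs -mkdir /test
bin/hdfs dfs -put /etc/profile /test
bin/hdfs dfs -ls /test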



10. Configure hadoop environment variables
Add the Hadoop bin directory to the PATH variable in /etc/profile so that hadoop commands can be executed from anywhere, without changing into the Hadoop installation directory and running every command in the bin/... form. A sample configuration is shown below.
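A minimal sketch, assuming Hadoop was unpacked to /home/hadoop/hadoop-2.5.2; sbin is added as well (an optional extra) so the start/stop scripts are also on the PATH. Add to /etc/profile, then run source /etc/profile:
export HADOOP_HOME=/home/hadoop/hadoop-2.5.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH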



11. Start the jobhistory
Hadoop JobHistory records the MapReduce jobs that have been run and stores the records in a specified HDFS directory. The JobHistory service is not started by default; you need to start it manually after completing the configuration.
Add the following configuration to mapred-site.xml (the examples below use the hostname hadoop000; replace it with your own master hostname):


<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop000:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>


<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop000:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>


<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/history/done</value>
</property>


<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/history/done_intermediate</value>
</property>


Add the following configuration to yarn-site.xml (YARN needs to be restarted):
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>


Run the command that starts the JobHistory server
mr-jobhistory-daemon.sh start historyserver (stop command: mr-jobhistory-daemon.sh stop historyserver)


After the JobHistory server is started, you can access its web UI at hadoop000:19888
Two directories are generated in HDFS
hadoop fs -ls /history


drwxrwx---   - spark supergroup          0 2014-10-11 15:11 /history/done
drwxrwxrwt   - spark supergroup          0 2014-10-11 15:16 /history/done_intermediate


Tracking UI cannot be accessed

The Tracking UI cannot be accessed because the machine that opens port 8088 cannot resolve the cluster's hostnames. Add the cluster's hostname-to-IP mappings to that machine's hosts file to solve the problem:
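The entries are the same mappings as in step 2; the addresses below are illustrative only and must be replaced with the cluster's real IP addresses (on Windows the file is C:\Windows\System32\drivers\etc\hosts, on Linux it is /etc/hosts):
192.168.1.100 master
192.168.1.101 node1
192.168.1.102 node2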