1 Installation Environment

Kernel: 3.10.0-229.el7.x86_64

Operating system: CentOS 7

The SSH client and the sshd service are configured by default

There are three machines on the LAN for the installation: 192.168.1.170, 192.168.1.171, and 192.168.1.172. 192.168.1.170 is the namenode, and 192.168.1.171 and 192.168.1.172 are datanode1 and datanode2.

The IP addresses have been configured and the three machines can communicate with each other.

2 Configure Java

This step installs Java and adds it to the system environment.

First, extract the official JDK tarball and configure the environment variables.

Table 2-1: Configuring Java (root user)

tar -xzf jdk-8u45-linux-x64.gz
mv jdk1.8.0_45 /usr/local/java
cat <<HERE >>/etc/profile
export JAVA_HOME=/usr/local/java
export PATH=\$PATH:\$JAVA_HOME/bin
HERE

You need to perform this configuration on each of the three machines.

If you want the JAVA_HOME environment variable to take effect immediately in the current shell environment, run the following command.

Table 2-2: Application environment variables (user root)

source /etc/profile
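
As a quick check, you can confirm that the java command on the PATH is the one under /usr/local/java (the exact version string depends on your JDK build):

which java
java -version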

3 Add a Hadoop user

Add a hadoop user on each of the three machines. Once the configuration is complete, Hadoop will be installed and run as this user.

Table 3-1: Adding a Hadoop user (root user)

useradd hadoop
passwd hadoop

4 Configure the host name

4.1 Configuring the /etc/hosts file

Add the following to the end of the /etc/hosts file on the three machines:

Table 4-1: Modifying the /etc/hosts file as user root

192.168.1.170   hmaster

192.168.1.171   hslave1

192.168.1.172   hslave2
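As a quick check, you can verify from any of the machines that the new names resolve and the hosts are reachable, for example:

ping -c 1 hslave1
ping -c 1 hslave2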

4.2 Configuring the /etc/hostname file

Change the /etc/hostname file on namenode, datanode1, and datanode2, in that order, to:

Table 4-2: Modifying the /etc/hostname file as user root

hmaster
hslave1
hslave2

The change to /etc/hostname persists across reboots but does not affect the running system until the node restarts. The following command, by contrast, takes effect immediately but only for the current session.

Run the matching command on each node (one command per node, in the order namenode, datanode1, datanode2).

Table 4-3: Setting the host name (root user)

hostname hmaster
hostname hslave1
hostname hslave2
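
On CentOS 7, hostnamectl is an alternative that performs both steps at once, writing /etc/hostname and changing the running host name together. For example, on the namenode:

hostnamectl set-hostname hmaster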

5 Configure SSH no-password login

First, modify the sshd configuration file on all three machines to allow key-based authentication.

Table 5-1: Enabling SSH key authentication (root user)

cp /etc/ssh/sshd_config /etc/ssh/sshd_config.backup

sed -i '1,/#RSA/s/#RSA/RSA/' /etc/ssh/sshd_config

sed -i '1,/#Pub/s/#Pub/Pub/' /etc/ssh/sshd_config

service sshd restart

Generate a key pair on each of the three machines. The namenode's public key is distributed to namenode, datanode1, and datanode2; datanode1 and datanode2 distribute their public keys to the namenode and to each other.

Table 5-2: namenode public key distribution (hadoop user)

mkdir $HOME/namenode
echo "" | ssh-keygen -t rsa -P ''
ssh-copy-id hadoop@hmaster
ssh-copy-id hadoop@hslave1
ssh-copy-id hadoop@hslave2

Table 5-3: datanode1 public key distribution (hadoop user)

mkdir $HOME/datanode
echo "" | ssh-keygen -t rsa -P ''
ssh-copy-id hadoop@hmaster
ssh-copy-id hadoop@hslave2

Table 5-4: datanode2 public key distribution (hadoop user)

mkdir $HOME/datanode
echo "" | ssh-keygen -t rsa -P ''
ssh-copy-id hadoop@hmaster
ssh-copy-id hadoop@hslave1

After the configuration is complete, the hadoop user on the namenode can log in to datanode1 and datanode2 over SSH without a password, and datanode1 and datanode2 can likewise log in to the namenode without a password.
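
You can verify this before moving on. As the hadoop user on the namenode, each of the following should print the remote host name without prompting for a password:

ssh hadoop@hslave1 hostname
ssh hadoop@hslave2 hostname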

6 Configure Hadoop

6.1 Configuring Environment Variables

Configure Hadoop on the three machines. First, extract the official Hadoop tarball and configure environment variables for the hadoop user.

Table 6-1: Configuring Hadoop as user root

tar -xzf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 /usr/local/hadoop
chown -R hadoop:hadoop /usr/local/hadoop

Table 6-2: Configuring Hadoop environment variables (hadoop user)

cat <<HERE >>/home/hadoop/.bashrc

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PREFIX=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export HADOOP_YARN_HOME=\$HADOOP_HOME
export PATH=\$PATH:\$HADOOP_HOME/sbin:\$HADOOP_HOME/bin

HERE

If you want the environment variables related to Hadoop to take effect immediately in the current shell environment, run the following command.

Table 6-3: Application environment variables (hadoop user)

source ~/.bashrc
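
To confirm that the hadoop command is now on the PATH, you can run:

hadoop version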

6.2 Modify hadoop-env.sh

Modify Hadoop's default configuration files, which are stored in the $HADOOP_HOME/etc/hadoop directory.

hadoop-env.sh (three machines)

Back up the file first:

Table 6-4: Backing up hadoop-env.sh (hadoop user)

cp hadoop-env.sh hadoop-env.sh.backup

Modify the file as follows:

Table 6-5: Modified hadoop-env.sh

export JAVA_HOME=/usr/local/java
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} 
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
export HADOOP_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
export HADOOP_IDENT_STRING=$USER

6.3 Modify hdfs-site.xml

hdfs-site.xml (three machines)

Back up the file first:

Table 6-6: Backing up hdfs-site.xml (hadoop user)

cp hdfs-site.xml hdfs-site.xml.backup

Modify the file as follows:

Table 6-7: Modified hdfs-site.xml

<configuration>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/namenode</value>
</property>
<property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>never</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hmaster:50090</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <value>hmaster:50070</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/datanode</value>
    <final>true</final>
</property>
</configuration>

6.4 Modify mapred-site.xml

mapred-site.xml (three machines)

First generate the file:

Table 6-8: Generating mapred-site.xml from the template (hadoop user)

cp mapred-site.xml.template mapred-site.xml

Modify the file as follows:

Table 6-9: Modified mapred-site.xml

<configuration>
<property>  
    <name>mapreduce.framework.name</name>   
    <value>yarn</value>
</property>
</configuration>

6.5 Modify slaves

slaves (three machines)

Back up the file first:

Table 6-10: Backing up slaves (hadoop user)

cp slaves slaves.backup

Modify the file as follows:

Table 6-11: Modified slaves

hslave1
hslave2

6.6 Modify yarn-site.xml

yarn-site.xml (three machines)

Back up the file first:

Table 6-12: Backing up yarn-site.xml (hadoop user)

cp yarn-site.xml yarn-site.xml.backup

Modify the file as follows:

Table 6-13: Modified yarn-site.xml

<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hmaster:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>hmaster:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hmaster:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hmaster:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hmaster:8088</value>
</property>
<property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
</property>
</configuration>

6.7 Modify core-site.xml

core-site.xml (three machines)

Back up the file first:

Table 6-14: Backing up core-site.xml (hadoop user)

cp core-site.xml core-site.xml.backup

Modify the file as follows:

Table 6-15: Modified core-site.xml

<configuration>
<property>    
    <name>fs.defaultFS</name>    
    <value>hdfs://hmaster:9000/</value>
</property>
<property>       
    <name>hadoop.tmp.dir</name>       
    <value>/home/hadoop/data/hadoop/tmp</value>
</property>
</configuration>

6.8 Disabling the Firewall and IPv6

Disable the firewall:

Table 6-16: Disabling the firewall (root user)

systemctl stop firewalld
# Configure the firewall to stay disabled after startup:
systemctl disable firewalld

Disable IPv6:

Table 6-17: Backing up the /etc/sysctl.conf file as user root

cp /etc/sysctl.conf /etc/sysctl.conf.backup

Table 6-18: Add the following content to /etc/sysctl.conf as user root

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

To make the changes in the /etc/sysctl.conf file take effect immediately, run the following command.

Table 6-19: Application modification (root user)

sysctl -p
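
To verify that the change took effect, the following should print 1 for both keys:

sysctl net.ipv6.conf.all.disable_ipv6
sysctl net.ipv6.conf.default.disable_ipv6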

Disable SELinux by modifying the /etc/selinux/config file to the following content:

Table 6-20: Modifying the /etc/selinux/config file as user root

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

After this configuration file is modified, SELinux is disabled the next time the machine restarts.

To stop SELinux enforcement immediately (permissive mode), run the following command.

Table 6-21: Disabling SELinux as user root

setenforce 0
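
You can confirm the current mode with getenforce; it should report Permissive after setenforce 0 (it reports Disabled only after a reboot with SELINUX=disabled):

getenforce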

6.9 Formatting HDFS

Run the following command on the namenode to format HDFS:

Table 6-22: Formatting HDFS (hadoop user)

hdfs namenode -format
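
If formatting succeeds, the metadata directory configured in hdfs-site.xml (dfs.namenode.name.dir, /home/hadoop/namenode in this guide) should now contain a current subdirectory with a VERSION file and an initial fsimage:

ls /home/hadoop/namenode/current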

7 Test

Run the following commands on the namenode to start Hadoop:

Table 7-1: Starting Hadoop (hadoop user)

start-dfs.sh
start-yarn.sh
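
You can check which daemons are running on each machine with the JDK's jps tool. With the layout used in this guide, hmaster should show NameNode, SecondaryNameNode, and ResourceManager, while hslave1 and hslave2 should show DataNode and NodeManager:

jps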

Visit http://hmaster:50070 to check whether the installation succeeded. hmaster can be replaced with the actual namenode IP address.

If you use Windows, you can map the host names to their IP addresses. Modify the C:\Windows\System32\drivers\etc\hosts file, add the following content, and save. You can then enter a host name instead of an IP address in the browser's address bar.

192.168.1.170   hmaster

192.168.1.171   hslave1

192.168.1.172   hslave2
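
As a further smoke test, you can run one of the MapReduce examples bundled with Hadoop as the hadoop user on the namenode; the jar path below assumes the Hadoop 2.6.0 layout used in this guide. The job should print an estimated value of pi when it completes.

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 10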