Note 1 — Server-based Hadoop Cluster Setup, 2021

Internship project: learning to build an integrated Hadoop environment.

[Hadoop build list]

  • [Hadoop – 3.2.1]
  • [JDK – OpenJDK 1.8.0]

Configuration Flowchart

Hadoop Setup – Initial preparation

1. Provide the environment

10.0.20.181, 10.0.20.182, 10.0.20.183 // User: root, password: hadoop

2. Check the environment

2.1) Whether the servers can be logged in to

All three servers work properly. Log in with ssh user@ip and enter the password.

ssh root@10.0.20.181, password: hadoop (verify all three servers: 181, 182, 183)

2.2) Whether you have permission to perform write operations
  • Have permission
2.3) Whether the hard disk has enough free space for the operation

df -lh
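
As a convenience, here is a minimal sketch (my own addition, not part of the original steps) that runs the login, write-permission, and disk-space checks on all three servers in one loop; the /tmp/write_test file name is just an arbitrary choice.

#!/bin/bash
# Check login, write permission, and free disk space on every node.
# Passwordless SSH is not set up yet, so you will be prompted for the
# root password (hadoop) once per host.
for h in 10.0.20.181 10.0.20.182 10.0.20.183; do
  echo "=== $h ==="
  ssh root@"$h" 'hostname; touch /tmp/write_test && rm -f /tmp/write_test && echo "write OK"; df -lh /'
done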

3. Establish mappings between hosts and IP addresses

Map each IP address to a host name: enter vim /etc/hosts and add one "IP host-name" line per server.

Use cat /etc/hosts to view the result.

Attention! After this change, the host name itself has not changed yet.

vi /etc/sysconfig/network

(vim reminder: to save and exit, type :wq!)

This step is especially important; if a bug appears later, it may be because this step was missed.

After a restart, the host name change takes effect and the server rename is complete! Do the same configuration on the remaining servers, changing to the corresponding names.

After the configuration is complete, ping the servers from each other: ping <hostname> -c <count>
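
For concreteness, here is a sketch of what the mapping might look like on this cluster. The 10.0.20.181 = master pairing follows from the web address given at the end of this note; mapping 182 to slave1 and 183 to slave2 is my assumption, as is the exact content of the network file.

# /etc/hosts (same three lines on all servers)
10.0.20.181 master
10.0.20.182 slave1
10.0.20.183 slave2

# /etc/sysconfig/network (on master; use slave1 / slave2 on the others)
# On CentOS 7, hostnamectl set-hostname master achieves the same thing.
NETWORKING=yes
HOSTNAME=master

# after rebooting, verify connectivity by host name
ping slave1 -c 3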

4. Install the JDK — Java 1.8.0

You are advised to use yum to install the JDK

yum -y install java-1.8.0-openjdk*

The default installation path is /usr/lib/jvm.

Configure the environment variables by modifying the configuration file: vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Attention! Adjust JAVA_HOME to match the Java version and directory name actually installed on your machine.
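
A small sketch of how to find the exact directory name to put into JAVA_HOME (my addition; both commands are standard):

# list the installed JDK directories
ls /usr/lib/jvm
# or resolve the real path behind the java binary
readlink -f "$(which java)"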

Use the source command for the change to take effect immediately: source /etc/profile

Verify the Java version: java -version

Java installation successful!

5. Passwordless login

1. First, turn off the firewall and SELinux on all three servers; copy and paste the commands step by step.

View the firewall status: service iptables status

Disable the firewall: service iptables stop, then chkconfig iptables off

After shutting down SELINUX, you need to restart the server

Edit the config file: vim /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled (leave SELINUXTYPE=targeted as it is).
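
If you prefer a one-liner to editing the file by hand, a sed command like the following should do the same thing (my addition; standard sed usage):

# switch SELinux to disabled in place, then reboot for it to take effect
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
reboot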

2. Password-free login

2.1) Configure the master to log in without a password. Copy and paste step by step

Generate the key: ssh-keygen -t rsa

Append the public key to the authorized_keys file

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Grant permission: chmod 600 ~/.ssh/authorized_keys

Verify: ssh master should now connect without asking for a password.

2.2) Passwordless access between the machines

Log in to slave1 and copy the master server’s public key id_rsa.pub to /root/ on the slave1 server.

scp root@master:/root/.ssh/id_rsa.pub /root/

Append master’s public key (id_rsa.pub) to Slave1’s authorized_keys

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys, then rm -rf ~/id_rsa.pub to delete the copied file.

ssh slave1 should now log in to slave1 without a password. In the same way, set things up until all three machines can log in to each other without a password.

Note that scp is a very useful command for copying files between machines. Besides the approach I wrote here, the three machines can also each push their own key directly to the others; search online if you are interested. This command gets used a lot, especially when multiple machines work together.
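
As a sketch of that "each machine pushes its key to the others" approach (my addition; ssh-copy-id is a standard OpenSSH helper, and the host names assume the master/slave1/slave2 mapping above):

# run on every node after ssh-keygen -t rsa;
# you will be asked for the root password once per target host
for h in master slave1 slave2; do
  ssh-copy-id root@"$h"
done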

At this point, our basic preparation work is over, and the next step is to build the Hadoop environment.

Hadoop Setup – Version 3.2.1

1. Decompress the installation package on master and create a base directory

Method 1. Download directly from the website. In China, however, even over a VPN, downloading onto the server is very, very slow; it took me about 6 hours on the company network.

wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

Method 2. Use a domestic mirror. I have not looked into this, but a quick search will turn up plenty of options; do your own research.

Method 3. This is the method I used: with Xshell and Xftp, upload the Hadoop installation package downloaded on Windows to the server. Very fast, simple, and easy to use! Just search for how to use Xshell/Xftp, connect with the server IP, then select the file, and you are done.

Extract it to the /usr/local directory: tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local

Rename it (mv <old name> <new name>): mv hadoop-3.2.1 hadoop

2. Configure hadoop environment variables for hadoop-master

2.1) Configure environment variables by modifying the configuration file: vi /etc/profile

#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME 
export HADOOP_INSTALL=$HADOOP_HOME 
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

Important! Run source /etc/profile so that the hadoop commands take effect immediately in the current terminal.

Check whether the installation succeeded with hadoop version.

2.2) Configure the environment variables on the other hosts, slave1 and slave2

Copy environment variables to Slave1, slave2

scp -r /etc/profile root@slave1:/etc/

scp -r /etc/profile root@slave2:/etc/

On slave1 and slave2, run source /etc/profile so the environment variables take effect.

Hadoop configuration file

  • hadoop-env.sh (configures the home path of our JDK)
  • core-site.xml (the core configuration file; it basically decides whether our cluster runs distributed or locally)
  • hdfs-site.xml (the core configuration of the distributed file system; it decides where our data is stored, the number of data replicas, the block size, and so on)
  • mapred-site.xml (defines some of the parameters for running MapReduce)
  • yarn-site.xml (the core configuration file for our YARN cluster, the resource management framework)

Good! Start configuration!

Hadoop configuration files – master

All of these files are configured in the /usr/local/hadoop/etc/hadoop directory. The configuration below is for reference only.

cd into the directory: cd /usr/local/hadoop/etc/hadoop

vim hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk-1.8.0.151-1.b12.el7_4.x86_64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

vim core-site.xml

The fs.defaultFS address (written with the older property name fs.default.name below) is the path that Java client code uses to access HDFS; in Java code, configure it as IP:9000 (or host name), not localhost.

<configuration>
 <!-- specify the HDFS access address -->
 <property>
     <name>fs.default.name</name>
     <value>hdfs://master:9000</value>
 </property>
 <!-- specify the temporary file directory -->
 <property>
     <name>hadoop.tmp.dir</name>
     <value>/usr/local/hadoop/tmp</value>
 </property>
</configuration>
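
Once the cluster is up (after the start-up steps at the end of this note), a quick way to confirm this address works is the standard hdfs CLI; this check is my addition, not part of the original walkthrough:

# list the HDFS root through the address configured above
hdfs dfs -ls hdfs://master:9000/
# or, using the IP form that Java client code would use
hdfs dfs -ls hdfs://10.0.20.181:9000/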

vim hdfs-site.xml

dfs.namenode.secondary.http-address – the address used to access the file system from a browser.

The primary Namenode has an HDFS access address: http://10.0.20.181:50070

<configuration>
 <!-- number of data replicas: 2 -->
 <property>
     <name>dfs.replication</name>
     <value>2</value>
 </property>
 <!-- namenode storage path -->
 <property>
     <name>dfs.namenode.name.dir</name>
     <value>/usr/local/hadoop/hdfs/name</value>
 </property>
 <property>
     <name>dfs.datanode.data.dir</name>
     <value>/usr/local/hadoop/hdfs/data</value>
 </property>
 <!-- specify the port for web access to HDFS -->
 <property>
     <name>dfs.http.address</name>
     <value>master:50070</value>
 </property>
 <!-- disable permission checks on users and user groups -->
 <property>
     <name>dfs.permissions.enabled</name>
     <value>false</value>
 </property>
</configuration>
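
The tmp, name, and data directories referenced above can be created ahead of time; this is a convenience step of my own (Hadoop also creates them during formatting and startup), assuming the paths stay as configured:

# pre-create the storage directories used by core-site.xml and hdfs-site.xml
mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data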

vim mapred-site.xml

mapred.job.tracker determines how MapReduce programs are executed.

<configuration>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
   <property>
      <name>mapred.job.tracker</name>
      <value>http://master:9001</value>
  </property>
</configuration>

vim yarn-site.xml

Configure resources. You can configure many resources on YARN.

<configuration>
 <!-- ResourceManager host name -->
 <property>
     <name>yarn.resourcemanager.hostname</name>
     <value>master</value>
 </property>
 <!-- comma-separated list of services; a service name may contain only a-zA-Z0-9_ and cannot start with a number -->
 <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
 </property>
</configuration>

2. Configure the masters, slaves, and workers files (master only!)

cd into the hadoop config directory: cd /usr/local/hadoop/etc/hadoop. The masters file specifies the namenode server. Delete localhost and add the namenode host name, master. Using IP addresses is not recommended because an IP address may change, while a host name generally does not.

vim masters, add:

master

vim slaves, add:

slave1
slave2

vim workers, add:

master
slave1
slave2

3. Hadoop configuration — slaves

Copy Hadoop from Master to Slave1 node

scp -r /usr/local/hadoop slave1:/usr/local/

Log in to the slave1 server and delete the slaves file:

rm -rf /usr/local/hadoop/etc/hadoop/slaves
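
The original only shows slave1; presumably slave2 gets the same treatment. A sketch, assuming identical paths on slave2:

# copy the Hadoop directory to slave2 as well, then remove its slaves file
scp -r /usr/local/hadoop slave2:/usr/local/
ssh slave2 'rm -rf /usr/local/hadoop/etc/hadoop/slaves'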

Start Hadoop

1. Format the HDFS file system: hadoop namenode -format. This formats the namenode; it is performed once, before starting the service for the first time, and does not need to be performed again later.

Attention! Use formatting carefully! After the first time, do not run it again unless something unexpected happens!

2. Start Hadoop: start-all.sh

3. Run the jps command to check the running processes

Run jps on slave1 and slave2 as well to check what is running there.
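
For reference, with the configuration above (master is also listed in the workers file), the daemons you would roughly expect jps to report are sketched below. This is my inference from the configuration, not output captured from the original cluster; process IDs and ordering will of course differ.

# on master (NameNode, SecondaryNameNode and ResourceManager, plus DataNode and
# NodeManager because master is also listed in the workers file)
jps
# NameNode
# SecondaryNameNode
# ResourceManager
# DataNode
# NodeManager

# on slave1 / slave2
jps
# DataNode
# NodeManager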

Web and process view

Web view: http://master:50070 — the HDFS (NameNode) web UI, where the datanodes are listed.

http://10.0.20.181:8088 — the YARN ResourceManager web UI.

Recommended sites

Setting up a fully distributed Hadoop environment on three cloud servers: blog.csdn.net/weixin_4393…

Hadoop distributed cluster setup: www.ityouknow.com/hadoop/2017…

Troubleshooting: problems with formatting or JAVA_HOME

Solution link: bbs.huaweicloud.com/blogs/24226…

Re-tested a month later

There are basically no problems with this version of the configuration. The issue I ran into while testing was with formatting: delete Hadoop's tmp directory (rm -rf the folder), recreate the logs directory (mkdir logs), then restart with start-all.sh. The problem was resolved, and port 50070 and the namenode came back up.
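
A sketch of that recovery procedure as commands, under the assumption that the paths from earlier in this note are used, and bearing in mind that re-formatting wipes the data stored in HDFS:

# stop the cluster first
stop-all.sh
# remove the old state referenced by core-site.xml and hdfs-site.xml
# (clearing the data dir too avoids a clusterID mismatch after re-formatting)
rm -rf /usr/local/hadoop/tmp /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data
# recreate the logs directory
rm -rf /usr/local/hadoop/logs && mkdir /usr/local/hadoop/logs
# re-format the namenode and start again
hadoop namenode -format
start-all.sh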