1 Overview

The previous article covered building a cluster with virtual machines; this article is the real-world version, using three actual servers to build a Hadoop cluster. The specific steps are much the same as with virtual machines, but security groups, ports and a few other details make it a little different. Without further ado, let's get started.

2 Conventions

  • The Master node's IP is written as MasterIP and its hostname as master
  • The two Worker nodes' IPs are written as Worker1IP/Worker2IP and their hostnames as worker1/worker2
  • For convenience, the root user is used throughout this demo; of course you would not do this in production

3 (Optional) Local Hosts

Modify the local hosts file so you can operate on the servers by hostname:

sudo vim /etc/hosts
# add
MasterIP master
Worker1IP worker1
Worker2IP worker2

4 ssh

After generating a key pair locally, copy the public key to all three servers:

ssh-keygen -t ed25519 -a 100 # Use the faster and more secure ED25519 algorithm instead of the traditional RSA-3072/4096
ssh-copy-id root@master
ssh-copy-id root@worker1
ssh-copy-id root@worker2

At this point, you can directly use root@host to connect:

ssh root@master
ssh root@worker1
ssh root@worker2

No password should be required. If the connection fails or still prompts for a password, check /etc/ssh/sshd_config and the system logs.
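
A few directives commonly worth checking are sketched below; these are standard OpenSSH settings, not changes you necessarily need to make:

# Key-based root login must be allowed
grep -E 'PermitRootLogin|PubkeyAuthentication' /etc/ssh/sshd_config
# Expect PubkeyAuthentication yes and PermitRootLogin yes (or prohibit-password);
# restart the service after any change
systemctl restart sshd
# Verbose client output also shows which step fails
ssh -v root@master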

5 Hostname

Change the hostname of the Master node to master, and the hostnames of the two Worker nodes to worker1 and worker2:

# Master node
vim /etc/hostname
master

# Worker1 node
vim /etc/hostname
worker1

# Worker2 node
vim /etc/hostname
worker2
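
Editing /etc/hostname only takes effect at the next boot; on distributions with systemd, hostnamectl changes the running hostname immediately as well. A minimal sketch, run on each node with its own name:

hostnamectl set-hostname master    # on the Master
hostnamectl set-hostname worker1   # on Worker1
hostnamectl set-hostname worker2   # on Worker2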

Then modify /etc/hosts on each node:

# Master node
vim /etc/hosts
Worker1IP worker1
Worker2IP worker2

# Worker1 node
vim /etc/hosts
MasterIP master
Worker2IP worker2

# Worker2 node
vim /etc/hosts
MasterIP master
Worker1IP worker1

Ping each other after the modification is complete:

ping master
ping worker1
ping worker2

If the ping fails, the fault may be caused by the security group.

6 Configure the basic environment

6.1 JDK

Install the JDK under /usr/local/java on each server and add it to PATH:

export PATH=$PATH:/usr/local/java/bin
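
For reference, a minimal sketch of putting a JDK tarball in place and making the PATH change permanent; the archive name below is only a placeholder for whichever build matches your server's architecture:

# Assumes the JDK tarball has already been uploaded to the server
mkdir -p /usr/local/java
tar -zxvf jdk-11-linux-x64.tar.gz -C /usr/local/java --strip-components=1
# Persist the PATH change for future shells
echo 'export PATH=$PATH:/usr/local/java/bin' >> /etc/profile
source /etc/profile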

If the original server has another version of JDK installed, you can uninstall it first:

yum remove java

Note that you should test the java command after setting the environment variable, because different servers may use different CPU architectures.

For example, the author's Master node is aarch64 while the two Workers are x86_64, so the JDK binary that runs on the Workers fails to execute on the Master. In that case, install OpenJDK from the package manager instead:

# Install OpenJDK 11 via yum
yum install java-11-openjdk
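
Whichever way the JDK ends up installed, verify it on every node before moving on:

java -version
# For a yum-installed OpenJDK the JDK home is typically under /usr/lib/jvm;
# note it down, since hadoop-env.sh needs JAVA_HOME later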

6.2 Hadoop

Upload Hadoop 3.3.0 with scp, decompress it, and place it at /usr/local/hadoop.
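
A sketch of the upload and extraction, assuming the release archive is hadoop-3.3.0.tar.gz (repeat on the two Workers so that all nodes share the same directory layout):

# Run locally
scp hadoop-3.3.0.tar.gz root@master:/usr/local/
# Run on the server
cd /usr/local
tar -zxvf hadoop-3.3.0.tar.gz
mv hadoop-3.3.0 hadoop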

After decompressing, modify the following configuration files:

  • etc/hadoop/hadoop-env.sh
  • etc/hadoop/core-site.xml
  • etc/hadoop/hdfs-site.xml
  • etc/hadoop/workers

6.2.1 hadoop-env.sh

Modify the JAVA_HOME environment variable:

export JAVA_HOME=/usr/local/java # Change to your Java directory

6.2.2 core-site.xml

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://master:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/hadoop/data/tmp</value>
	</property>
</configuration>

The settings are the same as in the earlier virtual-machine setup.

6.2.3 hdfs-site.xml

<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/usr/local/hadoop/data/namenode</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/usr/local/hadoop/data/datanode</value>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
</configuration>

6.2.4 workers

worker1
worker2

6.2.5 Copying the Configuration Files

# If a non-default SSH port or private key is used,
# add -P port and -i key to the scp commands
scp /usr/local/hadoop/etc/hadoop/* worker1:/usr/local/hadoop/etc/hadoop/
scp /usr/local/hadoop/etc/hadoop/* worker2:/usr/local/hadoop/etc/hadoop/

7 Starting HDFS

7.1 Formatting HDFS

On the Master node, format HDFS first:

cd /usr/local/hadoop
bin/hdfs namenode -format

If the configuration file is correct, the formatting is successful.
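
One quick sanity check (not required): after a successful format, the directory configured as dfs.namenode.name.dir contains a freshly created current/ directory with a VERSION file and an initial fsimage:

ls /usr/local/hadoop/data/namenode/current
# Expect VERSION, seen_txid and fsimage_* files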

7.2 hadoop-env.sh

Still on the Master, modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh and add:

HDFS_DATANODE_USER=root
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

7.3 Start

Open ports 9000 and 9870 on the Master node (for cloud servers this usually just means the security group; if firewalld/iptables is also running, add the corresponding rules).
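
If firewalld happens to be the firewall in use, the rules could look like this (a sketch; adapt to whatever firewall tooling you run):

firewall-cmd --permanent --add-port=9000/tcp
firewall-cmd --permanent --add-port=9870/tcp
firewall-cmd --reload

Then start HDFS from the Hadoop directory on the Master: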

sbin/start-dfs.sh

Open the following address in a browser:

MasterIP:9870

You should see the NameNode web page.

If the number of Live Nodes is 0, check the Worker logs; in the author's case they pointed to a problem connecting to the Master's port 9000.

If the firewall is disabled and the security group is configured correctly, the problem is most likely /etc/hosts: when the hostname master is mapped to 127.0.0.1, the NameNode listens only on the loopback address and remote DataNodes cannot reach port 9000. On the Master, find this line:

# /etc/hosts
127.0.0.1 master

Delete it, and remove the corresponding lines on the two Workers in the same way:

# /etc/hosts
127.0.0.1 worker1
127.0.0.1 worker2
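
After correcting /etc/hosts, restart HDFS on the Master; one way to confirm that both DataNodes have registered is hdfs dfsadmin -report:

cd /usr/local/hadoop
sbin/stop-dfs.sh
sbin/start-dfs.sh
bin/hdfs dfsadmin -report
# The report should list two live datanodes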

8 YARN

8.1 Environment Variables

Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh and add:

export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

8.2 YARN Configuration

On both Worker nodes, modify /usr/local/hadoop/etc/hadoop/yarn-site.xml:

<configuration>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>master</value>
	</property>
</configuration>

8.3 Starting YARN

Start YARN on the Master node:

cd /usr/local/hadoop
sbin/start-yarn.sh

Also open ports 8088 and 8031 in the Master's security group.
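
After start-yarn.sh finishes, jps (shipped with the JDK) is a quick way to confirm which daemons are running on each node:

# On the Master you should typically see NameNode, SecondaryNameNode and ResourceManager
jps
# On each Worker you should see DataNode and NodeManager
jps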

8.4 Test

Open the following address in a browser:

MasterIP:8088

You should be able to open the YARN ResourceManager page.

The cluster is set up.
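
Optionally, as a last smoke test, write a small file into HDFS from the Master and list it back:

cd /usr/local/hadoop
bin/hdfs dfs -mkdir -p /test
bin/hdfs dfs -put etc/hadoop/core-site.xml /test
bin/hdfs dfs -ls /test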

9 References

  • HDFS five: Hadoop rejects remote 9000 port access
  • How To Set Up a Hadoop 3.2.1 Multi-Node Cluster on Ubuntu 18.04 (2 Nodes)
  • How to Install and Set Up a 3-Node Hadoop Cluster