If you search for big data on the Internet, what comes back is mostly theory, and it can leave you completely confused. So instead of walking you through the theory, this article takes you straight into building a distributed development environment.

Hadoop distributed architecture (one master, two slaves)

Hostname    IP address        namenode    datanode
master      192.168.6.133     yes         no
slave1      192.168.6.131     no          yes
slave2      192.168.6.132     no          yes

Step 1

Prepare the VM and the Java environment

A CentOS 7 VM with the JDK environment already set up is required. If you run into problems there, see the earlier article Hadoop tour 1 - CentOS 7: Set up the Java environment.

Step 2

Software to prepare

Get the Hadoop distribution package ready

  1. Download from the Apache official website
  2. Download from the Apache archive of past releases
  3. Hadoop 2.7.3 is the version used throughout this article
  4. I used FileZilla to copy the tarball onto the Linux system; you can also use wget to download it directly on the Linux machine, as in the sketch just below
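
For example, a wget download from the Apache archive might look like this; the URL reflects where I believe the 2.7.3 release lives in the archive, so double-check it against the archive listing before running it:

# download the Hadoop 2.7.3 release tarball into the current directory
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz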

Step 3

Unzip Hadoop and rename it

  1. Decompress the Hadoop package in the downloaded directory
[root@localhost mmcc]# tar -zxvf hadoop-2.7.3.tar.gz
# rename the directory (optional)
[root@localhost mmcc]# mv hadoop-2.7.3/ hadoop2.7.3
  2. View the Hadoop root path
[root@localhost mmcc]# cd hadoop2.7.3/
[root@localhost hadoop2.7.3]# pwd
/home/mmcc/hadoop2.7.3    # this path is used when configuring the environment variables
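
If you want to confirm the unpacked layout, listing the directory should show the usual Hadoop 2.7.x top-level folders; the listing below is from memory, so treat it as approximate:

[root@localhost hadoop2.7.3]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share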

Step 4

Configure the environment variables

  1. At the bottom of /etc/profile, above the PATH and CLASSPATH configuration added in Hadoop tour 1 - CentOS 7: Set up the Java environment, add the following environment variable settings
HADOOP_HOME=/home/mmcc/hadoop2.7.3
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH:.
  2. Reload the environment variables
[root@localhost jdk1.8]# source /etc/profile
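
To check that the variables took effect, you can ask the hadoop command for its version; the full output has a few more lines, but the first one should report 2.7.3:

[root@localhost jdk1.8]# hadoop version
Hadoop 2.7.3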
  3. Configure Hadoop's Java environment by editing the hadoop-env.sh script under etc/hadoop/ in the Hadoop root directory
vi /home/mmcc/hadoop2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/mmcc/jdk1.8    # point this at the Java installation directory
  4. Configure the Hadoop startup environment by editing the core-site.xml file under etc/hadoop/ in the Hadoop root directory
<!-- add this inside the <configuration> element of core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>

The hostname master used here is configured in a later step

Step 5

Set up the distributed environment

  1. For convenience, use VM cloning to create the other machines, so that every node starts with an identical copy of the environment built up to this point

  2. Use the following command to set a hostname on each node (master, slave1, or slave2 respectively)
[root@localhost mmcc]# hostnamectl set-hostname master/slave1/slave2
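
To spell out the shorthand above, the command is run once per machine, each time with that machine's own name, roughly like this:

# on the machine that will be the master
hostnamectl set-hostname master
# on the first slave
hostnamectl set-hostname slave1
# on the second slave
hostnamectl set-hostname slave2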
  3. Check the network
[root@localhost mmcc]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.6.133  netmask 255.255.255.0  broadcast 192.168.6.255
        inet6 fe80::3d1d:5127:6666:c62d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f4:ef:5d  txqueuelen 1000  (Ethernet)
        RX packets 317168  bytes 315273916 (300.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 149675  bytes 14400069 (13.7 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 12826  bytes 3163428 (3.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12826  bytes 3163428 (3.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

If no IP address is shown, configure the network

cd /etc/sysconfig/network-scripts/
vi ifcfg-ens33    # interface name on my VM; it may differ on other versions
ONBOOT="yes"      # means the interface is brought up when the network starts
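
For reference, a minimal static-IP sketch of ifcfg-ens33 for the master node might look like the following; the BOOTPROTO, IPADDR, NETMASK, and GATEWAY values are my assumptions based on the addresses used in this article, so adapt them to your own network:

TYPE="Ethernet"
NAME="ens33"
DEVICE="ens33"
BOOTPROTO="static"        # fixed address instead of DHCP
ONBOOT="yes"              # bring the interface up on boot
IPADDR="192.168.6.133"    # master's address from the table above
NETMASK="255.255.255.0"
GATEWAY="192.168.6.2"     # assumed VMware NAT gateway; check your own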
  4. Set hostname aliases, that is, map each hostname to its IP address so that addresses such as hdfs://master:9000 can be resolved
[root@localhost network-scripts]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.6.133 master
192.168.6.131 slave1
192.168.6.132 slave2

Restart the network

service network restart    # restart the network

Then try to ping master, slave1, and slave2 from each node. If the pings succeed, the configuration works
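
A quick check from the master might look like this (the -c flag just limits the number of packets sent):

[root@master ~]# ping -c 3 slave1
[root@master ~]# ping -c 3 slave2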

  5. Format HDFS by running the following command on each node
hdfs namenode -format

Format before the first start. If no error or exception appears, the format succeeded
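
On a successful format you should see a log line similar to the one below near the end of the output; the storage path shown is only the default location and an assumption on my part, since it depends on your configuration:

INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.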

6. On the master host, configure the cluster's slave nodes

cd /home/mmcc/hadoop2.7.3/etc/hadoop
[root@localhost hadoop]# vi slaves
# add the following content
slave1
slave2

7. Disable the firewall on each node and start the HDFS service.

[root@localhost mmcc]# systemctl stop firewalld
[root@localhost mmcc]# hadoop-daemon.sh start namenode
[root@localhost mmcc]# hadoop-daemon.sh start datanode    # on the slave nodes slave1 and slave2
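
To confirm the daemons are actually running, the jps tool that ships with the JDK lists the Java processes. Roughly, you should see something like this on the master, and a DataNode entry instead on each slave (the process IDs will of course differ):

[root@master ~]# jps
2786 NameNode
3012 Jps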

Then you can open master:50070 (or <IP address>:50070) in a browser to view the current status of the cluster and its nodes. At this point, a distributed Hadoop environment has been successfully started. In the next article, you will learn how to set up password-free SSH login, start the whole cluster with one command, and use some simple HDFS file storage commands. If anything goes wrong during configuration, check the logs to troubleshoot. You are welcome to add me on WeChat so we can learn and make progress together.