If you search for big data on the Internet, what comes back is mostly theory, and it can leave you completely confused. So instead of walking you through the theory, this article takes you straight into building a distributed development environment.

Hadoop distributed architecture (one master, two slaves)

Hostname    IP address        namenode    datanode
master      192.168.6.133     yes         no
slave1      192.168.6.131     no          yes
slave2      192.168.6.132     no          yes

Step 1

Prepare the VM and the Java environment

A CentOS 7 VM with the JDK environment already set up is required. If you run into problems there, see the earlier article Hadoop tour 1 - CentOS 7: Set up the Java environment.

Step 2

Software to prepare

Get the Hadoop distribution package ready

  1. Download from the Apache official website
  2. Download from the Apache archive of past releases
  3. Hadoop 2.7.3 is the version used throughout this article
  4. I used FileZilla to copy the tarball onto the Linux system; you can also use wget to download it directly on the Linux machine, as in the sketch just below
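
For example, a wget download from the Apache archive might look like this; the URL reflects where I believe the 2.7.3 release lives in the archive, so double-check it against the archive listing before running it:

# download the Hadoop 2.7.3 release tarball into the current directory
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz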

Step 3

Unzip Hadoop and rename it

  1. Decompress the Hadoop package in the downloaded directory
[root@localhost mmcc]# tar -zxvf hadoop-2.7.3.tar.gz
# rename the directory (optional)
[root@localhost mmcc]# mv hadoop-2.7.3/ hadoop2.7.3
  2. View the Hadoop root path
[root@localhost mmcc]# cd hadoop2.7.3/
[root@localhost hadoop2.7.3]# pwd
/home/mmcc/hadoop2.7.3    # this path is used when configuring the environment variables
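
If you want to confirm the unpacked layout, listing the directory should show the usual Hadoop 2.7.x top-level folders; the listing below is from memory, so treat it as approximate:

[root@localhost hadoop2.7.3]# ls
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share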

Step 4

Configure the environment variables

  1. At the bottom of /etc/profile, above the PATH and CLASSPATH configuration added in Hadoop tour 1 - CentOS 7: Set up the Java environment, add the following environment variable settings
HADOOP_HOME=/home/mmcc/hadoop2.7.3
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH:.
  2. Reload the environment variables
[root@localhost jdk1.8]# source /etc/profile
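
To check that the variables took effect, you can ask the hadoop command for its version; the full output has a few more lines, but the first one should report 2.7.3:

[root@localhost jdk1.8]# hadoop version
Hadoop 2.7.3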
  3. Configure Hadoop's Java environment by editing the hadoop-env.sh script under etc/hadoop/ in the Hadoop root directory
vi /home/mmcc/hadoop2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/mmcc/jdk1.8    # point this at the Java installation directory
  4. Configure the Hadoop startup environment by editing the core-site.xml file under etc/hadoop/ in the Hadoop root directory
<!-- add this inside the <configuration> element of core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>

The hostname master used here is configured in a later step

Step 5

Set up the distributed environment

  1. For convenience, use VM cloning to create the other machines, so that every node starts with an identical copy of the environment built up to this point

  2. Use the following command to set a hostname on each node (master, slave1, or slave2 respectively)
[root@localhost mmcc]# hostnamectl set-hostname master/slave1/slave2
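
To spell out the shorthand above, the command is run once per machine, each time with that machine's own name, roughly like this:

# on the machine that will be the master
hostnamectl set-hostname master
# on the first slave
hostnamectl set-hostname slave1
# on the second slave
hostnamectl set-hostname slave2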
  3. Check the network
[root@localhost mmcc]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.6.133  netmask 255.255.255.0  broadcast 192.168.6.255
        inet6 fe80::3d1d:5127:6666:c62d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f4:ef:5d  txqueuelen 1000  (Ethernet)
        RX packets 317168  bytes 315273916 (300.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 149675  bytes 14400069 (13.7 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 12826  bytes 3163428 (3.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12826  bytes 3163428 (3.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

If no IP address is shown, configure the network

cd /etc/sysconfig/network-scripts/
vi ifcfg-ens33    # interface name on my VM; it may differ on other versions
ONBOOT="yes"      # means the interface is brought up when the network starts
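
For reference, a minimal static-IP sketch of ifcfg-ens33 for the master node might look like the following; the BOOTPROTO, IPADDR, NETMASK, and GATEWAY values are my assumptions based on the addresses used in this article, so adapt them to your own network:

TYPE="Ethernet"
NAME="ens33"
DEVICE="ens33"
BOOTPROTO="static"        # fixed address instead of DHCP
ONBOOT="yes"              # bring the interface up on boot
IPADDR="192.168.6.133"    # master's address from the table above
NETMASK="255.255.255.0"
GATEWAY="192.168.6.2"     # assumed VMware NAT gateway; check your own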
  4. Set hostname aliases, that is, map each hostname to its IP address so that addresses such as hdfs://master:9000 can be resolved
[root@localhost network-scripts]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.6.133 master
192.168.6.131 slave1
192.168.6.132 slave2

Restart the network

service network restart    # restart the network

Then try to ping master, slave1, and slave2 from each node. If the pings succeed, the configuration works
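
A quick check from the master might look like this (the -c flag just limits the number of packets sent):

[root@master ~]# ping -c 3 slave1
[root@master ~]# ping -c 3 slave2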

  5. Format HDFS by running the following command on each node
hdfs namenode -format

Format before the first start. If no error or exception appears, the format succeeded
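
On a successful format you should see a log line similar to the one below near the end of the output; the storage path shown is only the default location and an assumption on my part, since it depends on your configuration:

INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.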

6. On the master host, configure the cluster's slave nodes

cd /home/mmcc/hadoop2.7.3/etc/hadoop
[root@localhost hadoop]# vi slaves
# add the following content
slave1
slave2

7. Disable the firewall on each node and start the HDFS service.

[root@localhost mmcc]# systemctl stop firewalld
[root@localhost mmcc]# hadoop-daemon.sh start namenode
[root@localhost mmcc]# hadoop-daemon.sh start datanode    # on the slave nodes slave1 and slave2
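
To confirm the daemons are actually running, the jps tool that ships with the JDK lists the Java processes. Roughly, you should see something like this on the master, and a DataNode entry instead on each slave (the process IDs will of course differ):

[root@master ~]# jps
2786 NameNode
3012 Jps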

Then you can open master:50070 (or <IP address>:50070) in a browser to view the current status of the cluster and its nodes. At this point, a distributed Hadoop environment has been successfully started. In the next article, you will learn how to set up password-free SSH login, start the whole cluster with one command, and use some simple HDFS file storage commands. If anything goes wrong during configuration, check the logs to troubleshoot. You are welcome to add me on WeChat so we can learn and make progress together.