Preface

Last time we covered installing the JDK and Hadoop; this time, let's actually build a cluster and get it running!

Cluster Building Scheme

  1. The plan at a glance: three machines, hadoop131 (192.168.2.131), hadoop132 (192.168.2.132), and hadoop133 (192.168.2.133). As the configuration below will show, hadoop131 hosts the NameNode and the JobHistory server, hadoop132 hosts the ResourceManager, hadoop133 hosts the SecondaryNameNode, and all three run a DataNode and a NodeManager.

  2. Prepare 3 machines (turn off the firewall and change the hostnames). The firewall was handled last time, but it seems I forgot to change the hostnames, so let's do that this time!

  3. Clone two VMs for 192.168.2.132 and 192.168.2.133.

  4. Configure the IP addresses (192.168.2.132, 192.168.2.133):

    cd /etc/sysconfig/network-scripts/
    # edit the interface file
    vim ifcfg-ens33
    # set the IP address
    IPADDR=192.168.2.132
    # save and quit
    :wq
    # restart the network service
    systemctl restart network
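
  To confirm the new address actually took effect after the restart, a quick check helps (interface name ens33 taken from the file above):

    ip addr show ens33
    # the "inet" line should now show 192.168.2.132
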
  5. Configure the hostnames and the hosts file (192.168.2.132, 192.168.2.133):

    hostnamectl set-hostname hadoop131
    # check whether the change took effect
    hostname
    # edit the hosts file
    vim /etc/hosts
    # add an entry for every node
    192.168.2.131 hadoop131
    192.168.2.132 hadoop132
    192.168.2.133 hadoop133
  6. Install JDK, Hadoop (this step was done last time)

  7. Configure environment variables (this step was done last time)

  8. Configure SSH

  9. Set up SSH so that hadoop131, hadoop132, and hadoop133 can log in to each other directly. The principle: each node generates an RSA key pair and copies its public key to the other nodes, so SSH stops prompting for a password.

  10. The configuration is as follows:

    # generate a key pair
    ssh-keygen -t rsa
    # copy the public key to every node
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop131
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop132
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop133
    # test
    ssh hadoop132
    # repeat the steps above on hadoop131, hadoop132, and hadoop133
  11. Write a cluster file-distribution shell script

  12. To copy files to the same directory on every node, we use the rsync command. The basic form:

    rsync -rvl [source path] [target user]@[target host]:[target path]
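
  As a concrete illustration (a hypothetical invocation: the JDK path is the one installed earlier, and the root user is an assumption), this would push the JDK directory from hadoop131 to hadoop132:

    rsync -rvl /opt/module/jdk1.8.0_181 root@hadoop132:/opt/module
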
  13. The full script is below. Name it xsync and save it to the /usr/local/bin directory:

    #!/bin/bash
    # abort when no arguments are given
    pcount=$#
    if ((pcount == 0)); then echo no args; exit; fi

    # name and absolute directory of the file to distribute
    p1=$1
    fname=`basename $p1`
    echo fname=$fname
    pdir=`cd -P $(dirname $p1); pwd`
    echo pdir=$pdir

    # copy to the same directory on hadoop132 and hadoop133
    user=`whoami`
    for ((host = 132; host < 134; host++)); do
        echo --------------- hadoop$host ----------------
        rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
    done
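
  Two small things the script above assumes: it must be executable, and /usr/local/bin must be on your PATH (it normally is). A minimal check, reusing the invocation from step 27:

    # make the script executable
    chmod +x /usr/local/bin/xsync
    # distribute a directory to hadoop132 and hadoop133
    xsync /opt/module/hadoop-2.7.7/etc/hadoop
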
  14. Configure the cluster files. For now just copy what I do; we'll go through them in detail later:

  15. Let’s start with the location of all configuration files, as shown below: all here!

  16. Core configuration file core-site.xml

    <configuration>
        <!-- address of the NameNode -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop131:9000</value>
        </property>
        <!-- storage directory for files Hadoop generates at runtime -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-2.7.7/data/tmp</value>
        </property>
        <!-- set the static web user to root (full permissions) -->
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>root</value>
        </property>
        <!-- disable HDFS permission checking -->
        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
    </configuration>
  17. HDFS configuration files hadoop-env.sh and hdfs-site.xml

  18. hadoop-env.sh

    # JDK
    export JAVA_HOME=/opt/module/jdk1.8.0_181
  19. hdfs-site.xml

    <configuration>
        <!-- number of block replicas -->
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <!-- host of the SecondaryNameNode -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop133:50090</value>
        </property>
        <!-- NameNode web UI address -->
        <property>
            <name>dfs.namenode.http-address</name>
            <value>0.0.0.0:50070</value>
        </property>
    </configuration>
  20. YARN configuration files yarn-env.sh and yarn-site.xml

  21. yarn-env.sh

    # JDK
    export JAVA_HOME=/opt/module/jdk1.8.0_181
  22. yarn-site.xml

    <configuration>
        <!-- how reducers obtain data -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <!-- address of the ResourceManager -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop132</value>
        </property>
        <!-- enable log aggregation -->
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <!-- keep aggregated logs for 7 days -->
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>604800</value>
        </property>
        <!-- optional: disable the physical/virtual memory checks -->
        <property>
            <name>yarn.nodemanager.pmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
    </configuration>
  23. MapReduce configuration files mapred-env.sh and mapred-site.xml

  24. mapred-env.sh

    # JDK
    export JAVA_HOME=/opt/module/jdk1.8.0_181
  25. mapred-site.xml

    <configuration>
        <!-- run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <!-- JobHistory server address -->
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop131:10020</value>
        </property>
        <!-- JobHistory server web address -->
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop131:19888</value>
        </property>
        <!-- errors can occur if this classpath is not configured -->
        <property>
            <name>mapreduce.application.classpath</name>
            <value>/opt/module/hadoop-2.7.7/share/hadoop/mapreduce/*,/opt/module/hadoop-2.7.7/share/hadoop/mapreduce/lib/*</value>
        </property>
    </configuration>
  26. Cluster startup configuration file: slaves

    hadoop131
    hadoop132
    hadoop133
  27. Synchronize the configuration files

    xsync /opt/module/hadoop-2.7.7/etc/hadoop

  28. Start the cluster

    # (if this is a brand-new cluster, format the NameNode first: hdfs namenode -format)
    # start HDFS on hadoop131
    start-dfs.sh
    # start YARN on hadoop132; it must be hadoop132, because that is where the ResourceManager runs
    start-yarn.sh
    # start the JobHistory daemon on hadoop131
    mr-jobhistory-daemon.sh start historyserver
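
  To confirm every daemon came up, jps on each node is the quickest check; given the layout configured above, roughly these processes should appear:

    jps
    # expected, per node:
    #   hadoop131: NameNode, DataNode, NodeManager, JobHistoryServer
    #   hadoop132: ResourceManager, DataNode, NodeManager
    #   hadoop133: SecondaryNameNode, DataNode, NodeManager
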

  29. Test: open the web UIs below, then run the small smoke-test job that follows them.

  30. NameNode web UI: http://hadoop131:50070/dfshealth.html#tab-overview

  31. YARN ResourceManager: http://hadoop132:8088/cluster

  32. JobHistory server: http://hadoop131:19888/jobhistory/app
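
  The web pages only prove the daemons are up; a tiny MapReduce job is the real test. A sketch using the examples jar that ships with Hadoop 2.7.7 (jar path assumed from our install directory); once it finishes, it should appear on the JobHistory page:

    hadoop jar /opt/module/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar pi 2 10
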

  33. Now you have a small Hadoop cluster! Give yourself a thumbs up 👍

Conclusion

I started learning Hadoop on a whim. To push myself into the habit of summarizing, I publish a write-up to Nuggets every week. I hope I can keep it up; the road to success is full of thorns.