One. Foreword

This article uses Hadoop version 2.7.3.

Two. The official wordcount example

1. Decompress hadoop-2.7.3.tar.gz to /opt/app/
tar -zxvf hadoop-2.7.3.tar.gz -C /opt/app/
2. Check the usage of wordcount
/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount
Usage: wordcount <in> [<in>...] <out>

This example counts the number of occurrences of each word in a file. Several input files can be counted at once, and there is no need to create the output folder manually: the job creates it, and it must not already exist.
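For example, a minimal sketch that counts two input directories at once (the paths /input1 and /input2 are hypothetical):

/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input1 /input2 /output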

3. Prepare the files to be counted

A. Under /opt/app/hadoop-2.7.3/, create an input directory and copy the prepared files into it, for example:
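A minimal sketch, assuming a sample text file named wc.txt (a hypothetical name) in the current directory:

cd /opt/app/hadoop-2.7.3
mkdir input
cp wc.txt input/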

B. Run the command

/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output

C. View the output

In the output directory there are two files, _SUCCESS and part-r-00000. Run the cat command to view part-r-00000, for example:
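A sketch, using the /output path from the command above; each line of part-r-00000 is a word followed by its count:

cat /output/part-r-00000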

Three. Hadoop pseudo-distributed configuration

Reference documentation

1. Environment preparation

A. Disable the firewall

systemctl stop firewalld
systemctl disable firewalld.service

Then disable SELinux in /etc/selinux/config:

SELINUX=disabled
...

B. Configure the /etc/hosts file
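A minimal sketch of an /etc/hosts entry; the IP address and the hostname hadoop01 are placeholders for your own machine:

127.0.0.1   localhost
192.168.1.100   hadoop01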

C. Configure the JDK and Hadoop environment variables in /etc/profile
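A minimal sketch of the /etc/profile additions, assuming the JDK sits under the hypothetical path /opt/app/jdk1.8.0 and Hadoop under /opt/app/hadoop-2.7.3; run source /etc/profile afterwards to apply it:

export JAVA_HOME=/opt/app/jdk1.8.0
export HADOOP_HOME=/opt/app/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin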

2. Configure /opt/app/hadoop-2.7.3/etc/hadoop/core-site.xml
<property>
     <name>fs.defaultFS</name>
     <value>hdfs://localhost:9000</value>
</property>
<property>
     <!-- Base configuration that the Hadoop file system depends on; the default is /tmp/hadoop-${user.name} -->
     <name>hadoop.tmp.dir</name>
     <value>/opt/app/hadoop-2.7.3/data/tmp</value>
</property>
3. Configure /opt/app/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<property>
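     <!-- keep a single copy of each block; pseudo-distributed mode has only one DataNode -->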
     <name>dfs.replication</name>
     <value>1</value>
</property>
4. Format HDFS
hdfs namenode -format

The purpose is to generate the unique identifier of the cluster and of the block pool, and to initialize the storage directory managed by the NameNode process (fsimage). If the output reports that the storage directory has been successfully formatted, the formatting succeeded.

5. Start NameNode and DataNode
/opt/app/hadoop-2.7.3/sbin/hadoop-daemon.sh start namenode
/opt/app/hadoop-2.7.3/sbin/hadoop-daemon.sh start datanode
6. Check whether the startup is successful

Access localhost:50070

Four. Configure YARN

1. Configure /opt/app/hadoop-2.7.3/etc/hadoop/mapred-site.xml

Rename (or copy) mapred-site.xml.template to mapred-site.xml, for example:
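A minimal sketch, run from the Hadoop configuration directory:

cd /opt/app/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml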

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
2. Configure /opt/app/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<property>
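    <!-- auxiliary service that provides the MapReduce shuffle on every NodeManager -->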
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
3. Start Resourcemanager and NodeManager
/opt/app/hadoop-2.7.3/sbin/yarn-daemon.sh start resourcemanager
/opt/app/hadoop-2.7.3/sbin/yarn-daemon.sh start nodemanager
4. Check whether the startup is successful

A. Access localhost:8088

B. View the started services with jps
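A sketch of the expected jps output; the process IDs are placeholders and will differ on your machine:

jps
1234 NameNode
2345 DataNode
3456 ResourceManager
4567 NodeManager
5678 Jps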

C. If the jps command reports an error: "bash: jps: command not found…"

vi /root/.bash_profile
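This error usually means the JDK's bin directory (which contains jps) is not on root's PATH. A minimal sketch of the lines to append to /root/.bash_profile, reusing the hypothetical JDK path from above; apply it with source:

export JAVA_HOME=/opt/app/jdk1.8.0
export PATH=$PATH:$JAVA_HOME/bin

source /root/.bash_profile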

Five. Run MapReduce on YARN to analyze files in HDFS

1. Upload the file to be analyzed to HDFS
Create an input folder on HDFS:
/opt/app/hadoop-2.7.3/bin/hdfs dfs -mkdir /input

Upload the file to HDFS:
/opt/app/hadoop-2.7.3/bin/hdfs dfs -put wordcount /input

Check whether the file was uploaded successfully:
/opt/app/hadoop-2.7.3/bin/hdfs dfs -ls /input

2. Run the command
/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
3. View the results
/opt/app/hadoop-2.7.3/bin/hdfs dfs -cat /output/part-r-00000

4. Access localhost:50070 in the browser to view the files in HDFS

5. Access localhost:8088 in the browser to view the MapReduce job records on YARN