One. Foreword

This article uses Hadoop version 2.7.3.

Two. The official wordcount example

1. Decompress hadoop-2.7.3.tar.gz to /opt/app/
tar -zxvf hadoop-2.7.3.tar.gz -C /opt/app/
2. Check the usage of wordcount
/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount
Usage: wordcount <in> [<in>...] <out>

This example counts the number of occurrences of each word in a file. Several input files can be counted at once, and there is no need to create the output folder manually: the job creates it, and it must not already exist.
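For example, a minimal sketch that counts two input directories at once (the paths /input1 and /input2 are hypothetical):

/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input1 /input2 /output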

3. Prepare the files to be counted

A. Under /opt/app/hadoop-2.7.3/, create an input directory and copy the prepared files into it, for example:
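A minimal sketch, assuming a sample text file named wc.txt (a hypothetical name) in the current directory:

cd /opt/app/hadoop-2.7.3
mkdir input
cp wc.txt input/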

B. Run the command

/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output

C. View the output

In the output directory there are two files, _SUCCESS and part-r-00000. Run the cat command to view part-r-00000, for example:
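A sketch, using the /output path from the command above; each line of part-r-00000 is a word followed by its count:

cat /output/part-r-00000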

Three. Hadoop pseudo-distributed configuration

Reference documentation

1. Environment preparation

A. Disable the firewall

systemctl stop firewalld
systemctl disable firewalld.service

Then disable SELinux in /etc/selinux/config:

SELINUX=disabled
...

B. Configure the /etc/hosts file
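A minimal sketch of an /etc/hosts entry; the IP address and the hostname hadoop01 are placeholders for your own machine:

127.0.0.1   localhost
192.168.1.100   hadoop01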

C. Configure the JDK and Hadoop environment variables in /etc/profile
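A minimal sketch of the /etc/profile additions, assuming the JDK sits under the hypothetical path /opt/app/jdk1.8.0 and Hadoop under /opt/app/hadoop-2.7.3; run source /etc/profile afterwards to apply it:

export JAVA_HOME=/opt/app/jdk1.8.0
export HADOOP_HOME=/opt/app/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin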

2. Configure /opt/app/hadoop-2.7.3/etc/hadoop/core-site.xml
<property>
     <name>fs.defaultFS</name>
     <value>hdfs://localhost:9000</value>
</property>
<property>
     <!-- Base configuration that the Hadoop file system depends on; the default is /tmp/hadoop-${user.name} -->
     <name>hadoop.tmp.dir</name>
     <value>/opt/app/hadoop-2.7.3/data/tmp</value>
</property>
3. Configure /opt/app/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<property>
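     <!-- keep a single copy of each block; pseudo-distributed mode has only one DataNode -->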
     <name>dfs.replication</name>
     <value>1</value>
</property>
4. Format HDFS
hdfs namenode -format

The purpose is to generate the unique identifier of the cluster and of the block pool, and to initialize the storage directory managed by the NameNode process (fsimage). If the output reports that the storage directory has been successfully formatted, the formatting succeeded.

5. Start NameNode and DataNode
/opt/app/hadoop-2.7.3/sbin/hadoop-daemon.sh start namenode
/opt/app/hadoop-2.7.3/sbin/hadoop-daemon.sh start datanode
6. Check whether the startup is successful

Access localhost:50070

Four. Configure YARN

1. Configure /opt/app/hadoop-2.7.3/etc/hadoop/mapred-site.xml

Rename (or copy) mapred-site.xml.template to mapred-site.xml, for example:
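A minimal sketch, run from the Hadoop configuration directory:

cd /opt/app/hadoop-2.7.3/etc/hadoop
cp mapred-site.xml.template mapred-site.xml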

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
2. Configure /opt/app/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<property>
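    <!-- auxiliary service that provides the MapReduce shuffle on every NodeManager -->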
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
3. Start Resourcemanager and NodeManager
/opt/app/hadoop-2.7.3/sbin/yarn-daemon.sh start resourcemanager
/opt/app/hadoop-2.7.3/sbin/yarn-daemon.sh start nodemanager
4. Check whether the startup is successful

A. Access localhost:8088

B. View the started services with jps
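A sketch of the expected jps output; the process IDs are placeholders and will differ on your machine:

jps
1234 NameNode
2345 DataNode
3456 ResourceManager
4567 NodeManager
5678 Jps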

C. If the jps command reports an error: "bash: jps: command not found…"

vi /root/.bash_profile
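This error usually means the JDK's bin directory (which contains jps) is not on root's PATH. A minimal sketch of the lines to append to /root/.bash_profile, reusing the hypothetical JDK path from above; apply it with source:

export JAVA_HOME=/opt/app/jdk1.8.0
export PATH=$PATH:$JAVA_HOME/bin

source /root/.bash_profile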

Five. Run MapReduce on YARN to analyze files in HDFS

1. Upload the file to be analyzed to HDFS
Create an input folder on HDFS:
/opt/app/hadoop-2.7.3/bin/hdfs dfs -mkdir /input

Upload the file to HDFS:
/opt/app/hadoop-2.7.3/bin/hdfs dfs -put wordcount /input

Check whether the file was uploaded successfully:
/opt/app/hadoop-2.7.3/bin/hdfs dfs -ls /input

2. Run the command
/opt/app/hadoop-2.7.3/bin/hadoop jar /opt/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
3. View the results
/opt/app/hadoop-2.7.3/bin/hdfs dfs -cat /output/part-r-00000

4. Access localhost:50070 in the browser to view the files in HDFS

5. Access localhost:8088 in the browser to view the MapReduce job records on YARN