In the previous article, Hadoop Trip 5 – Building an HDFS Environment in IDEA with Maven, we saw how to access the HDFS file system from IDEA during development. A cloud disk service could be built on top of such a system; if you are interested, try it yourself.

Today we will run MapReduce locally to count word occurrences, a setup generally used for debugging. Running it on a cluster is also simple: package the code into a JAR and execute it with the command hadoop jar xxxx.jar <package name>.<class name>. Feel free to explore that on your own.

MapReduce

MapReduce is a programming model for parallel computation over large data sets (larger than 1TB). Together with HDFS, the distributed file system, it is a core component of Hadoop: HDFS handles storage, while MapReduce is the distributed computing framework.

The simplest flow chart

  1. Read the input file
  2. The map operation
  3. The reduce operation
  4. Output the result file
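To make these steps concrete, here is a sketch of how the sample input used later in this article (comma-separated words, one group per line) flows through the stages. The grouping of equal keys between map and reduce, often called the shuffle, is done by the framework itself:

read:    java,c++           c,java            python,c
map:     (java,1) (c++,1)   (c,1) (java,1)    (python,1) (c,1)
shuffle: c:[1,1]   c++:[1]   java:[1,1]   python:[1]
reduce:  (c,2)  (c++,1)  (java,2)  (python,1)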

Detailed process diagram

Begin to implement

1. Prepare the input file

Prepare an input file, input.txt, in any directory of the project. Since I created a Spring Boot project, I put it in map_input under resources. Its content is a few comma-separated words per line (the same sample shown in the code comments below):

java,c++
c,java
python,c

2. Import dependencies

<properties>
	<hadoop.version>2.7.3</hadoop.version>
</properties>

<dependencies>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-common</artifactId>
		<version>${hadoop.version}</version>
	</dependency>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-client</artifactId>
		<version>${hadoop.version}</version>
	</dependency>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-hdfs</artifactId>
		<version>${hadoop.version}</version>
	</dependency>
</dependencies>

3. Implement the Map side

Implementing the Map side is actually quite simple: create a class that extends the Mapper class and override its map method.

public class MyMapper extends Mapper<Object, Text, Text, LongWritable> {

    /**
     * @param key     the offset of the current line
     * @param text    the value of the current line
     * @param context the job context
     *
     * The content of the input file:
     *   java,c++
     *   c,java
     *   python,c
     */
    @Override
    protected void map(Object key, Text text, Context context)
            throws IOException, InterruptedException {
        // Get the current input line
        String line = text.toString();
        // Split the line into words
        String[] words = line.split(",");
        // Emit each word with a count of 1
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}

4. Implement the Reduce side

Similarly, extend the Reducer class and override the reduce method.

/**
 * The first two type parameters are the reducer's input types: the key output
 * from the map side and the collection of values (values with the same key are
 * grouped together). The last two are the reducer's output key and value types
 * (the final result). Accumulating inside reduce gives the count of each word.
 */
public class MyReduce extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        /*
         * key/values as they arrive here, grouped by the keys emitted in map:
         *   c      : [1, 1]
         *   c++    : [1]
         *   java   : [1, 1]
         *   python : [1]
         */
        long sum = 0; // the total number of occurrences of the word
        for (LongWritable value : values) {
            sum += value.get();
        }
        context.write(key, new LongWritable(sum)); // output results such as c:2, java:2 ...
    }
}
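Since the reduce logic here is a plain sum, which is associative and commutative, the same class can also serve as a combiner that pre-aggregates map output before the shuffle. This is an optional addition, not part of the original walkthrough; it would sit next to the other job.set* calls in the client code below:

// Optional: pre-aggregate counts on the map side to cut shuffle traffic.
// Safe here only because summing is associative and commutative.
job.setCombinerClass(MyReduce.class);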

5. Implement the client code

The calling code is fairly generic; most jobs follow the same template.

public static void main(String[] args)
        throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = new Configuration();
    // conf.set("fs.defaultFS", "hdfs://master:9000/");     // master access path
    // conf.set("mapreduce.framework.name", "yarn");        // run on YARN
    // conf.set("yarn.resourcemanager.hostname", "master"); // set the host

    Job job = Job.getInstance(conf);
    job.setJarByClass(MapReduceClient.class); // set the main class to run
    job.setJobName("wordCount");              // set the application name

    // Set the location of the input files
    FileInputFormat.addInputPaths(job, "J:\\IDEA\\springboot-hadoop\\src\\main\\resources\\map_input");
    // Set the location of the output files
    FileOutputFormat.setOutputPath(job,
            new Path("J:\\IDEA\\springboot-hadoop\\src\\main\\resources\\map_output"));

    // Set the mapper and reducer
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReduce.class);

    // Set the output key/value types (the same for map and reduce here)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // Execute the job
    job.waitForCompletion(true);
}
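One caveat when re-running the job: MapReduce refuses to start if the output directory already exists. A minimal sketch, assuming the same conf, job, and paths as above, that clears a stale output directory before each run:

// Delete the previous output directory so repeated local runs don't fail
// (uses org.apache.hadoop.fs.FileSystem)
Path output = new Path("J:\\IDEA\\springboot-hadoop\\src\\main\\resources\\map_output");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(output)) {
    fs.delete(output, true); // true = delete recursively
}
FileOutputFormat.setOutputPath(job, output);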

To run it locally, you need the Windows executables (winutils.exe and hadoop.dll) in the bin directory of a local Hadoop installation. You can simply unzip the Hadoop distribution, copy the Windows binaries into its bin directory, and point the HADOOP_HOME environment variable at it.
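If you prefer not to set HADOOP_HOME globally, a commonly used alternative is to point Hadoop at the unzipped directory from code, before the job is created. The path below is hypothetical; adjust it to wherever you unpacked Hadoop:

// Hypothetical local path; its bin subdirectory must contain winutils.exe
System.setProperty("hadoop.home.dir", "J:\\hadoop-2.7.3");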

After the job completes, the result is written to part-r-00000 in the map_output directory. The final result is as follows:

c	2
c++	1
java	2
python	1