The environment used in this article:

Windows 10
JDK 8
IntelliJ IDEA 2019.3.4(Community Edition)
Hadoop 2.8.5
AWS EMR 5.3.0

Stand-alone program

  • Create a new Maven project

  • Modify the pom.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>mapreducedemo</groupId>
    <artifactId>mapreducedemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <hadoop.version>2.8.5</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
</project>
  • Create a new package and Java class

  • Copy the code from the official example into WordCount.java
package mapreducedemo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
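The mapper above tokenizes each line on whitespace and emits (word, 1); the combiner/reducer sums the counts per word. For intuition, the same aggregation can be sketched in plain Java without Hadoop (class and method names here are illustrative, not part of the original project):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Mimics TokenizerMapper + IntSumReducer: tokenize on whitespace,
    // then sum an implicit "1" per occurrence of each word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the job's output
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world hello hadoop"));
        // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job the summing happens twice: once per map task (the combiner) and once globally (the reducer), which is why IntSumReducer can serve both roles.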
  • Put a text file demo.txt in the same directory as pom.xml
  • Adding a run configuration

  • Click Run; the following error is reported
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
  • Download hadoop-2.8.5.tar.gz from the website and unzip it

  • Download winutils; copy the files from its hadoop-2.8.5/bin directory into the unpacked hadoop-2.8.5/bin (copy or replace)
  • Set environment variables (qbit sets them directly in the IDEA run configuration so they do not conflict with the system environment variables)
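An alternative to run-configuration variables: Hadoop's shell utilities check the hadoop.home.dir system property before falling back to HADOOP_HOME, so the location of winutils can also be set in code before the job starts. A sketch, assuming the same unpack path used later in the cmd test:

```java
public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Must run before any Hadoop class touches the native libraries.
        // The path is an example: point it at the unpacked hadoop-2.8.5
        // directory whose bin\ contains winutils.exe and hadoop.dll.
        System.setProperty("hadoop.home.dir", "E:\\hadoop\\hadoop-2.8.5");
        System.out.println(System.getProperty("hadoop.home.dir"));
        // ...then call WordCount.main(args) as usual.
    }
}
```

This keeps the fix inside the project, which can be convenient when sharing the run configuration is awkward.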

  • Rerun; the results are now output normally.

Deploying to AWS

  • Add a build section to pom.xml; the complete pom.xml is as follows
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>mapreducedemo</groupId>
    <artifactId>mapreducedemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <java.version>1.8</java.version>
        <hadoop.version>2.8.5</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>mapreducedemo.WordCount</mainClass>
                                </transformer>
                            </transformers>
                            <artifactSet>
                                <excludes>
                                    <exclude>junit:junit</exclude>
                                </excludes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
  • Adding a run configuration

  • Run the build and export the jar
  • Copy demo.txt to the target directory

  • Test on the cmd command line; it works
...\mapreducedemo\target> set HADOOP_HOME=E:\hadoop\hadoop-2.8.5
...\mapreducedemo\target> set PATH=E:\hadoop\hadoop-2.8.5\bin;%PATH%
...\mapreducedemo\target> java -jar mapreducedemo-1.0-SNAPSHOT.jar demo.txt output
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
  • Upload the files to S3
s3://zt-hadoop-cn-northwest-1/jars/mapreducedemo-1.0-SNAPSHOT.jar
s3://zt-hadoop-cn-northwest-1/usr/qbit/input/idea.txt
  • Create an EMR cluster (default configuration)

  • Add a step to the cluster (the output directory must not already exist)
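The output-directory restriction applies to local runs as well: FileOutputFormat refuses to start a job whose output directory already exists. For repeated local tests, the previous run's directory can be removed first. A minimal local-filesystem sketch (the class name is illustrative; on S3 you would delete the prefix instead, e.g. from the console):

```java
import java.io.File;

public class CleanOutput {
    // MapReduce aborts with FileAlreadyExistsException if the output
    // directory exists, so recursively delete the last run's directory.
    static void deleteRecursively(File dir) {
        File[] children = dir.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) deleteRecursively(child);
        }
        dir.delete();
    }

    public static void main(String[] args) {
        // e.g. java CleanOutput output
        deleteRecursively(new File(args[0]));
    }
}
```

Deleting the directory explicitly before each run is safer than reusing output names, because the existence check is precisely what protects earlier results from being overwritten.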



  • Wait for program to run



  • View the run results in S3: s3://zt-hadoop-cn-northwest-1/usr/qbit/output
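The reducer's output is written by TextOutputFormat as one line per key, with key and value separated by a tab, in files named part-r-NNNNN under the output directory. A sketch for reading such a file back after downloading it from S3 (class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputReader {
    // Parses TextOutputFormat lines of the form: word<TAB>count
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue; // skip lines without a key/value separator
            counts.put(line.substring(0, tab),
                       Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a downloaded part file, e.g. output/part-r-00000
        System.out.println(parse(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

With the default single reducer there is one part-r-00000 file; more reducers produce more part files, each sorted by key within itself.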

This article is from qbit snap.