The environment used in this article:

Windows 10
JDK 8
IntelliJ IDEA 2019.3.4(Community Edition)
Hadoop 2.8.5
AWS EMR 5.3.0

Stand-alone program

  • Create a new Maven project

  • Modify the pom.xml configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>mapreducedemo</groupId>
    <artifactId>mapreducedemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <hadoop.version>2.8.5</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
</project>
  • Create a new package and Java class

  • Copy the code from the official example into WordCount.java
package mapreducedemo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
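The mapper above tokenizes each line on whitespace and emits (word, 1); the combiner/reducer sums the counts per word. For intuition, the same aggregation can be sketched in plain Java without Hadoop (class and method names here are illustrative, not part of the original project):

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Mimics TokenizerMapper + IntSumReducer: tokenize on whitespace,
    // then sum an implicit "1" per occurrence of each word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, like the job's output
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world hello hadoop"));
        // {hadoop=1, hello=2, world=1}
    }
}
```

In the real job the summing happens twice: once per map task (the combiner) and once globally (the reducer), which is why IntSumReducer can serve both roles.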
  • Put a text file demo.txt in the same directory as pom.xml
  • Adding a run configuration

  • Click Run; the following error is reported
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
  • Download hadoop-2.8.5.tar.gz from the website and unzip it

  • Download winutils; copy the files from its hadoop-2.8.5/bin directory into the unpacked hadoop-2.8.5/bin (copy or replace)
  • Set environment variables (qbit sets them directly in the IDEA run configuration so they do not conflict with the system environment variables)
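An alternative to run-configuration variables: Hadoop's shell utilities check the hadoop.home.dir system property before falling back to HADOOP_HOME, so the location of winutils can also be set in code before the job starts. A sketch, assuming the same unpack path used later in the cmd test:

```java
public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Must run before any Hadoop class touches the native libraries.
        // The path is an example: point it at the unpacked hadoop-2.8.5
        // directory whose bin\ contains winutils.exe and hadoop.dll.
        System.setProperty("hadoop.home.dir", "E:\\hadoop\\hadoop-2.8.5");
        System.out.println(System.getProperty("hadoop.home.dir"));
        // ...then call WordCount.main(args) as usual.
    }
}
```

This keeps the fix inside the project, which can be convenient when sharing the run configuration is awkward.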

  • Rerun; the results are now output normally.

Deploying to AWS

  • Add a build section to pom.xml; the complete pom.xml is as follows
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>mapreducedemo</groupId>
    <artifactId>mapreducedemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <java.version>1.8</java.version>
        <hadoop.version>2.8.5</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.4</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>mapreducedemo.WordCount</mainClass>
                                </transformer>
                            </transformers>
                            <artifactSet>
                                <excludes>
                                    <exclude>junit:junit</exclude>
                                </excludes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
  • Adding a run configuration

  • Run the build and export the jar
  • Copy demo.txt to the target directory

  • Test on the cmd command line; it works
...\mapreducedemo\target> set HADOOP_HOME=E:\hadoop\hadoop-2.8.5
...\mapreducedemo\target> set PATH=E:\hadoop\hadoop-2.8.5\bin;%PATH%
...\mapreducedemo\target> java -jar mapreducedemo-1.0-SNAPSHOT.jar demo.txt output
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
  • Upload the files to S3
s3://zt-hadoop-cn-northwest-1/jars/mapreducedemo-1.0-SNAPSHOT.jar
s3://zt-hadoop-cn-northwest-1/usr/qbit/input/idea.txt
  • Create an EMR cluster (default configuration)

  • Add a step to the cluster (the output directory must not already exist)
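The output-directory restriction applies to local runs as well: FileOutputFormat refuses to start a job whose output directory already exists. For repeated local tests, the previous run's directory can be removed first. A minimal local-filesystem sketch (the class name is illustrative; on S3 you would delete the prefix instead, e.g. from the console):

```java
import java.io.File;

public class CleanOutput {
    // MapReduce aborts with FileAlreadyExistsException if the output
    // directory exists, so recursively delete the last run's directory.
    static void deleteRecursively(File dir) {
        File[] children = dir.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) deleteRecursively(child);
        }
        dir.delete();
    }

    public static void main(String[] args) {
        // e.g. java CleanOutput output
        deleteRecursively(new File(args[0]));
    }
}
```

Deleting the directory explicitly before each run is safer than reusing output names, because the existence check is precisely what protects earlier results from being overwritten.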



  • Wait for program to run



  • View the run results in S3: s3://zt-hadoop-cn-northwest-1/usr/qbit/output
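The reducer's output is written by TextOutputFormat as one line per key, with key and value separated by a tab, in files named part-r-NNNNN under the output directory. A sketch for reading such a file back after downloading it from S3 (class and method names are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

public class OutputReader {
    // Parses TextOutputFormat lines of the form: word<TAB>count
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue; // skip lines without a key/value separator
            counts.put(line.substring(0, tab),
                       Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a downloaded part file, e.g. output/part-r-00000
        System.out.println(parse(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

With the default single reducer there is one part-r-00000 file; more reducers produce more part files, each sorted by key within itself.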

This article is from qbit snap.