1. Single input format

Specify ParquetInputFormat as the input format:

job.setMapperClass(ParquetMap.class);
job.setInputFormatClass(ParquetInputFormat.class);
ParquetInputFormat.addInputPath(job, new Path(args[1]));
ParquetInputFormat.setReadSupportClass(job, CheckLevelRunner.MyReadSupport.class);

public static final class MyReadSupport extends DelegatingReadSupport<Group> {
    public MyReadSupport() {
        super(new GroupReadSupport());
    }

    @Override
    public org.apache.parquet.hadoop.api.ReadSupport.ReadContext init(InitContext context) {
        return super.init(context);
    }
}

static class ParquetMap extends Mapper<Void, Group, Text, Text> {
    @Override
    protected void map(Void key, Group value, Context context) {
        try {
            String md5sha1 = value.getString("key1", 0);
            // outputKey and outputValue are built from the record fields (elided in the original)
            context.write(new Text(outputKey), new Text(outputValue));
        } catch (Exception e) {
            return;
        }
    }
}

If Parquet encounters an empty file, parsing fails and the corresponding map task errors out.

You can work around this with the MapReduce fault-tolerance parameter:

mapreduce.map.failures.maxpercent: when the percentage of failed map tasks exceeds this value, the whole job fails; the default is 0. It is set to 5 here. Since one map task is created per input file, the job succeeds as long as empty files make up no more than 5% of the input, and fails otherwise.

job.getConfiguration().set("mapreduce.map.failures.maxpercent", "5");
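To make the threshold concrete, here is a minimal sketch of the rule this parameter applies. The class and helper method are hypothetical illustrations, not part of the Hadoop API:

```java
public class FailurePercentDemo {
    // Hypothetical helper mirroring the mapreduce.map.failures.maxpercent check:
    // the job fails only when the failed-map percentage exceeds the threshold.
    static boolean jobSucceeds(int totalMaps, int failedMaps, int maxPercent) {
        // One map task is created per input file, so failedMaps corresponds
        // to the number of empty Parquet files in the scenario above.
        double failureRate = 100.0 * failedMaps / totalMaps;
        return failureRate <= maxPercent;
    }

    public static void main(String[] args) {
        System.out.println(jobSucceeds(100, 4, 5)); // 4% <= 5%, job succeeds
        System.out.println(jobSucceeds(100, 6, 5)); // 6% > 5%, job fails
    }
}
```

Note that with the default value of 0, a single failed map task (one empty file) fails the entire job.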

2. Multiple input formats

One input directory contains TEXT files and the other contains PARQUET files. Use MultipleInputs to assign a separate mapper to each input source.

// Set multiple inputs, each with its own mapper
MultipleInputs.addInputPath(job, new Path(path1), TextInputFormat.class, NormalMap.class);
MultipleInputs.addInputPath(job, new Path(path2), ParquetInputFormat.class, ParquetMap.class);
ParquetInputFormat.setReadSupportClass(job, CheckLevelRunner.MyReadSupport.class);

3. Problems encountered when calling an HTTP interface from MapReduce

After the program was deployed to the server, it failed with: Exception in thread "main" java.lang.NoSuchFieldError. The cause is that the Hadoop classpath bundles HttpClient 3.1, so the 3.1 HttpClient from the Hadoop package is loaded at runtime instead of the version the program was compiled against.
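One common way to resolve this kind of classpath conflict, assuming a Hadoop 2.x (MRv2/YARN) cluster, is to tell MapReduce to put the user's jars ahead of Hadoop's bundled ones on the task classpath:

```java
// Workaround sketch: prefer the job's own jars over Hadoop's bundled
// commons-httpclient 3.1 when resolving classes in task JVMs.
job.getConfiguration().setBoolean("mapreduce.job.user.classpath.first", true);
```

Alternatively, the HttpClient dependency can be shaded and relocated inside the job jar (for example with the Maven Shade plugin) so it cannot clash with the version Hadoop ships.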
