Much of this dates from the first time I wrote MapReduce jobs; I took notes as the problems came up. After I became more familiar with YARN these issues felt trivial, but I suspect many beginners will hit the same questions, so I am writing them down.

The job always runs locally

If you use an older artifact (1.x) or a compatibility artifact provided by CDH (for example 2.0.0-mr1-cdh4.3.1, where MR1 denotes the 1.x MapReduce), you may find that the job cannot be submitted remotely and always runs in local mode. That is because these artifacts depend on hadoop-core rather than hadoop-mapreduce-client-core. It seems mapred.job.tracker can be configured to submit remotely, but I am not sure whether such jobs would show up in YARN's records.
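For context, the client-side setting that controls this differs between the two generations: a 1.x client points at a JobTracker via mapred.job.tracker, while a 2.x/YARN client sets mapreduce.framework.name (whose default is "local", which is exactly why jobs silently stay local). A minimal sketch, one property or the other depending on which artifact you use; the hostname and port are placeholders:

```xml
<!-- mapred-site.xml on the client machine -->
<configuration>
  <!-- 1.x / MR1 style: point the client at the JobTracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-host:8021</value>
  </property>
  <!-- 2.x / YARN style: without this the client defaults to "local" -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```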

Cannot initialize Cluster

This usually means the cluster configuration has not been filled in, but it can also happen when a required artifact is missing, leaving no ClientProtocolProvider implementation to load:

static Iterable<ClientProtocolProvider> frameworkLoader = ServiceLoader.load(ClientProtocolProvider.class);

The implementations here are injected via SPI. For example, hadoop-mapreduce-client-jobclient contains META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider, which registers the implementation class YarnClientProtocolProvider. Therefore, if hadoop-mapreduce-client-jobclient is not correctly on the classpath, the same problem occurs.
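The SPI failure mode is easy to reproduce outside Hadoop. The sketch below uses a hypothetical Provider interface standing in for ClientProtocolProvider: with no META-INF/services registration on the classpath, ServiceLoader iterates over nothing, which is the same condition that makes Cluster fail to initialize.

```java
import java.util.ServiceLoader;

public class SpiDemo {
    // Hypothetical interface standing in for ClientProtocolProvider.
    public interface Provider { String name(); }

    public static void main(String[] args) {
        // ServiceLoader scans META-INF/services/<interface FQCN> on the
        // classpath. No registration file means no implementations, even
        // though the interface itself loads fine.
        ServiceLoader<Provider> loader = ServiceLoader.load(Provider.class);
        boolean found = false;
        for (Provider p : loader) {
            found = true;
        }
        System.out.println(found ? "provider found" : "no provider found");
    }
}
```

This is why the fix is a classpath fix, not a code fix: adding the jar that carries the services file makes the same ServiceLoader call succeed.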

ClassNotFoundException

This occurs in org.apache.hadoop.conf.Configuration.getClass. I went through several interesting guesses before finding the cause: the lookup pointed into a weak-reference cache, so I wondered whether it was memory reclamation; then I suspected a ClassLoader problem on the slave; finally, a look at the YARN staging directory showed that the jar simply was not there.

In other words, you must add:

conf.set("mapreduce.job.jar", "target/xxx.jar");

And the jar must be built beforehand.

Calling Job.setJarByClass(xxx.class) alone is not enough. If you copy the jar to the server and run it there, it does work, precisely because of this setting. But if you run the driver directly as an application from the IDE, the code has not been packaged, so no containing jar can be found (you can trace the check if ("jar".equals(url.getProtocol())) in the source).
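The idea behind that check can be sketched without Hadoop. The helper below (a simplified stand-in for what setJarByClass relies on, not Hadoop's actual ClassUtil code) looks up a class's .class resource URL and inspects its protocol: classes loaded from a jar report "jar", while classes run unpackaged from an IDE's output directory report "file", in which case there is no containing jar to upload.

```java
import java.net.URL;

public class JarLocator {
    // Simplified sketch: find the URL a class was loaded from and return
    // its protocol. Only a "jar" protocol implies a containing jar file.
    static String protocolOf(Class<?> clazz) {
        ClassLoader cl = clazz.getClassLoader();
        if (cl == null) {
            // Bootstrap-loaded classes have a null ClassLoader.
            cl = ClassLoader.getSystemClassLoader();
        }
        URL url = cl.getResource(clazz.getName().replace('.', '/') + ".class");
        return url == null ? "unknown" : url.getProtocol();
    }

    public static void main(String[] args) {
        // When compiled and run from a classes directory (as an IDE does),
        // this prints "file" -- no jar exists yet, hence the need to build
        // the jar and point mapreduce.job.jar at it explicitly.
        System.out.println(protocolOf(JarLocator.class));
    }
}
```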

I have read that Eclipse users see no such errors when using the Hadoop plug-in, because the plug-in packages the jar automatically. In IDEA you can get the same effect by adding Maven's package goal as a before-launch task in the Run Configuration. You can also use IDEA's Artifacts feature to build the jar, which can even be run directly (it includes MANIFEST.MF).