Installing Spark on Windows | A Guide to the Pits I Stepped In

Recently I was handed an Aliyun desktop (Windows 10) and asked to write something with Scala + Spark.

I kept hitting errors even though the logic itself was clearly fine. After a long time debugging, I reached a conclusion: whenever IDEA does not point at a specific wrong line in my script, the cause is a version incompatibility. When that happens, don't hesitate: check your versions.

The versions to check (a quick sanity check in code follows the list):

  • Java: 1.8.0_201
  • Scala: 2.12.13
  • Hadoop: 2.6.4
  • Spark: 2.4.8
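
If you are not sure what is actually on your classpath, here is a minimal Scala sketch (my own addition, not part of the original setup) that prints the JVM and Scala library versions at runtime:

    // Quick sanity check: print the JVM and Scala library versions
    // actually visible to your program, not what you think you installed.
    object VersionCheck {
      def main(args: Array[String]): Unit = {
        println("Java:  " + System.getProperty("java.version"))          // e.g. 1.8.0_201
        println("Scala: " + scala.util.Properties.versionNumberString)   // e.g. 2.12.13
      }
    }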

Level 1: The Scala minor version is incompatible

Although the Spark website says Spark 2.4.8 works with Scala 2.12.x, I got the following error:

java.lang.NoSuchMethodError: scala.Predef$.refArrayOps ...

I had just started using Scala and assumed the problem was in my own program. After a lot of fruitless searching, an article about "java.lang.NoSuchMethodError: scala.Predef$.refArrayOps when running Spark" finally made me realize that changing the Scala version might be worth a try.

That article suggests checking version compatibility on mvnrepository.com/artifact/org.apache.spark/spark-core, but that alone didn't make me feel safe. Here are two methods I recommend instead, both of which I figured out on my own:

Method 1: Open a command line (CMD or PowerShell both work) and run spark-shell to see which Scala version your local Spark installation was built with; the startup banner prints it (screenshot below).
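
For reference, a hedged sketch of what to look for inside spark-shell (the exact values and banner wording depend on your installation):

    // The spark-shell startup banner shows a line like
    //   "Using Scala version 2.11.12 (... Java 1.8.0_201)"
    // but you can also query both versions from the REPL:
    scala> spark.version                    // e.g. "2.4.8"
    scala> util.Properties.versionString    // e.g. "version 2.11.12"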

Method 2: Go to the Spark installation directory and check which Scala version its jars depend on; the jars folder contains a scala-library-<version>.jar whose file name gives the answer (screenshot below).

I uninstalled Scala 2.12.13 and downloaded Scala 2.11.12 from the Scala website instead.
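
If you build with sbt, the same matching matters in the build file. Here is a minimal build.sbt sketch, assuming (as in my case) a local Spark built against Scala 2.11; note that the %% operator appends the Scala binary version to the artifact name, which is exactly where a mismatch bites:

    // build.sbt — scalaVersion must match the Scala binary version
    // your local Spark distribution was built with.
    scalaVersion := "2.11.12"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.4.8",
      "org.apache.spark" %% "spark-sql"  % "2.4.8"
    )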

Level 2: Hadoop version

Oh boy. The previous error was gone, and a new one took its place.

Error #1: java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows. ...

Error #2: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO ...

I searched both error messages and read the two articles that came up. They didn't help directly, but they made me realize that the Hadoop version itself might be wrong:

  • java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.
  • java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO

After all, the Spark package I installed is named spark-2.4.8-bin-hadoop2.7, so I ought to be running Hadoop 2.7 in any case.

I switched to Hadoop 2.7.3, and that error went away.
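
As a side note, here is a minimal sketch of how a local Spark program can be pointed at the Hadoop installation on Windows. The path is a hypothetical example, and setting the HADOOP_HOME environment variable works just as well:

    // On Windows, Hadoop's native bits are resolved via hadoop.home.dir
    // (or the HADOOP_HOME environment variable). Set it before the
    // SparkSession is created. The path below is an example — use your own.
    System.setProperty("hadoop.home.dir", "C:\\hadoop-2.7.3")

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("local-windows-test")
      .master("local[*]")
      .getOrCreate()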

Level 3: hadoop.dll

I thought I was through, until I called this:

result.coalesce(1).write.mode(SaveMode.Overwrite).csv(outputPath)

I wanted Spark to write the result to disk as a single CSV file.

java.io.IOException: (null) entry in command string: null chmod 0644 ...

Again I assumed the problem was my poor grasp of the API, and spent a long time rewriting the call.

Until I remembered my own rule: whenever IDEA doesn't point at a specific faulty line in my script, it's a version incompatibility. The versions themselves were fine by now, but some other part of the environment might still be mismatched. At the very least, there was a good chance the bug wasn't in code I had written myself.

Searching for "java.io.IOException: (null) entry in command string: null chmod 0644" led me to a helpful guide. Following its advice, I downloaded the dynamic link library hadoop.dll from github.com/4ttty/winutils/blob/master/hadoop-2.7.1/bin and dropped it into C:/Windows/System32.

My Hadoop was 2.7.3, but that repo only had 2.7.1 (no 2.7.3), so I downloaded it while praying: let the minor versions be compatible, let the minor versions be compatible. It finally worked.
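
To confirm the fix, a small smoke test along the lines of what I was running; here spark is the session from the earlier sketch, and the output path is a hypothetical example:

    // If hadoop.dll (and winutils.exe) are in place, this write succeeds
    // instead of failing with "(null) entry in command string".
    import org.apache.spark.sql.SaveMode

    val result = spark.range(10).toDF("id")
    result.coalesce(1)
      .write.mode(SaveMode.Overwrite)
      .csv("out/smoke-test")   // example path — use your own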

Conclusion

I don't know what version problems we'll run into next.

The root cause is that I depend heavily on Spark while knowing very little about its ecosystem. With Python, I can see an error and immediately tell whether it's my code or the environment; I can't do that with Spark yet. So it's important to study Spark systematically — systematic knowledge gets twice the result for half the effort.

Also, if you can use Linux, use Linux as much as possible, ideally together with Docker: whoever needs the environment just pulls an image, and most of this work disappears. Then again, if the environment had simply worked, I wouldn't have learned nearly as much.

I want to build a Spark/Scala/Hadoop/big data tech stack group. If you'd like to join, add me on WeChat: Piperlhj.

Brothers, don't forget to follow and give a thumbs-up.