Problem description

The project under development needs an isolated Spark environment, so it uses a custom class loader, SparkClassLoader. When the application is packaged and run on the server, it fails while initializing the SparkSession and reports the following error:

20/04/09 16:57:02 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassCastException: org.apache.spark.serializer.JavaSerializer cannot be cast to org.apache.spark.serializer.Serializer
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:295)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:187)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2523)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at org.apache.spark.sql.SparderContext$$anonfun$initSpark$1$$anon$4.run(SparderContext.scala:128)
    at java.lang.Thread.run(Thread.java:748)

Troubleshooting process

  1. Use the JVM -verbose:class option to see which JARs the two classes are loaded from and check for a package conflict
[Loaded org.apache.spark.serializer.Serializer from file:/root/wangrupeng/spark/jars/spark-core_2.11-2.4.1-os-kylin-r3.jar]
[Loaded org.apache.spark.serializer.JavaSerializer from file:/root/wangrupeng/spark/jars/spark-core_2.11-2.4.1-os-kylin-r3.jar]

Both classes were loaded from the same JAR, which rules out a dependency conflict as the cause.
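The same check can also be made from inside the JVM. The following is a minimal sketch, not part of the original investigation: the class name WhereLoaded and the helper describe are hypothetical, and it only uses standard java.lang.Class and java.security APIs to report which class loader defined a class and where its bytes came from.

```java
import java.security.CodeSource;

public class WhereLoaded {

    /** Describes the defining class loader and code source of a class. */
    static String describe(Class<?> cls) {
        // A null loader means the class was defined by the bootstrap loader.
        ClassLoader loader = cls.getClassLoader();
        // The code source can be null, for example for core JDK classes.
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        String loaderText = (loader == null) ? "bootstrap" : loader.toString();
        String sourceText = (source == null || source.getLocation() == null)
                ? "unknown" : source.getLocation().toString();
        return cls.getName() + " loaded by " + loaderText + " from " + sourceText;
    }

    public static void main(String[] args) {
        System.out.println(describe(WhereLoaded.class));
        System.out.println(describe(String.class));
    }
}
```

Calling describe on both Serializer and JavaSerializer at the point of failure would show the JAR location and, more importantly here, the class loader instance behind each class.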

  1. Inspect the running process with Arthas, Alibaba’s diagnostic tool


curl -O https://alibaba.github.io/arthas/arthas-boot.jar
java -jar arthas-boot.jar
# Enter the process ID of this project when prompted.
# Searching for our custom class loader with sc shows that it has two instances.
sc -d org.apache.spark.serializer.JavaSerializer
# As expected, this class also has two instances: it was loaded twice.

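The ClassCastException occurs because the Spark classes were loaded twice, by two different class loader instances. At runtime a class is identified by its name together with the class loader that defined it, so a JavaSerializer loaded by one instance cannot be cast to the Serializer loaded by the other, even though both come from the same JAR.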
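The effect can be reproduced outside Spark. The sketch below is only an illustration under stated assumptions: Payload is a hypothetical stand-in for a class such as JavaSerializer, IsolatingLoader is a hypothetical loader that defines Payload itself instead of delegating to its parent, and the demo assumes the compiled .class files are available as resources on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class TwoLoadersDemo {

    /** Hypothetical stand-in for a class such as JavaSerializer. */
    public static class Payload {}

    /** Defines Payload itself; every other class is delegated to the parent. */
    static class IsolatingLoader extends ClassLoader {
        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Payload.class.getName())) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                return c;
            }
        }

        /** Reads the class file bytes through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Returns true when an object from loader A cannot be cast to the class from loader B. */
    static boolean castFailsAcrossLoaders() throws Exception {
        ClassLoader parent = TwoLoadersDemo.class.getClassLoader();
        Class<?> a = new IsolatingLoader(parent).loadClass(Payload.class.getName());
        Class<?> b = new IsolatingLoader(parent).loadClass(Payload.class.getName());
        Object fromA = a.getDeclaredConstructor().newInstance();
        try {
            b.cast(fromA); // same class name, different defining loader
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cast fails: " + castFailsAcrossLoaders());
    }
}
```

Both loaders read identical bytes, yet the JVM treats the two resulting classes as unrelated types, which matches the situation in the stack trace above.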

  1. Why is the class loader instantiated twice?

Because SparkClassLoader is our own class, I printed the stack trace in its constructor to see where each instance is created:

protected SparkClassLoader(ClassLoader parent) throws IOException {
    super(new URL[] {}, parent);
    init();
    // Print the call stack every time an instance is created.
    Thread.dumpStack();
}

The relevant output in the final log is as follows:

java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1336)
    at org.apache.kylin.spark.classloader.DebugTomcatClassLoader.<init>(DebugTomcatClassLoader.java:75)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.catalina.loader.WebappLoader.createClassLoader(WebappLoader.java:753)
    at org.apache.catalina.loader.WebappLoader.startInternal(WebappLoader.java:598)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5581)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1016)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:992)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:639)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1127)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:2020)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1336)
    at org.apache.kylin.spark.classloader.DebugTomcatClassLoader.<init>(DebugTomcatClassLoader.java:75)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.catalina.loader.WebappLoader.createClassLoader(WebappLoader.java:753)
    at org.apache.catalina.loader.WebappLoader.startInternal(WebappLoader.java:598)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5581)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1016)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:992)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:639)
    at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1296)
    at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:2038)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The traces show that Tomcat creates a ClassLoader instance each time it deploys a web application, and that the application is deployed twice: once through DeployWar and once through DeployDirectory. According to the official Tomcat documentation, a given web application should be deployed through only one of these two paths, so why does deployment happen twice here?

  1. The final check

The Tomcat webapps directory contained two app directories, so Tomcat deployed the application twice and created two class loader instances. Deleting one of them solves the problem.
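A quick way to spot this kind of duplication is to list what Tomcat could pick up from the webapps directory. The following is a rough sketch under assumptions: the class WebappsLister and the helper deployableEntries are hypothetical names, the default path "webapps" is a placeholder, and the code only lists subdirectories and .war files rather than reproducing Tomcat's own deployment logic.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WebappsLister {

    /** Lists subdirectories and .war files directly under the given directory. */
    static List<String> deployableEntries(Path webapps) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(webapps)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry) || name.endsWith(".war")) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the real webapps directory as the first argument.
        Path webapps = Paths.get(args.length > 0 ? args[0] : "webapps");
        for (String name : deployableEntries(webapps)) {
            System.out.println(name);
        }
    }
}
```

If the output shows more than one entry for the same application, each entry is a candidate for a separate deployment and therefore a separate class loader instance.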