Problem description

The current project needs to isolate the Spark environment, so we implemented a custom Tomcat ClassLoader. When the packaged application runs on the server and initializes the SparkSession, the following error is reported:

20/04/09 16:57:02 ERROR SparkContext: Error initializing SparkContext.
java.lang.ClassCastException: org.apache.spark.serializer.JavaSerializer cannot be cast to org.apache.spark.serializer.Serializer
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:295)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:187)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2523)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at org.apache.spark.sql.SparderContext$$anonfun$initSpark$1$$anon$4.run(SparderContext.scala:128)
    at java.lang.Thread.run(Thread.java:748)

Troubleshooting process

  1. Use the JVM option -verbose:class to check which JARs the two classes are loaded from, to see whether there is a package conflict
[Loaded org.apache.spark.serializer.Serializer from file:/root/wangrupeng/spark/jars/spark-core_2.11-2.4.1-os-kylin-r3.jar]
[Loaded org.apache.spark.serializer.JavaSerializer from file:/root/wangrupeng/spark/jars/spark-core_2.11-2.4.1-os-kylin-r3.jar]
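With a long startup log, checking these lines by eye gets tedious. The following is a minimal sketch that groups the "[Loaded ... from ...]" lines by class name and collects the distinct sources for each class. It assumes the Java 8 style of -verbose:class output shown above; the class name VerboseClassLog and the sample paths in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VerboseClassLog {
    // Java 8 style line: [Loaded <class name> from <source>]
    private static final Pattern LOADED =
            Pattern.compile("\\[Loaded (\\S+) from (.+?)\\]");

    /** Maps each class name to the distinct sources it was loaded from. */
    static Map<String, Set<String>> sourcesByClass(List<String> logLines) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line);
            while (m.find()) {
                result.computeIfAbsent(m.group(1), k -> new LinkedHashSet<>())
                      .add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from a real log.
        List<String> lines = Arrays.asList(
                "[Loaded com.example.Base from file:/opt/app/lib/core-1.0.jar]",
                "[Loaded com.example.Impl from file:/opt/app/lib/core-1.0.jar]");
        sourcesByClass(lines).forEach((cls, sources) ->
                System.out.println(cls + " <- " + sources
                        + (sources.size() > 1 ? "  (loaded from several sources)" : "")));
    }
}
```

A class that maps to more than one source points to a packaging conflict. A class that maps to a single source, as in this case, can still have been loaded several times by different class loader instances, which this log alone cannot show.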

Both classes are loaded from the same JAR, which rules out a dependency conflict.

  2. Use Arthas, Alibaba's powerful diagnostic tool

curl -O https://alibaba.github.io/arthas/arthas-boot.jar
java -jar arthas-boot.jar
# Enter the process ID of the project when prompted
# Listing the class loaders shows two instances of our custom class loader
sc -d org.apache.spark.serializer.JavaSerializer
# As expected, this class is loaded twice, by two class loader instances
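When attaching a tool to the process is not possible, similar information can be printed from inside the application. This is a rough sketch using only standard reflection calls; ClassOrigin and describe are names I made up, and the output is only loosely modeled on what Arthas reports.

```java
import java.security.CodeSource;

final class ClassOrigin {
    /** Returns the name, code source and class loader chain of a class. */
    static String describe(Class<?> type) {
        StringBuilder sb = new StringBuilder();
        sb.append("class:       ").append(type.getName()).append('\n');

        // JDK classes usually have no code source.
        CodeSource source = type.getProtectionDomain().getCodeSource();
        sb.append("code source: ")
          .append(source == null ? "(none)" : String.valueOf(source.getLocation()))
          .append('\n');

        // Walk from the defining loader up to the bootstrap loader (null).
        sb.append("loaders:     ");
        ClassLoader loader = type.getClassLoader();
        if (loader == null) {
            sb.append("bootstrap");
        }
        while (loader != null) {
            sb.append(loader.getClass().getName())
              .append('@')
              .append(Integer.toHexString(System.identityHashCode(loader)));
            loader = loader.getParent();
            sb.append(loader == null ? " -> bootstrap" : " -> ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(ClassOrigin.class));
        System.out.println(describe(String.class));
    }
}
```

Calling describe on the class of the object being cast and on the target type, close to the failing cast, shows whether the two loader chains start at different instances even though the code source is the same.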

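The class being cast was loaded by two different ClassLoader instances. The JVM identifies a class by its name together with the ClassLoader that defined it, so the two copies are unrelated types even though they come from the same JAR, and the cast fails with a ClassCastException.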
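The effect can be reproduced outside Spark and Tomcat. The sketch below is not the project's loader: TwoLoadersDemo, Payload and IsolatingLoader are hypothetical names, and it assumes the compiled class file of Payload is reachable as a classpath resource. Two loader instances each define the same class from the same bytes, and an object created through one cannot be cast to the class defined by the other.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;

final class TwoLoadersDemo {
    /** Stand-in for a class such as JavaSerializer. */
    public static class Payload {}

    /** Defines the target class itself instead of delegating to the parent. */
    static final class IsolatingLoader extends ClassLoader {
        private final String targetName;
        private final byte[] targetBytes;

        IsolatingLoader(ClassLoader parent, String targetName, byte[] targetBytes) {
            super(parent);
            this.targetName = targetName;
            this.targetBytes = targetBytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(targetName)) {
                return super.loadClass(name, resolve); // normal parent-first path
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineClass(name, targetBytes, 0, targetBytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Reads the class file of the given class from the classpath. */
    static byte[] classBytes(Class<?> type) {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if an object from loader A cannot be cast to the class from loader B. */
    static boolean castFailsAcrossLoaders() {
        try {
            String name = Payload.class.getName();
            byte[] bytes = classBytes(Payload.class);
            ClassLoader parent = TwoLoadersDemo.class.getClassLoader();
            Class<?> a = new IsolatingLoader(parent, name, bytes).loadClass(name);
            Class<?> b = new IsolatingLoader(parent, name, bytes).loadClass(name);

            Constructor<?> ctor = a.getDeclaredConstructor();
            ctor.setAccessible(true);
            Object fromA = ctor.newInstance();
            try {
                b.cast(fromA); // same name, different defining loader
                return false;
            } catch (ClassCastException expected) {
                return true;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cast fails across loaders: " + castFailsAcrossLoaders());
    }
}
```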

  3. Why is the ClassLoader instantiated twice? Since we wrote SparkClassLoader ourselves, I printed the stack trace in its constructor to see where each instance is created
protected SparkClassLoader(ClassLoader parent) throws IOException {
    super(new URL[] {}, parent);
    init();
    // Print the current call stack to show where this instance is created
    Thread.dumpStack();
}
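A variation on the same idea, in case the dumps get lost among other log output: count the instances and keep the creation stack traces so they can be printed together later. This is only a sketch. TracingClassLoader is a hypothetical class, not the project's SparkClassLoader, and it assumes the loader class itself is loaded once so that its static fields are shared by all instances.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

class TracingClassLoader extends URLClassLoader {
    private static final AtomicInteger INSTANCES = new AtomicInteger();
    private static final List<Throwable> CREATION_SITES = new CopyOnWriteArrayList<>();

    TracingClassLoader(ClassLoader parent) {
        super(new URL[0], parent);
        int n = INSTANCES.incrementAndGet();
        // A Throwable captures the current call stack without being thrown.
        CREATION_SITES.add(new Throwable("TracingClassLoader instance #" + n));
    }

    /** Number of instances created so far. */
    static int instanceCount() {
        return INSTANCES.get();
    }

    /** Prints the call stack recorded for every instance. */
    static void dumpCreationSites() {
        for (Throwable site : CREATION_SITES) {
            site.printStackTrace();
        }
    }

    public static void main(String[] args) {
        ClassLoader parent = TracingClassLoader.class.getClassLoader();
        new TracingClassLoader(parent);
        new TracingClassLoader(parent);
        System.out.println("instances: " + instanceCount());
        dumpCreationSites();
    }
}
```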

The final log output is as follows:

java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1336)
    at org.apache.kylin.spark.classloader.DebugTomcatClassLoader.<init>(DebugTomcatClassLoader.java:75)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.catalina.loader.WebappLoader.createClassLoader(WebappLoader.java:753)
    at org.apache.catalina.loader.WebappLoader.startInternal(WebappLoader.java:598)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5581)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1016)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:992)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:639)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1127)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:2020)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

java.lang.Exception: Stack trace
    at java.lang.Thread.dumpStack(Thread.java:1336)
    at org.apache.kylin.spark.classloader.DebugTomcatClassLoader.<init>(DebugTomcatClassLoader.java:75)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.catalina.loader.WebappLoader.createClassLoader(WebappLoader.java:753)
    at org.apache.catalina.loader.WebappLoader.startInternal(WebappLoader.java:598)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5581)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:183)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:1016)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:992)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:639)
    at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1296)
    at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:2038)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Tomcat creates a ClassLoader instance each time it deploys a web application. The two stack traces show that the loader was created once from DeployWar and once from DeployDirectory. According to the official Tomcat documentation, a given web application is deployed through only one of these two paths, so why were both used?

  4. Finally, I found that there were two application directories in Tomcat's webapps directory. Deleting the unused one solved the problem.
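A quick way to catch this earlier is to list what sits directly under the webapps directory before starting Tomcat. The sketch below is a simplification and not Tomcat's actual deployment logic: it treats every directory and every .war file as one application, counts a WAR and a directory with the same base name as the same application, and ignores context descriptor files and other host settings. The class name WebappsScan and the default path "webapps" are assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

final class WebappsScan {
    /** Returns the base names of the WARs and directories directly under appBase. */
    static Set<String> deployableApps(Path appBase) {
        Set<String> apps = new TreeSet<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(appBase)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry)) {
                    apps.add(name);
                } else if (name.toLowerCase(Locale.ROOT).endsWith(".war")) {
                    // foo.war and foo/ are counted as the same application
                    apps.add(name.substring(0, name.length() - ".war".length()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return apps;
    }

    public static void main(String[] args) {
        Path appBase = Paths.get(args.length > 0 ? args[0] : "webapps");
        Set<String> apps = deployableApps(appBase);
        System.out.println(apps.size() + " application(s) found: " + apps);
    }
}
```

If the list contains more entries than expected, each extra one gets its own ClassLoader instance at startup, which matches the two constructor calls seen in the log.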