Recently, while running Hive on the Spark engine (set hive.execution.engine=spark), I frequently hit an error with return code 30041. To track down the cause, I read the official issue tracker and several related blog posts. Below I cover the error symptom, the cause analysis, and the solution, and then close with a summary.
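For context, the engine switch can be made per session before a query runs; a minimal sketch (it can also be set globally in hive-site.xml):

set hive.execution.engine=spark;
-- to fall back to the default MapReduce engine:
set hive.execution.engine=mr;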

(1) Error symptom

ERROR : FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 7a817eea-176c-46ba-910e-4eed89d4eb4d

The above error occurred frequently while I was using Hive on Spark. Yet every time it failed, rerunning the same HQL script succeeded, which puzzled me.

(2) Cause analysis

Blog posts on this error commonly cite three causes:

1. Spark is not started properly

2. Spark and Hive versions do not match

3. Insufficient resources, causing the connection between the Hive client and the Spark driver to exceed the configured timeout

Causes 1 and 2 do not apply to my case. Spark jobs on the cluster run normally, which shows Spark starts correctly. Moreover, my Hive on Spark jobs do not fail consistently, only occasionally, and they succeed on retry; a Spark/Hive version mismatch would make them fail every time, so there is no version conflict either.

(3) Solutions

First, per the Cloudera community thread "Re: Can't Use Hive on Spark Engine, Cannot Create Client Error Code 30041" [1], verify the Cloudera Manager configuration:

1. Choose Hive Service > Configuration and enable Spark as a dependency.

2. Choose Hive Service > Configuration and review the Spark-related settings.

Second, if the above checks pass, follow the answer given by ZhouDaHong on the Apache JIRA issue (issues.apache.org/jira/browse…) [2]:

If Hive and Spark configuration problems and version conflicts have been ruled out, check the YARN queue resources.

If the queue is at 100% utilization and no idle resources are released within a short time to create the Spark session, the task fails with this exception.
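If the default queue is saturated, one session-level workaround is to submit the Hive on Spark job to a less busy queue, assuming your cluster has one and its scheduling policy allows it; root.backup below is a hypothetical queue name:

-- assumption: root.backup is a less busy queue on your cluster
set spark.yarn.queue=root.backup;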

Solution: increase the Hive client's connection timeout to 5 minutes:

set hive.spark.client.server.connect.timeout=300000;

The unit is milliseconds; the default is 90000 ms (90 seconds). Do not set the value too high: when a genuine failure occurs, the job waits out the full timeout before aborting, which lengthens the overall runtime.
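A minimal session-level sketch, placing the timeout change next to the job it protects. The companion property hive.spark.client.connect.timeout (default 1000 ms) covers the Spark driver's callback to the Hive client; raising it as well is an assumption worth testing on your own cluster:

set hive.spark.client.server.connect.timeout=300000;  -- 5 minutes; default is 90000 ms
set hive.spark.client.connect.timeout=10000;          -- assumption: raised from the 1000 ms default
-- then run the HQL script as usual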

(4) Summary

The possible causes listed online cannot be applied blindly. In my case, neither cause 1 (Spark not started properly) nor cause 2 (Spark/Hive version mismatch) applied, so I had to approach the problem from the third angle, insufficient resources. Once I found the right direction, I consulted the relevant documentation and forum discussions, started experimenting, and worked toward the solution step by step.

Author's blog: www.cnblogs.com/JasonCeng/

References

[1] Can't Use Hive on Spark Engine, Cannot Create Client Error Code 30041 [community.cloudera.com/t5/Support-…]

[2] SemanticException Failed to get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for Spark session [issues.apache.org/jira/browse…]

[3] 0514 - Hive On Spark unable to create Spark client problem analysis [cloud.tencent.com/developer/a…]

[4] Hive on Spark failed to create SparkSession [blog.sina.com.cn/s/blog_6dd7…]

[5] Failed to create Spark client for Spark session [www.cnblogs.com/Sunqi-1997/…]

[6] FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 7f00ef43-d843-45f5-a425-0995818a608f_0: java.lang.RuntimeException: spark-submit process failed with exit code 1 [www.itdiandi.net/view/1431]

[7] Hive on Spark [issues.apache.org/jira/browse…]