Preparatory work

Download address


On the download page there is a link to the Spark release archives; click it to view historical versions.

I chose Spark 2.4.5 here. Since I am using Hadoop 2.7.7, the package to download is spark-2.4.5-bin-hadoop2.7.tgz.

Environment dependencies

Spark 2.4.5 depends on Java 8 and Scala 2.12; their installation is not covered here.

My local versions:

Cluster planning

Role                 Nodes
Master               bigdata01, bigdata03
Slave                bigdata01, bigdata02, bigdata03, bigdata04
Spark HistoryServer  bigdata02

Distributed high availability installation

Upload the installation package and unzip it to the corresponding directory:

tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /home/bigdata/apps/
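The `-C` flag tells tar where to unpack the archive. A minimal sketch with a throwaway archive under /tmp (paths here are illustrative, not part of the install) shows the behavior:

```shell
# Build a throwaway archive, then extract it with -C into a target directory,
# mirroring how the Spark tarball lands under /home/bigdata/apps/.
mkdir -p /tmp/tar-demo/src/pkg
echo hello > /tmp/tar-demo/src/pkg/file.txt
tar -czf /tmp/tar-demo/pkg.tgz -C /tmp/tar-demo/src pkg

mkdir -p /tmp/tar-demo/apps
tar -zxf /tmp/tar-demo/pkg.tgz -C /tmp/tar-demo/apps
ls /tmp/tar-demo/apps/pkg    # the extracted directory appears under apps/
```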

Modify the configuration file

[bigdata@bigdata01 apps]$ cd /home/bigdata/apps/spark-2.4.5-bin-hadoop2.7/conf
[bigdata@bigdata01 conf]$ ll
total 36
-rw-r--r-- 1 bigdata bigdata  996 Feb  3  2020 docker.properties.template
-rw-r--r-- 1 bigdata bigdata 1105 Feb  3  2020 fairscheduler.xml.template
-rw-r--r-- 1 bigdata bigdata 2025 Feb  3  2020 log4j.properties.template
-rw-r--r-- 1 bigdata bigdata 7801 Feb  3  2020 metrics.properties.template
-rw-r--r-- 1 bigdata bigdata  865 Feb  3  2020 slaves.template
-rw-r--r-- 1 bigdata bigdata 1292 Feb  3  2020 spark-defaults.conf.template
-rwxr-xr-x 1 bigdata bigdata 4221 Feb  3  2020 spark-env.sh.template
[bigdata@bigdata01 conf]$ mv spark-env.sh.template spark-env.sh
[bigdata@bigdata01 conf]$ vim spark-env.sh

Add the following at the end of spark-env.sh:

export JAVA_HOME=/usr/local/java/jdk1.8.0_73
export SPARK_MASTER_PORT=7077
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bigdata01,bigdata02,bigdata03 -Dspark.deploy.zookeeper.dir=/spark"
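With recoveryMode=ZOOKEEPER, the masters store their state under the /spark znode and elect a leader through ZooKeeper, so the standby master can take over if the active one dies. A small sketch that writes the setting to a scratch file standing in for spark-env.sh and verifies it (the /tmp path is illustrative):

```shell
# Write the HA setting to a scratch copy of spark-env.sh, then confirm
# the ZooKeeper recovery mode is present.
env_file=/tmp/spark-env-check.sh
cat > "$env_file" <<'EOF'
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bigdata01,bigdata02,bigdata03 -Dspark.deploy.zookeeper.dir=/spark"
EOF
grep -o 'recoveryMode=[A-Z]*' "$env_file"    # → recoveryMode=ZOOKEEPER
```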

Configure slave information:

[bigdata@bigdata01 conf]$ mv slaves.template slaves
[bigdata@bigdata01 conf]$ vim slaves

Add the worker hostnames from the cluster plan to slaves:

bigdata01
bigdata02
bigdata03
bigdata04

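The start scripts later ssh into every host listed in slaves, so passwordless ssh from bigdata01 to each worker must already work. A hedged dry run that only prints the checks to perform (drop the echo to actually run them):

```shell
# Print one connectivity check per worker in the cluster plan; removing
# the echo would actually ssh to each host and print its hostname.
for h in bigdata01 bigdata02 bigdata03 bigdata04; do
  echo "ssh $h hostname"
done
```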
Copy the HDFS configuration file to the conf directory here:

[bigdata@bigdata01 conf]$ cd /home/bigdata/apps/hadoop-2.7.7/etc/hadoop
[bigdata@bigdata01 hadoop]$ cp core-site.xml hdfs-site.xml /home/bigdata/apps/spark-2.4.5-bin-hadoop2.7/conf

Distribute spark-2.4.5-bin-hadoop2.7 to bigdata02, bigdata03, and bigdata04:

[bigdata@bigdata01 hadoop]$ cd /home/bigdata/apps
[bigdata@bigdata01 apps]$ scp -r spark-2.4.5-bin-hadoop2.7/ bigdata02:$PWD
[bigdata@bigdata01 apps]$ scp -r spark-2.4.5-bin-hadoop2.7/ bigdata03:$PWD
[bigdata@bigdata01 apps]$ scp -r spark-2.4.5-bin-hadoop2.7/ bigdata04:$PWD
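A detail worth noting in these scp commands: `$PWD` is expanded by the local shell before scp runs, so the files land in the same absolute path on the remote host as on the local one. A quick sketch:

```shell
# $PWD expands locally, so "bigdata02:$PWD" becomes a remote path identical
# to the current local directory.
cd /tmp
echo "scp -r spark-2.4.5-bin-hadoop2.7/ bigdata02:$PWD"
# → scp -r spark-2.4.5-bin-hadoop2.7/ bigdata02:/tmp
```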

Configure environment variables:

[bigdata@bigdata01 apps]$ vim ~/.bashrc

Add Spark configuration information after the content:

export SPARK_HOME=/home/bigdata/apps/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
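A minimal sketch of what this PATH addition does, using the same paths as above (nothing here requires Spark to actually be installed):

```shell
# Append Spark's bin and sbin directories to PATH, then confirm the bin
# directory is now on the search path.
export SPARK_HOME=/home/bigdata/apps/spark-2.4.5-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) echo "spark bin on PATH" ;;
esac
```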

Make the configuration take effect:

[bigdata@bigdata01 apps]$ source ~/.bashrc

The bigdata02, bigdata03, and bigdata04 servers need the same environment-variable configuration. Now start the Spark cluster. Since bigdata01 and bigdata03 are planned as masters, execute the start scripts on those two servers respectively:

# bigdata01
[bigdata@bigdata01 ~]$ $SPARK_HOME/sbin/start-all.sh
# bigdata03
[bigdata@bigdata03 ~]$ $SPARK_HOME/sbin/start-master.sh

Running jps shows that bigdata01 and bigdata03 each have both Master and Worker processes, while bigdata02 and bigdata04 have only Worker processes.

View the Web page:

bigdata01: http://bigdata01:8080/ — you can see 4 workers, with no tasks running at this point.

bigdata03: http://bigdata03:8080/ — no worker information is shown here, and its Status value is STANDBY.

Spark HistoryServer installation

Create spark-defaults.conf from its template:

[bigdata@bigdata02 conf]$ cd /home/bigdata/apps/spark-2.4.5-bin-hadoop2.7/conf/
[bigdata@bigdata02 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[bigdata@bigdata02 conf]$ vim spark-defaults.conf


spark.eventLog.enabled true 
spark.eventLog.dir hdfs://hadoopdajun/spark_historylog

Edit spark-env.sh again:

[bigdata@bigdata02 conf]$ vim spark-env.sh

Add the following to configure the log path and Web access port:

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://hadoopdajun/spark_historylog"

Create the log directory configured above in HDFS:

[bigdata@bigdata02 conf]$ hadoop fs -mkdir -p hdfs://hadoopdajun/spark_historylog

Start the Spark HistoryServer; running jps afterwards shows an additional HistoryServer process:

[bigdata@bigdata02 conf]$ $SPARK_HOME/sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /home/bigdata/apps/spark-2.4.5-bin-hadoop2.7/logs/spark-bigdata-org.apache.spark.deploy.history.HistoryServer-1-bigdata02.out
[bigdata@bigdata02 conf]$ jps
2368 QuorumPeerMain
4465 Jps
1699 JournalNode
4435 HistoryServer
3928 Worker
1529 NameNode
1596 DataNode

The HistoryServer can be accessed via http://bigdata02:18080/; the page looks as follows:
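Besides the web page, Spark's HistoryServer also exposes a REST API under /api/v1. A hedged dry run that only builds the probe command from the host and port configured above (drop the echo to actually run it against the cluster):

```shell
# Build the REST probe for the HistoryServer; host and port come from the
# configuration above.
host=bigdata02
port=18080
echo "curl -s http://$host:$port/api/v1/applications"
# → curl -s http://bigdata02:18080/api/v1/applications
```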