Detailed procedure for installing Spark

Spark can build resilient distributed datasets (RDDs) from any file stored in HDFS, or from any other storage system supported by the Hadoop API (Hive, HBase, etc.).

The Spark scheduling modes are as follows:

  • yarn-cluster: used in production; the driver runs inside the cluster
  • yarn-client: used for interactive work; the driver runs on the submitting machine
  • local: small-scale operation on a single machine
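The three modes differ only in the `--master` argument passed to spark-submit. A minimal sketch that builds (but does not run) the command line for each mode; `com.example.App` and `app.jar` are hypothetical placeholders:

```shell
# Build spark-submit command lines for each scheduling mode.
# com.example.App and app.jar are hypothetical placeholders.
submit_cmd() {
  echo "spark-submit --master $1 --class com.example.App app.jar"
}
submit_cmd yarn-cluster   # production: driver runs inside the cluster
submit_cmd yarn-client    # interactive: driver output returns to the client
submit_cmd "local[2]"     # local run using 2 threads
```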

The combination of Spark and Hadoop pairs Hadoop's distributed storage with Spark's in-memory processing, giving enterprise applications memory-level performance.

Preparations (name and version):

  • JDK 1.8.0
  • Hadoop 2.7.2
  • Scala 2.11.6 (download: www.scala-lang.org/download/2….)
  • Spark 2.2.2 (download: https://mirrors.cnnic.cn/apache/spark/spark-2.2.2/)

To run Spark, the JDK and Scala must be configured first. In yarn-cluster mode, Spark uses YARN as its resource manager to allocate computing tasks and resources across the cluster nodes.

Step 1: Upload spark-2.2.2-bin-hadoop2.7.tgz to /home.

Step 2: Decompress the archive, then rename the extracted directory (not the .tgz):

tar -xvf spark-2.2.2-bin-hadoop2.7.tgz
mv spark-2.2.2-bin-hadoop2.7 spark
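The same extract-and-rename pattern can be tried safely with a throwaway archive; the temporary paths below are stand-ins for the real archive in /home:

```shell
# Throwaway demo of the extract-and-rename pattern using temporary paths.
work=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$work/spark-2.2.2-bin-hadoop2.7"
touch "$work/spark-2.2.2-bin-hadoop2.7/RELEASE"
tar -czf "$work/demo.tgz" -C "$work" spark-2.2.2-bin-hadoop2.7

tar -xzf "$work/demo.tgz" -C "$dest"                  # decompress
mv "$dest/spark-2.2.2-bin-hadoop2.7" "$dest/spark"    # rename the extracted directory
ls "$dest"                                            # only "spark" remains
```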

Step 3: Configure the environment variables (vim /etc/profile):

export SPARK_HOME=/home/spark
export PATH=$PATH:$SPARK_HOME/bin

Apply the changes:

source /etc/profile
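A minimal sketch that simulates the two /etc/profile lines in the current shell and then confirms the Spark bin directory is on PATH:

```shell
# Simulate the /etc/profile additions, then verify the result.
export SPARK_HOME=/home/spark
export PATH=$PATH:$SPARK_HOME/bin
case ":$PATH:" in
  *":/home/spark/bin:"*) echo "spark bin on PATH" ;;
  *)                     echo "spark bin missing" ;;
esac
```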

Step 4: Edit conf/spark-env.sh


export JAVA_HOME=/usr/local/jdk1.8
export SCALA_HOME=/usr/scala
export HADOOP_HOME=/opt/hadoop/hadoop2.7.5
export HADOOP_CONF_DIR=/opt/hadoop/hadoop2.7.5/etc/hadoop
export SPARK_MASTER_IP=7master
export SPARK_WORKER_MEMORY=8g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
export SPARK_HOME=/home/spark
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=7master:2181,node1:2181,node2:2181 -Dspark.deploy.zookeeper.dir=/spark"
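A quick arithmetic sanity check of what these settings provide in total, assuming one worker runs on each of the three hosts listed in conf/slaves:

```shell
# Per spark-env.sh: 8 GB per worker instance, 1 instance per host.
SPARK_WORKER_MEMORY_GB=8
SPARK_WORKER_INSTANCES=1
WORKER_HOSTS=3   # 7master, node1, node2 (from conf/slaves)
echo "total worker memory: $((SPARK_WORKER_MEMORY_GB * SPARK_WORKER_INSTANCES * WORKER_HOSTS)) GB"
```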

Step 5: Edit conf/slaves and list the worker hosts:

7master
node1
node2

Step 6: Distribute Spark to the corresponding location on each node:

 scp -r spark root@node1:/home/
 scp -r spark root@node2:/home/
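With more nodes, a loop avoids repeating the scp line per host. A dry-run sketch (the leading `echo` only prints the commands; remove it to actually copy):

```shell
# Dry-run: print one scp command per worker node.
# Remove the leading 'echo' to perform the actual copy.
for node in node1 node2; do
  echo scp -r /home/spark "root@$node:/home/"
done
```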

Step 7 Start the cluster:

$SPARK_HOME/sbin/start-all.sh
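After starting, jps on each host should show a Master (on 7master) and a Worker (on every host in conf/slaves). A dry-run sketch of checking each node; remove the leading `echo` to actually query over ssh:

```shell
# Dry-run: print one jps check per cluster host.
# Remove the leading 'echo' to run jps on each node over ssh.
for host in 7master node1 node2; do
  echo ssh "$host" jps
done
```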

Step 8 Submit a test job to the cluster (project reference: blog.csdn.net/weixin_4099…):

bin/spark-submit --master yarn-cluster --class com.csu.Credit Bank.jar