1/ Download

Download Apache Spark from the official website: https://spark.apache.org/downloads.html or from a mirror, such as the Tsinghua University mirror: https://mirrors.tuna.tsinghua.edu.cn/
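If the server has direct internet access, the release can also be fetched from the command line. A minimal sketch, assuming the 3.1.1 release built for Hadoop 3.2 and the Apache archive's URL layout (adjust both versions to match what you selected on the download page):

```shell
# Build the download URL for an assumed release (adjust versions as needed)
SPARK_VERSION=3.1.1
HADOOP_VERSION=3.2
TARBALL="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"
echo "download: ${URL}"
# wget "${URL}"   # uncomment to actually download
```

Downloading directly on the server also makes step 2 (uploading from a local PC) unnecessary.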

2/ Upload the file to the Linux server from the local PC

Run: rz spark-3.1.1-bin-hadoop3.2.tgz (rz comes from the lrzsz package and requires a terminal client that supports ZMODEM; scp works as an alternative)

3/ Decompress

tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz
This generates a spark-3.1.1-bin-hadoop3.2 directory.
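The tar flags can be tried out safely on a dummy archive first. This sketch packs and re-extracts a throwaway directory (all names here are made up for the demo) to show that -zxvf reproduces the packed directory:

```shell
# Create a throwaway directory and pack it, mimicking the tarball layout
mkdir -p demo-3.1.1-bin/bin
echo 'stub' > demo-3.1.1-bin/bin/pyspark
tar -zcf demo.tgz demo-3.1.1-bin
rm -rf demo-3.1.1-bin

# -z gunzip, -x extract, -v verbose file list, -f archive file name
tar -zxvf demo.tgz
ls demo-3.1.1-bin/bin/pyspark
```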

4/ Set environment variables

Add the following to the .bashrc file (adjust the paths to your own installation):
export SPARK_HOME=/home/hadoop/spark-3.1.1-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/python:$PATH
Check $SPARK_HOME/python/lib for the exact py4j version that ships with your Spark release and use that filename in PYTHONPATH.
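The effect of those exports can be simulated in a throwaway shell session before editing .bashrc. The paths below are the tutorial's example paths, not anything your system is guaranteed to have:

```shell
# Simulate the .bashrc additions (example paths; adjust to your install)
export SPARK_HOME=/home/hadoop/spark-3.1.1-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH

# Confirm the wiring: Spark's bin dir should now be on PATH,
# and the py4j zip on PYTHONPATH
echo "$PATH" | grep -q "$SPARK_HOME/bin" && echo "PATH ok"
echo "$PYTHONPATH" | grep -q "py4j" && echo "PYTHONPATH ok"
```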

5/ Make environment variables take effect immediately

source .bashrc

6/ Start pyspark

Go to the installation directory spark-3.1.1-bin-hadoop3.2/bin/ and run ./pyspark
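A quick pre-flight check can save a confusing startup error. This sketch only verifies that the launcher script exists and is executable; SPARK_HOME falls back to the tutorial's example path if the variable is unset:

```shell
# Pre-flight: is the pyspark launcher where the env vars say it is?
SPARK_HOME=${SPARK_HOME:-/home/hadoop/spark-3.1.1-bin-hadoop3.2}
if [ -x "$SPARK_HOME/bin/pyspark" ]; then
  echo "ready: $SPARK_HOME/bin/pyspark"
else
  echo "missing: $SPARK_HOME/bin/pyspark (check SPARK_HOME and step 4)"
fi
```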

# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

1. Install Spark

1/ Download

Official download address: spark.apache.org/downloads.h... Select the Spark version and Hadoop version, then download.

2/ Decompress the installation package:

# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz

3/ Configure environment variables

vim /etc/profile
export SPARK_HOME=/home/hadoop/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH
source /etc/profile

2/ Start Spark

Local mode is the simplest running mode. It runs on a single node with multiple threads, requires no deployment, and works out of the box, which makes it suitable for daily testing and development.

spark-shell --master local      # start with one worker thread
spark-shell --master local[k]   # start k worker threads
spark-shell --master local[*]   # start as many worker threads as there are CPU cores

After Spark starts successfully, the welcome screen shows the Spark version. The default shell language is Scala, although it is also possible to start Spark with Python via pyspark.
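The local[*] variant can also be computed explicitly. This sketch derives a local-mode master URL from the machine's CPU count (nproc is part of GNU coreutils) and leaves the actual spark-shell launch commented out, since it assumes Spark is installed:

```shell
# Derive a local-mode master URL from the number of CPU cores
CORES=$(nproc)
MASTER="local[${CORES}]"
echo "would run: spark-shell --master ${MASTER}"
# spark-shell --master "${MASTER}"   # uncomment when Spark is on PATH
```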