Author: Yuan childe | Date: 2020-01-27 (Monday) | Weather: after the rain, strong winds in Dongguan

The more you know, the more you realize you don't know


1. Environment preparation

  • This example uses the CentOS 7 64-bit operating system
  • Java 1.8 or later is installed
  • Hadoop is installed
  • Python is installed
  • Scala is installed
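
If you want to confirm these prerequisites quickly before continuing, a small check along the following lines can be run (a sketch only; it assumes all four tools are already on the PATH):

# Optional quick check that the prerequisites above are available.
import subprocess

for cmd in (["java", "-version"], ["hadoop", "version"],
            ["python", "--version"], ["scala", "-version"]):
    subprocess.run(cmd)  # each prints its version (java and scala write to stderr)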

2. Download the installation package

Official address: the Spark downloads page

Download the latest software version: spark-2.4.4-bin-without-hadoop.tgz

3. Start installation

Set up the installation directory

[root@hadoop-master /soft]# tar -xvzf spark-2.4.4-bin-without-hadoop.tgz
[root@hadoop-master /soft]# chown -R hadoop:hadoop spark-2.4.4-bin-without-hadoop
[root@hadoop-master /soft]# ln -s spark-2.4.4-bin-without-hadoop spark

Set the environment variables. The PYSPARK_DRIVER_PYTHON variable selects the Python environment used by the PySpark driver; for details, see the common environment configuration.

[root@hadoop-master /soft]# vi /etc/profile
export SPARK_HOME=/soft/spark
export SPARK_CONF_DIR=/home/hadoop/spark/conf
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_DRIVER_PYTHON=$ANACONDA_HOME/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_HOME/bin/python
[root@hadoop-master /soft]# source /etc/profile
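
Once the remaining configuration is in place and a pyspark shell starts, a quick way to confirm that the Anaconda interpreters configured here are actually being used is the check below (a sketch; the paths are the ones assumed by this guide):

# Run inside a pyspark shell (the shell creates `sc` automatically).
import sys

print(sys.executable)  # driver interpreter; should point into $ANACONDA_HOME
# Ask an executor which interpreter it runs (controlled by PYSPARK_PYTHON).
print(sc.parallelize([0], 1).map(lambda _: sys.executable).collect())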

Create the configuration directory and copy in the default configuration files

[root@hadoop-master /soft]# su - hadoop
[hadoop@hadoop-master /home/hadoop]$ mkdir -p /home/hadoop/spark/conf
[hadoop@hadoop-master /home/hadoop]$ cp -fr /soft/spark/conf/* /home/hadoop/spark/conf/

Modify the configuration files

[hadoop@hadoop-master /home/hadoop]$ cp /home/hadoop/spark/conf/spark-env.sh.template /home/hadoop/spark/conf/spark-env.sh
[hadoop@hadoop-master /home/hadoop]$ cp /home/hadoop/spark/conf/slaves.template /home/hadoop/spark/conf/slaves
[hadoop@hadoop-master /home/hadoop]$ vi /home/hadoop/spark/conf/spark-env.sh
export JAVA_HOME=/soft/jdk
export SCALA_HOME=/soft/scala
export SPARK_HOME=/soft/spark
export SPARK_CONF_DIR=/home/hadoop/spark/conf
export SPARK_LOG_DIR=/home/hadoop/spark/log
export SPARK_MASTER_IP=hadoop-master
export SPARK_WORKER_MEMORY=512m
export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
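# The "without-hadoop" build bundles no Hadoop jars, so expose them via the Hadoop classpath: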
export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)

[hadoop@hadoop-master /home/hadoop]$ vi /home/hadoop/spark/conf/slaves
hadoop-dn1
hadoop-dn2
hadoop-dn3

Handle the possible exceptions described in section 5 first, then synchronize the installation files to the worker nodes

# Under the hadoop user
[hadoop@hadoop-master /soft]$ xrsync.sh /soft/spark
================ dn1 ==================
================ dn2 ==================
================ dn3 ==================
[hadoop@hadoop-master /soft]$ xrsync.sh /soft/spark-2.4.4-bin-without-hadoop
[hadoop@hadoop-master /soft]$ xrsync.sh /home/hadoop/spark
# Under the root user
[hadoop@hadoop-master /soft]$ su - root
[root@hadoop-master /root]# xrsync.sh /etc/profile
[root@hadoop-master /root]# xcall.sh source /etc/profile

Start and test

[hadoop@hadoop-master /home/hadoop]$ run-example SparkPi 10
[hadoop@hadoop-master /home/hadoop]$ spark-shell --master local[2]
scala> :quit
[hadoop@hadoop-master /home/hadoop]$ pyspark --master local[2]
Using Python version 3.6.5 (default, Apr 29 2018 16:14:56)
SparkSession available as 'spark'.
In [1]: exit
[hadoop@hadoop-master /home/hadoop]$ /soft/spark/sbin/start-all.sh
org.apache.spark.deploy.master.Master running as process 36750. Stop it first.
hadoop-dn3: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/log/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop-dn3.out
hadoop-dn2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/log/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop-dn2.out
hadoop-dn1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/log/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop-dn1.out
[hadoop@hadoop-master /home/hadoop]$ jps
36750 Master
[hadoop@hadoop-dn1 /home/hadoop]$ jps
5653 Worker
# Alternative: start the daemons separately
# /soft/spark/sbin/start-master.sh    # start the master
# /soft/spark/sbin/start-slaves.sh    # start the workers on all slave nodes
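
Beyond the local-mode shells above, a minimal PySpark job submitted to the standalone master confirms that the workers actually accept tasks. This is a sketch assuming the hostname and default port configured earlier (spark://hadoop-master:7077); the file name smoke_test.py is just an example.

# smoke_test.py - trivial job against the standalone cluster
# Run with: spark-submit smoke_test.py  (or paste into a pyspark session)
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://hadoop-master:7077")
         .appName("standalone-smoke-test")
         .getOrCreate())

# Spread a small dataset over the workers and pull the result back.
rdd = spark.sparkContext.parallelize(range(1000), 6)
print(rdd.sum())  # expected: 499500

spark.stop()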

spark web ui

http://hadoop-master:8080

4. Service startup

The master node

[hadoop@hadoop-master /home/hadoop]$ su - root
[root@hadoop-master /root]# vi /etc/systemd/system/spark-master.service
[Unit]
Description=spark-master
After=syslog.target network.target

[Service]
Type=forking
User=hadoop
Group=hadoop
ExecStart=/soft/spark/sbin/start-master.sh
ExecStop=/soft/spark/sbin/stop-master.sh

[Install]
WantedBy=multi-user.target

# Save and exit: Esc, then :wq
[root@hadoop-master /root]# chmod 755 /etc/systemd/system/spark-master.service
[root@hadoop-master /root]# systemctl enable spark-master
[root@hadoop-master /root]# service spark-master start

Slave node

[hadoop@hadoop-dn1 /home/hadoop]$ su - root
[root@hadoop-dn1 /root]# vi /etc/systemd/system/spark-slave.service
[Unit]
Description=spark-slave
After=syslog.target network.target

[Service]
Type=forking
User=hadoop
Group=hadoop
ExecStart=/soft/spark/sbin/start-slave.sh spark://hadoop-master:7077
ExecStop=/soft/spark/sbin/stop-slave.sh

[Install]
WantedBy=multi-user.target

# Save and exit: Esc, then :wq
[root@hadoop-dn1 /root]# chmod 755 /etc/systemd/system/spark-slave.service
[root@hadoop-dn1 /root]# systemctl enable spark-slave
[root@hadoop-dn1 /root]# service spark-slave start

notebook

[hadoop@hadoop-master /home/hadoop]$ su - root
[root@hadoop-master /root]# vi /etc/init.d/notebook
#!/bin/sh
# chkconfig: 345 85 15
# description: service for notebook
# processname: notebook

case "$1" in
start)
    echo "Starting ipython_notebook"
    su - hadoop -c 'export PYSPARK_DRIVER_PYTHON_OPTS="notebook --config=/home/hadoop/.ipython/profile_myserver/ipython_notebook_config.py"; nohup pyspark >/dev/null 2>&1 &'
    echo "ipython_notebook started"
    ;;
stop)
    echo "Stopping ipython_notebook"
    PID_COUNT=`ps aux | grep ipython_notebook | grep -v grep | wc -l`
    PID=`ps aux | grep ipython_notebook | grep -v grep | awk '{print $2}'`
    if [ $PID_COUNT -gt 0 ]; then
        echo "Try stop ipython_notebook"
        kill -9 $PID
        echo "Kill ipython_notebook SUCCESS!"
    else
        echo "There is no ipython_notebook!"
    fi
    ;;
restart)
    echo "Restarting ipython_notebook"
    $0 stop
    $0 start
    ;;
status)
    PID_COUNT=`ps aux | grep ipython_notebook | grep -v grep | wc -l`
    if [ $PID_COUNT -gt 0 ]; then
        echo "ipython_notebook is running"
    else
        echo "ipython_notebook is stopped"
    fi
    ;;
*)
    echo "Usage: $0 {start | stop | restart | status}"
    exit 1
esac

# Save and exit: Esc, then :wq
[root@hadoop-master /root]# chmod 755 /etc/init.d/notebook
[root@hadoop-master /root]# chkconfig --add notebook
[root@hadoop-master /root]# chkconfig notebook on
[root@hadoop-master /root]# service notebook start

5. Pitfalls encountered

  • Exception in thread “main” java.lang.NoClassDefFoundError: org/slf4j/Logger

Fix: copy log4j-1.2.17.jar, slf4j-api-1.7.30.jar, and slf4j-log4j12-1.7.25.jar into Spark's jars directory; the latest versions also work.

[hadoop@hadoop-master /home/hadoop]$ ll /soft/spark/jars/log*
log4j-1.2.17.jar
logging-interceptor-3.12.0.jar
[hadoop@hadoop-master /home/hadoop]$ ll /soft/spark/jars/slf4j-*
slf4j-api-1.7.30.jar
slf4j-log4j12-1.7.25.jar
  • JAVA_HOME is not set
[hadoop@hadoop-master /home/hadoop]$ vi /soft/spark/sbin/spark-config.sh
# Append the following after the line: export PYSPARK_PYTHONPATH_SET=1
  export JAVA_HOME=/soft/jdk
  export SPARK_HOME=/soft/spark
  export HADOOP_HOME=/soft/hadoop
  export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
  export SPARK_CONF_DIR=/home/hadoop/spark/conf
  export SPARK_LOG_DIR=/home/hadoop/spark/log

6. Supplementary content

Use ipython Notebook remotely

[root@hadoop-master /root]# pip install ipython
[root@hadoop-master /root]# su - hadoop
[hadoop@hadoop-master /home/hadoop]$ ipython
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from IPython.lib import passwd

In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:9435b2964949:cdcf603ca1cf095c5141270b66e9848db30d09f9'
# Password: 123456
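
If you prefer to generate the hash from a script instead of the interactive prompt, the same helper accepts the passphrase directly (a sketch using the throwaway password above):

# Non-interactive alternative for generating the notebook password hash.
# Note: IPython.lib.passwd ships with IPython 6.x; newer notebook releases
# provide the same helper as notebook.auth.passwd.
from IPython.lib import passwd

print(passwd("123456"))  # prints a 'sha1:...' hash to paste into the config below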

[hadoop@hadoop-master /home/hadoop]$ ipython profile create myserver
[hadoop@hadoop-master /home/hadoop]$ vi /home/hadoop/.ipython/profile_myserver/ipython_notebook_config.py

c = get_config()
c.IPKernelApp.pylab='inline'
c.NotebookApp.ip='*'
c.NotebookApp.open_browser=False
c.NotebookApp.password=u'sha1:9435b2964949:cdcf603ca1cf095c5141270b66e9848db30d09f9'
c.NotebookApp.port=8888

[hadoop@hadoop-master /home/hadoop]$ PYSPARK_DRIVER_PYTHON_OPTS="notebook --config=/home/hadoop/.ipython/profile_myserver/ipython_notebook_config.py" pyspark
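
After opening http://hadoop-master:8888 in a browser and logging in with the password configured above, the pyspark driver has already created the spark and sc objects, so a first cell such as the following (a quick sanity check, not part of the original walkthrough) confirms everything is wired up:

# First notebook cell: verify the SparkSession provided by the pyspark driver.
print(spark.version)   # e.g. 2.4.4
print(sc.master)       # the master the shell was started against
print(sc.parallelize(range(100)).map(lambda x: x * x).sum())  # 328350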

Appendix:

  • Official documents:

    Spark.apache.org/docs/latest…