Docker part

  • Continue learning common Docker operations, such as mapping ports, mounting directories, and passing variables;

  • Continue studying the Dockerfile in depth and become familiar with ARG, ENV, RUN, WORKDIR, CMD and other directives.

Scala part

  • Basic Scala syntax; writing a first Spark application in Scala;
  • Packaging the application with sbt and its build configuration.

Spark part

  • Submitting the finished Scala application, using --class to specify the main class.

Configuring the Scala runtime environment

Installation

Spark is written in Scala, so I use Scala to write programs here.

We continue with a Dockerfile based on the OpenJDK image mentioned in the previous article:

# Scala and sbt Dockerfile
# (based on

# Pull base image
FROM  openjdk:8-alpine

# Build arguments, supplied with --build-arg
ARG SCALA_VERSION
ARG SBT_VERSION

# Install Scala
# NOTE: the download URL prefixes are missing from the source text;
# replace the <...-download-url-prefix> placeholders before building
RUN \
  echo "$SCALA_VERSION $SBT_VERSION" && \
  mkdir -p /usr/lib/jvm/java-1.8-openjdk/jre && \
  touch /usr/lib/jvm/java-1.8-openjdk/jre/release && \
  apk add --no-cache bash && \
  apk add --no-cache curl && \
  curl -fsL <scala-download-url-prefix>$SCALA_VERSION/scala-$SCALA_VERSION.tgz | tar xfz - -C /usr/local && \
  ln -s /usr/local/scala-$SCALA_VERSION/bin/* /usr/local/bin/ && \
  scala -version && \
  scalac -version

# Install sbt
RUN \
  curl -fsL <sbt-download-url-prefix>$SBT_VERSION/sbt-$SBT_VERSION.tgz | tar xfz - -C /usr/local && \
  (mv /usr/local/sbt-launcher-packaging-$SBT_VERSION /usr/local/sbt || true) && \
  ln -s /usr/local/sbt/bin/* /usr/local/bin/ && \
  sbt sbt-version || sbt sbtVersion || true

WORKDIR /project

CMD "/usr/local/bin/sbt"

Note that the two build arguments declared at the beginning of the Dockerfile, SCALA_VERSION and SBT_VERSION, are specified by the user at build time.

Then build the image from the Dockerfile:

# Note the trailing "." -- the current directory
docker build -t vinci/scala-sbt:latest \
  --build-arg SCALA_VERSION=2.12.8 \
  --build-arg SBT_VERSION=1.2.7 \
  .

This may take a while, so please be patient.
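
To avoid retyping the versions, the build command can be wrapped in a small shell function. The following is only a sketch: build_cmd is a hypothetical helper name, and the defaults simply mirror the versions used in this article. It prints the command for review instead of running it.

```shell
#!/bin/sh
# Sketch: compose the docker build command with overridable versions.
# build_cmd is a hypothetical helper; the defaults mirror this article.
build_cmd() {
  scala_version="${SCALA_VERSION:-2.12.8}"
  sbt_version="${SBT_VERSION:-1.2.7}"
  # Print the command rather than executing it, so it can be checked first.
  printf 'docker build -t vinci/scala-sbt:latest --build-arg SCALA_VERSION=%s --build-arg SBT_VERSION=%s .\n' \
    "$scala_version" "$sbt_version"
}

build_cmd
```

Setting SCALA_VERSION or SBT_VERSION in the environment before calling build_cmd overrides the corresponding default.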


Create a new temporary interactive container to test:

docker run -it --rm vinci/scala-sbt:latest /bin/bash

Run scala -version and sbt sbtVersion in turn.

If the following information is displayed in the container, the installation is successful.

bash-4.4# scala -version
Scala code runner version 2.12.8 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
bash-4.4# sbt sbtVersion
[warn] No sbt.version set in project/, base directory: /local
[info] Set current project to local (in build file:/local/)
[info] 1.2.7

Mount local files

To access our local files from the container, we need to mount our working directory as a volume at a location inside the running container.

We simply add the -v option to the run directive, as follows:

mkdir -p /root/docker/projects/MyFirstScalaSpark
cd /root/docker/projects/MyFirstScalaSpark
docker run -it --rm -v `pwd`:/project vinci/scala-sbt:latest

  1. `pwd` expands to the current directory (on the Linux virtual machine: /root/docker/projects/MyFirstScalaSpark);
  2. /project is the directory inside the container that it is mapped to;
  3. /bin/bash is not needed here: the container starts directly in the sbt console.

If you look closely at the Dockerfile above, its last line specifies the default command to execute, and its penultimate line specifies the working directory.

After the container starts, the following output is shown:

[root@localhost project]# docker run -it --rm -v `pwd`:/project vinci/scala-sbt:latest
[warn] No sbt.version set in project/, base directory: /local
[info] Set current project to local (in build file:/local/)
[info] sbt server started at local:///root/.sbt/1.0/server/05a53a1ec23bec1479e9/sock
sbt:local>

The first program

Configure the environment

Now it’s time to start writing your first Spark application.

You may also have noticed the [warn] line in the output of the previous section: the sbt version is not set, which is a configuration-file issue.

In the project directory we created, create build.sbt:

name := "MyFirstScalaSpark"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"

This gives us a minimal project definition.

Note: We have set the Scala version to 2.11.12 because the Spark distribution used here is compiled for Scala 2.11, whereas the Scala version in the container is 2.12.

In the sbt console, run the reload command to refresh the sbt project with the new build settings.
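
The Scala version matters because the %% operator in build.sbt appends the Scala binary version to the artifact name. The sketch below illustrates that naming rule in shell; scala_binary_version and resolved_artifact are hypothetical helper names, and dropping the last version component is a simplification that holds for Scala 2.x release versions.

```shell
#!/bin/sh
# Sketch: how sbt's %% operator picks the dependency artifact.
# Both function names are hypothetical helpers for illustration.

# Drop the patch component of a Scala 2.x version: 2.11.12 -> 2.11
scala_binary_version() {
  printf '%s\n' "${1%.*}"
}

# %% appends _<binary version> to the artifact name.
# $1 = group, $2 = artifact, $3 = version, $4 = scalaVersion
resolved_artifact() {
  printf '%s:%s_%s:%s\n' "$1" "$2" "$(scala_binary_version "$4")" "$3"
}

resolved_artifact org.apache.spark spark-sql 2.4.0 2.11.12
# prints org.apache.spark:spark-sql_2.11:2.4.0
```

With scalaVersion set to 2.11.12, the dependency resolves to the spark-sql_2.11 artifact, which matches the Scala version of the Spark runtime.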

Write the code

Open an SSH connection to the CentOS host.

Create the source directory and file:

mkdir -p /root/docker/projects/MyFirstScalaSpark/src/main/scala/com/example
cd /root/docker/projects/MyFirstScalaSpark/src/main/scala/com/example
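vim MyFirstScalaSpark.scala

Enter the following content:

package com.example

import org.apache.spark.sql.SparkSession

object MyFirstScalaSpark {
  def main(args: Array[String]): Unit = {
    val SPARK_HOME = sys.env("SPARK_HOME")
    // The file name after the slash is missing in the source; append the text file to analyse
    val logFile = s"${SPARK_HOME}/"
    // Reconstructed: the builder chain and the read call are truncated in the source;
    // the application name is an assumption
    val spark = SparkSession.builder.appName("MyFirstScalaSpark").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}

Back in the sbt console inside the container, run the package task.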

After a long wait, sbt reports success, which means the application has been packaged.

Submit a task

The packaged jar is located in the /root/docker/projects/MyFirstScalaSpark/target/scala-2.11 directory.
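
The jar file name follows from the project name, the Scala binary version, and the project version. A minimal sketch of that rule is shown below; jar_path is a hypothetical helper, lowercasing is a simplification of how sbt normalizes the project name, and the version 1.0.0 is purely illustrative (substitute the version from your build.sbt).

```shell
#!/bin/sh
# Sketch: predict the jar path produced by sbt's package task.
# jar_path is a hypothetical helper; lowercasing simplifies sbt's name normalization.
# $1 = project name, $2 = Scala binary version, $3 = project version
jar_path() {
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf 'target/scala-%s/%s_%s-%s.jar\n' "$2" "$name" "$2" "$3"
}

# 1.0.0 is an illustrative version, not the one used in this article.
jar_path MyFirstScalaSpark 2.11 1.0.0
# prints target/scala-2.11/myfirstscalaspark_2.11-1.0.0.jar
```

Because the project directory is mounted at /project, the same relative path is visible inside any container that mounts it.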

Start the Spark cluster (see Chapter 1):

cd /root/docker/spark
docker-compose up --scale spark-worker=2

Start the Spark client container:

cd /root/docker/projects/MyFirstScalaSpark
docker run --rm -it -e SPARK_MASTER="spark://spark-master:7077" \
  -v `pwd`:/project --network spark_spark-network \
  vinci/spark:latest /bin/bash

Submit the task

In the Spark client container, enter the following command:

spark-submit --master $SPARK_MASTER \
  --class com.example.MyFirstScalaSpark \
  /project/target/scala-2.11/myfirstscalaspark_2.11-0.1.0-SNAPSHOT.jar
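
Since the command depends on the SPARK_MASTER variable passed in with -e, it can help to check the value before submitting. The following sketch does a simple format check for the single-master form spark://host:port; is_spark_master_url is a hypothetical helper and not part of Spark.

```shell
#!/bin/sh
# Sketch: sanity-check a standalone master URL before calling spark-submit.
# is_spark_master_url is a hypothetical helper, not a Spark command.
is_spark_master_url() {
  # Require the spark:// scheme followed by host:port
  case "$1" in
    spark://?*:?*) ;;
    *) return 1 ;;
  esac
  # Everything after the last colon must be a non-empty run of digits
  port="${1##*:}"
  case "$port" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

if is_spark_master_url "${SPARK_MASTER:-}"; then
  echo "SPARK_MASTER looks valid: $SPARK_MASTER"
else
  echo "SPARK_MASTER is unset or malformed" >&2
fi
```

This only validates the shape of the URL; it does not confirm that the master is reachable on the Docker network.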

Result output:

Lines with a: 62, Lines with b: 31

The operation succeeds.
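
The filter-and-count logic of the application can be cross-checked outside Spark with grep. The sketch below uses a small made-up sample file rather than the file read by the job, so its counts differ from the output above; count_lines_with is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reproduce the filter-and-count logic with grep on a sample file.
# count_lines_with is a hypothetical helper; the sample text is made up.
# $1 = substring, $2 = file
count_lines_with() {
  # -c counts matching lines, -F treats the pattern as a literal string
  grep -c -F -- "$1" "$2"
}

sample=$(mktemp)
printf 'apache spark\nbig data\nscala\nsbt\n' > "$sample"
echo "Lines with a: $(count_lines_with a "$sample"), Lines with b: $(count_lines_with b "$sample")"
# prints Lines with a: 3, Lines with b: 2
rm -f "$sample"
```

Running the same two counts against the file the job reads should give the numbers that Spark prints.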

This concludes the chapter.