Spark Shell is a powerful interactive data analysis tool shipped with Spark. Launch it with $SPARK_HOME/bin/spark-shell; if you are already in the bin directory, just run spark-shell.

Once inside, you can see that sc (the SparkContext) and spark (the SparkSession) have already been initialized.


The Spark Shell also accepts other parameters, such as master and executor-memory. Run $SPARK_HOME/bin/spark-shell --help to list them.


The master values include spark, mesos, yarn, k8s, and local; the default is local, as the figure above shows.

mode  | description                                                                                          | format
spark | Runs on Spark's standalone cluster, the same highly available cluster we built in the previous article | spark://host:port
mesos | Runs on the Mesos resource manager                                                                   | mesos://host:port
yarn  | Runs on the YARN resource manager                                                                    | yarn
k8s   | Runs on a Kubernetes cluster                                                                         | k8s://https://host:port
local | Local mode, runs locally                                                                             | local variants below

local: 1 worker thread

local[*]: as many worker threads as logical cores on the machine

local[K]: K worker threads

local[K,F]: K worker threads, allowing up to F task failures

local[*,F]: as many worker threads as logical cores, allowing up to F task failures
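As an illustration of the master formats above (assuming a standard Spark installation; the host name and port are placeholders), a few example spark-shell invocations might look like:

```shell
# Local mode with 4 worker threads
$SPARK_HOME/bin/spark-shell --master local[4]

# Local mode, one thread per logical core, up to 2 task failures
$SPARK_HOME/bin/spark-shell --master "local[*,2]"

# Standalone cluster (placeholder host/port)
$SPARK_HOME/bin/spark-shell --master spark://host:7077

# YARN resource manager
$SPARK_HOME/bin/spark-shell --master yarn
```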

For example, when running against our Spark cluster, you can see that sc's master is our Spark cluster.

[bigdata@bigdata01 test]$ $SPARK_HOME/bin/spark-shell \
> --master spark://bigdata01:7077,bigdata03:7077


  • --executor-memory: memory per executor; the default is 1 GB.
  • --total-executor-cores: standalone-mode parameter that sets the total number of CPU cores across all executors.

Example

Read a file from HDFS and count the number of lines, show the first line, and so on. Contents of the file test.txt:


Upload it to HDFS:

[bigdata@bigdata01 test]$ hadoop fs -put /home/bigdata/test/test.txt  /dajun/test

Enter the Spark Shell:

[bigdata@bigdata01 test]$ $SPARK_HOME/bin/spark-shell --master spark://bigdata01:7077,bigdata03:7077 --executor-memory 512M --total-executor-cores 2

Operate in the Spark Shell

# load the file as a Dataset of lines
scala> val textFile = spark.read.textFile("/dajun/test/test.txt")
textFile: org.apache.spark.sql.Dataset[String] = [value: String]

# count the number of lines
scala> textFile.count()

# show the first line
scala> textFile.first()

# count the lines containing "A"
scala> textFile.filter(line => line.contains("A")).count()
res2: Long = 1
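The Dataset[String] operations above deliberately mirror Scala's collection API, so the same logic can be sketched on a plain Scala List without a Spark cluster (the sample lines below are made up for illustration, not the actual contents of test.txt):

```scala
// Plain-Scala analogue of the Dataset operations above (no Spark needed).
object DatasetAnalogue {
  def main(args: Array[String]): Unit = {
    // stand-in for the lines read from test.txt (hypothetical content)
    val lines = List("Apache Spark", "hello world", "spark shell")

    println(lines.size)                               // like textFile.count()  -> 3
    println(lines.head)                               // like textFile.first()  -> Apache Spark
    println(lines.count(line => line.contains("A")))  // like filter(...).count() -> 1
  }
}
```

This is why the Spark Shell feels natural to Scala users: count, first, and filter behave on a Dataset much as they do on local collections, except that Spark distributes the work across executors.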

You can also call textFile.cache(); the cached contents will then be read on the second execution of textFile.count() instead of going back to HDFS.
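As a rough plain-Scala analogy for this behavior (this is not the Spark API, just an illustration), a lazy val computes its contents once on first access and serves later accesses from memory:

```scala
// Rough analogy for cache(): the expensive read happens once,
// and later accesses reuse the in-memory result. Plain Scala, not Spark.
object CacheAnalogy {
  var reads = 0

  lazy val data: List[String] = {
    reads += 1                       // simulates the expensive HDFS read
    List("line1", "line2", "line3")  // hypothetical file contents
  }

  def main(args: Array[String]): Unit = {
    println(data.size)  // first "action": triggers the read
    println(data.size)  // second "action": served from memory
    println(reads)      // 1 -- the source was read only once
  }
}
```

The analogy is loose: in Spark, cache() only marks the Dataset as cacheable, the data is materialized by the next action, and cached blocks can be evicted under memory pressure.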