It’s hard to get into Spark without touching Scala. Scala layers a lot of functional-programming syntactic sugar on top of a Java-like core, so this post summarizes what makes the language distinctive.

Reference materials include:

  • Spark Big Data Analysis Technology (Scala version) [M]. Beihang University Press, 2021. ISBN: 9787512433854
  • Spark Best Practice [M]. Posts and Telecommunications Press, 2016. ISBN: 9787115422286

Basic ideas and considerations of Scala

Scala stands for “Scalable Language”; as the name suggests, it is a programming language designed to scale:

  • Runs on the Java virtual machine (Scala is compiled into JVM bytecode)
  • Can be used as a scripting language, yet can also build large systems
  • Is statically typed, but supports interactive programming just like a dynamic language
  • Object-oriented: every value is an object, and every operation is a method call
  • Functional: all functions are objects, and functions are “first-class citizens”
  • Almost everything in Scala is an expression

scala is the interpreter and scalac is the compiler. You can run a source file directly with scala test.scala, or compile it with scalac test.scala and then run scala test (the source is compiled into bytecode, which the virtual machine then interprets and runs). You can also type scala on its own to enter the interactive programming interface (the REPL).

Note that you need to install the JDK and set the JAVA_HOME environment variable. More importantly, Scala is only binary compatible across patch releases within the same minor version: 2.12.x and 2.13.x are not compatible with each other, whereas 2.12.10 and 2.12.11 are.

The most basic syntax example

Type declaration, control structure (for, pattern matching, case)

// Variables
val two: Int = 1 + 1

var one: Int = 1
var oneName: String = "one"  // strings use double quotes; a name cannot be declared twice in one scope

// Functions
def addOne(x: Int): Int = x + 1

def add(x: Int, y: Int): Int = {
    x + y
}

// Some control structures
var filename =
    if (!args.isEmpty) args(0)
    else "default.txt"

for (i <- 1 to 4)
    println("iteration " + i)

1 to 4 yields 1, 2, 3, 4 (the upper bound is included), while 1 until 4 yields 1, 2, 3 (the upper bound is excluded).
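To make the difference between to and until concrete, here is a small sketch. The object name RangeDemo and the letters array are made up for illustration and are not from the referenced books.

```scala
object RangeDemo {
  // `to` includes the upper bound; `until` excludes it.
  val inclusive: List[Int] = (1 to 4).toList     // 1, 2, 3, 4
  val exclusive: List[Int] = (1 until 4).toList  // 1, 2, 3

  // `until` pairs naturally with zero-based indexing.
  val letters: Array[String] = Array("a", "b", "c")
  val indexed: List[String] =
    (0 until letters.length).map(i => s"$i:${letters(i)}").toList

  def main(args: Array[String]): Unit = {
    println(inclusive)
    println(exclusive)
    println(indexed)
  }
}
```

Because until stops one short of the bound, 0 until letters.length never produces an out-of-range index.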

There are some other tricks about for.

// Multiple generators
for (a <- 1 to 2; b <- 1 to 2) {
    println("a: " + a + ", b: " + b)
}
// Result:
// a: 1, b: 1
// a: 1, b: 2
// a: 2, b: 1
// a: 2, b: 2

// Filter
val list1 = List(3, 5, 2, 1, 7)
for (x <- list1 if x % 2 == 1) print(" " + x)
// prints " 3 5 1 7"

Pattern matching offers much more. The examples below are taken directly from a reference on the use of case in Scala.

// 1. Simple matching (value matching)

val bools = List(true, false)
for (bool <- bools) {
    bool match {
        case true => println("heads")
        case false => println("tails")
        case _ => println("something other than heads or tails (yikes!)")
    }
}

import scala.util.Random
val randomInt = new Random().nextInt(10)
randomInt match {
    case 7 => println("lucky seven!")
    case otherNumber => println("boo, got boring ol' " + otherNumber)
}

// 2. Type matching

val sundries = List(23, "Hello", 8.5, 'q')
for (sundry <- sundries) {
    sundry match {
        case i: Int => println("got an Integer: " + i)
        case s: String => println("got a String: " + s)
        case f: Double => println("got a Double: " + f)
        case other => println("got something else: " + other)
    }
}

// 3. Matching on sequences

val willWork = List(1, 3, 23, 90)
val willNotWork = List(4, 18, 52)
val empty = List()

for (l <- List(willWork, willNotWork, empty)) {
    l match {
        case List(_, 3, _, _) => println("Four elements, with the 2nd being '3'.")
        case List(_*) => println("Any other list with 0 or more elements.")
    }
}

// 4. Matching on tuples with guards

val tupA = ("Good", "Morning!")
val tupB = ("Guten", "Tag!")
for (tup <- List(tupA, tupB)) {
    tup match {
        case (thingOne, thingTwo) if thingOne == "Good" =>
            println("A two-tuple starting with 'Good'.")
        case (thingOne, thingTwo) =>
            println("This has two things: " + thingOne + " and " + thingTwo)
    }
}

// 5. Deep matching on objects (case classes)

case class Person(name: String, age: Int)
val alice = new Person("Alice", 25)
val bob = new Person("Bob", 32)
val charlie = new Person("Charlie", 32)
for (person <- List(alice, bob, charlie)) {
    person match {
        case Person("Alice", 25) => println("Hi Alice!")
        case Person("Bob", 32) => println("Hi Bob!")
        case Person(name, age) =>
            println("Who are you, " + age + " year-old person named " + name + "?")
    }
}

// 6. Regular expression matching

val BookExtractorRE = """Book: title=([^,]+),\s+authors=(.+)""".r
val MagazineExtractorRE = """Magazine: title=([^,]+),\s+issue=(.+)""".r

val catalog = List(
    "Book: title=Programming Scala, authors=Dean Wampler, Alex Payne",
    "Magazine: title=The New Yorker, issue=January 2009",
    "Book: title=War and Peace, authors=Leo Tolstoy",
    "Magazine: title=The Atlantic, issue=February 2009",
    "BadData: text=Who put this here??"
)

for (item <- catalog) {
    item match {
        case BookExtractorRE(title, authors) =>
            println("Book \"" + title + "\", written by " + authors)
        case MagazineExtractorRE(title, issue) =>
            println("Magazine \"" + title + "\", issue " + issue)
        case entry => println("Unrecognized entry: " + entry)
    }
}

For case, I want to emphasize its use in “unpacking” (destructuring):

val dict = Map("Piper" -> 95, "Bob" -> 90)
dict.foreach {
    case (k, v) => printf(
        "grade of %s is %s\n", k, v
    )
}

// grade of Piper is 95
// grade of Bob is 90

Writing foreach { case (k, v) => ... } unpacks each key-value pair into k and v. It is equivalent to the following, which reads the tuple fields directly.

val dict = Map("Piper" -> 95, "Bob" -> 90)
dict.foreach (
    x => println(
        s"grade of ${x._1} is ${x._2}"
    )
)

// grade of Piper is 95
// grade of Bob is 90
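As a quick check that the two styles really are interchangeable, the sketch below builds the same strings both ways and compares them. UnpackDemo, viaCase and viaFields are hypothetical names chosen for this example.

```scala
object UnpackDemo {
  val grades: Map[String, Int] = Map("Piper" -> 95, "Bob" -> 90)

  // Pattern-matching form: `case (k, v)` destructures each (key, value) tuple.
  val viaCase: List[String] =
    grades.toList.map { case (k, v) => s"grade of $k is $v" }

  // Accessor form: the same result through the tuple fields _1 and _2.
  val viaFields: List[String] =
    grades.toList.map(x => s"grade of ${x._1} is ${x._2}")

  def main(args: Array[String]): Unit = {
    viaCase.foreach(println)
    println(viaCase == viaFields)
  }
}
```

The case form tends to read better once the tuple has more than two fields or the names carry meaning.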

What makes Scala syntax unique

  1. A no-argument method can be called without parentheses, as in args.isEmpty.
def width: Int = if (height == 0) 0 else contents(0).length

width  // called without parentheses
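The width snippet above depends on a surrounding class that is not shown. A minimal self-contained sketch might look like the following, where Block and its contents field are assumptions made up for this example.

```scala
// Hypothetical class: only `width` appears in the original snippet.
class Block(val contents: Array[String]) {
  // Parameterless methods are declared without () and called without ().
  def height: Int = contents.length
  def width: Int = if (height == 0) 0 else contents(0).length
}

object ParameterlessDemo {
  val block = new Block(Array("abc", "def"))
  val w: Int = block.width   // reads like a field access
  val h: Int = block.height
  val emptyWidth: Int = new Block(Array.empty[String]).width

  def main(args: Array[String]): Unit =
    println(s"width=$w, height=$h, emptyWidth=$emptyWidth")
}
```

Since the call site looks the same for a def and a val, a class can later switch between the two without breaking callers.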
  2. <- is used in for; it is the equivalent of Python's in.

  3. Inheritance uses extends: class A(a: Int) extends B

  4. Singletons and static member variables and methods are defined in an object:

object Timer {
    var count = 0
    def currentCount() : Long = {
        count += 1
        count
    }
}

Timer.currentCount()  // Call directly

class Timer { ... }
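When an object and a class share a name in the same file, the object acts as the companion of the class and holds what would be static members in Java. A small sketch, using a hypothetical Counter type rather than the Timer above:

```scala
// Hypothetical names: Counter is not from the original text.
class Counter private (val id: Int)  // constructor is visible only to the companion

object Counter {
  private var created = 0            // shared state, like a Java static field

  def create(): Counter = {          // factory method, like a Java static method
    created += 1
    new Counter(created)
  }

  def total: Int = created
}

object CompanionDemo {
  val first: Counter = Counter.create()
  val second: Counter = Counter.create()

  def main(args: Array[String]): Unit =
    println(s"ids: ${first.id}, ${second.id}; total: ${Counter.total}")
}
```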
  5. A function does not need an explicit return; the value of the last expression is returned by default.

  6. Functional style: anonymous functions can be passed as arguments, and can be written more concisely:

val numbers = List(1, -3, -5, 9, 0)

numbers.filter((x) => x > 0)
numbers.filter(x => x > 0)
numbers.filter(_ > 0)  // placeholder syntax: works when the argument is used only once
  7. _ has special meanings and uses (placeholder):
// Partially applied function
def adder(m: Int, n: Int) = m + n

val add2 = adder(2, _: Int)  // add2: (Int) => Int = <function1>
add2(3)  // res1: Int = 5

// Currying
def curriedSum(x: Int)(y: Int) = x + y
curriedSum(1)(2)

val onePlus = curriedSum(1) _  // Notice the use of _
onePlus(2)

// Pattern matching
var times = 1
times match {
    case 1 => "one"
    case 2 => "two"
    case _ => "other"
}
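The sketch below gathers these uses of _ into one compilable unit, written in Scala 2 syntax. PlaceholderDemo and describe are names invented for the example, and the explicit Int => Int type annotations are added only for readability.

```scala
object PlaceholderDemo {
  def adder(m: Int, n: Int): Int = m + n
  def curriedSum(x: Int)(y: Int): Int = x + y

  // Partial application: fix the first argument, leave the second open.
  val add2: Int => Int = adder(2, _: Int)

  // Turn the remaining parameter list of a curried method into a function.
  val onePlus: Int => Int = curriedSum(1) _

  // Placeholder in anonymous functions: each _ stands for one argument.
  val positives: List[Int] = List(1, -3, -5, 9, 0).filter(_ > 0)
  val total: Int = List(1, 2, 3).reduce(_ + _)

  // Wildcard pattern: matches anything not handled above it.
  def describe(n: Int): String = n match {
    case 1 => "one"
    case 2 => "two"
    case _ => "other"
  }

  def main(args: Array[String]): Unit =
    println(List(add2(3), onePlus(2), total, describe(7)))
}
```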

Scala’s object orientation and functions as “first-class citizens”

(1).+(2)  // 3

As above, (1) is an object and .+(2) is a method call on it. Everything in Scala is an object.
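Since operators are ordinary methods, user-defined types can offer them too. Here is a minimal sketch, where Vec2 is a hypothetical type invented for this example:

```scala
// Hypothetical type used only to illustrate operator methods.
final case class Vec2(x: Int, y: Int) {
  // `+` is a plain method name; `a + b` is sugar for `a.+(b)`.
  def +(other: Vec2): Vec2 = Vec2(x + other.x, y + other.y)
}

object OperatorDemo {
  val viaMethod: Int = (1).+(2)   // explicit method call on the Int 1
  val viaInfix: Int = 1 + 2       // the usual infix form of the same call

  val sum: Vec2 = Vec2(1, 2) + Vec2(3, 4)
  val explicit: Vec2 = Vec2(1, 2).+(Vec2(3, 4))

  def main(args: Array[String]): Unit = {
    println(viaMethod == viaInfix)
    println(sum)
  }
}
```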

var increase = (x: Int) => x + 1

As above, functions are first-class citizens and can be assigned to variables.
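Being first-class also means a function can be passed to another function, returned from one, and composed. A short sketch follows; applyTwice and multiplier are hypothetical helpers, while increase mirrors the variable above.

```scala
object FirstClassDemo {
  // A function stored in a value.
  val increase: Int => Int = (x: Int) => x + 1

  // A function that takes a function as an argument.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A function that returns a function.
  def multiplier(k: Int): Int => Int = x => x * k

  val triple: Int => Int = multiplier(3)
  val twice: Int = applyTwice(increase, 5)                // (5 + 1) + 1
  val tripled: List[Int] = List(1, 2, 3).map(triple)
  val composed: Int = (increase andThen triple)(4)        // (4 + 1) * 3

  def main(args: Array[String]): Unit =
    println(s"$twice $tripled $composed")
}
```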

Basic data structures

There are the following concepts:

  • Immutable list List and mutable list ListBuffer
  • Fixed-length array Array and variable-length array ArrayBuffer
  • Immutable set Set and mutable set scala.collection.mutable.Set
  • Immutable map Map and mutable map scala.collection.mutable.Map
  • Tuple Tuple
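The sketch below touches each of the structures listed above once, mainly to show which ones can be changed in place. The object name CollectionsDemo and all values are made up for illustration.

```scala
import scala.collection.mutable

object CollectionsDemo {
  val list: List[Int] = List(1, 2, 3)            // immutable: operations return new lists
  val buffer = mutable.ListBuffer(1, 2, 3)
  buffer += 4                                    // grows in place

  val array: Array[Int] = Array(1, 2, 3)         // fixed length, but elements are mutable
  array(0) = 10
  val arrayBuffer = mutable.ArrayBuffer(1, 2, 3)
  arrayBuffer += 4                               // variable length

  val set: Set[Int] = Set(1, 2, 2, 3)            // duplicates collapse
  val mutableSet = mutable.Set(1, 2)
  mutableSet += 3

  val map: Map[String, Int] = Map("a" -> 1)
  val mutableMap = mutable.Map("a" -> 1)
  mutableMap("b") = 2                            // insert or update in place

  val tuple: (Int, String, Double) = (1, "two", 3.0)  // fixed size, mixed types

  def main(args: Array[String]): Unit =
    println(s"$list $buffer ${array.toList} $arrayBuffer $set $mutableSet $map $mutableMap $tuple")
}
```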

Notes and Scala’s quirks

  1. until is a good way to iterate over a range of indices; by and _* have special meanings:
for (i <- 0 until a.length) { }  // where a is some array

Array(1, 3, 5, 7, 9, 11)  // equivalent to
Array[Int](1 to 11 by 2: _*)  // _* unpacks the range into individual arguments
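The : _* annotation works for any method that takes a variable number of arguments, not just Array. A small sketch in Scala 2 syntax, where sum is a hypothetical varargs method written for this example:

```scala
object VarargsDemo {
  // Hypothetical varargs method: accepts zero or more Ints.
  def sum(xs: Int*): Int = xs.sum

  val odds: Range = 1 to 11 by 2                 // 1, 3, 5, 7, 9, 11

  val viaLiteral: Array[Int] = Array(1, 3, 5, 7, 9, 11)
  val viaRange: Array[Int] = Array[Int](odds: _*)  // expand the range into arguments

  val direct: Int = sum(1, 2, 3)                 // arguments listed one by one
  val expanded: Int = sum(odds: _*)              // a whole sequence passed as varargs

  def main(args: Array[String]): Unit =
    println(s"${viaRange.toList} $direct $expanded")
}
```

Without : _* the compiler would treat the range as a single argument of the wrong type and reject the call.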
  2. Use yield to generate an array
val a = Array(1, 2, 3, 4)
val res1 = for (ele <- a) yield 2 * ele
// 2, 4, 6, 8
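A for with yield is another way of writing map, and it combines with guards and multiple generators. A brief sketch with made-up names:

```scala
object YieldDemo {
  val a: Array[Int] = Array(1, 2, 3, 4)

  // for/yield keeps the collection type of the generator (here an Array).
  val doubled: Array[Int] = for (ele <- a) yield 2 * ele
  val viaMap: Array[Int] = a.map(2 * _)          // the equivalent map call

  // A guard filters elements before they are yielded.
  val evensDoubled: Array[Int] = for (ele <- a if ele % 2 == 0) yield 2 * ele

  // Several generators yield every combination, like nested loops.
  val pairs: List[(Int, Int)] =
    for (x <- List(1, 2); y <- List(10, 20)) yield (x, y)

  def main(args: Array[String]): Unit =
    println(s"${doubled.toList} ${evensDoubled.toList} $pairs")
}
```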
  3. Tuple indices start from 1
val person = (1, 2, "ABC")
person._1  // 1
  4. The zip operation
val symbols = Array("<", "-", ">")
val counts = Array(2, 10, 2)
val pairs = symbols.zip(counts)
// Array[(String, Int)] = Array((<, 2), (-, 10), (>, 2))
for ((s, n) <- pairs) print(s * n)
// <<---------->>
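A few related behaviours are worth knowing: zip stops at the shorter collection, zipWithIndex pairs elements with their positions, and unzip reverses the operation. A sketch with made-up values:

```scala
object ZipDemo {
  val symbols: Array[String] = Array("<", "-", ">")
  val counts: Array[Int] = Array(2, 10, 2)

  // Same idea as the loop above, collected into one string instead of printed.
  val art: String = symbols.zip(counts).map { case (s, n) => s * n }.mkString

  // zip truncates to the shorter side: the 3 has no partner and is dropped.
  val truncated: List[(Int, String)] = List(1, 2, 3).zip(List("a", "b"))

  // zipWithIndex attaches zero-based positions.
  val indexed: List[(String, Int)] = List("x", "y").zipWithIndex

  // unzip splits a list of pairs back into two lists.
  val (nums, strs) = truncated.unzip

  def main(args: Array[String]): Unit =
    println(s"$art $truncated $indexed $nums $strs")
}
```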
  5. Map operations
// Create
val dict = Map("Piper" -> 95, "Bob" -> 90)
val kv   = Map(("Piper", 95), ("Bob", 90))

// Look up a value
dict("Piper")

// Merge with ++
dict ++ kv
dict.++(kv)

// Add with +, remove with -
val n = dict + ("Tom" -> 91)
val l = dict - "Tom"

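For a mutable Map (here dict must be a scala.collection.mutable.Map):

// += and -=
// (the multi-argument forms are deprecated since Scala 2.13; prefer ++= and --=)
dict += (("Tom", 91), ("Jerry", 87))
dict -= "Tom"
dict -= ("Jerry", "Bob")

// ++= and --= operate with other collections
dict ++= List(("Tom", 91), ("Jerry", 87))
dict --= List("Jerry", "Bob")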
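Here is a runnable sketch of the in-place operations that sticks to the single-element and collection forms, which should behave the same on Scala 2.12 and 2.13. The entry "Ann" and the object name are made up for illustration.

```scala
import scala.collection.mutable

object MutableMapDemo {
  val dict: mutable.Map[String, Int] = mutable.Map("Piper" -> 95, "Bob" -> 90)

  dict += ("Tom" -> 91)                          // add one entry
  dict ++= List("Jerry" -> 87, "Ann" -> 78)      // add several entries
  dict -= "Tom"                                  // remove one key
  dict --= List("Jerry", "Bob")                  // remove several keys
  dict("Piper") = 96                             // update in place

  // Safe lookups: apply() throws on a missing key, these do not.
  val missing: Int = dict.getOrElse("Nobody", 0)
  val maybeAnn: Option[Int] = dict.get("Ann")

  def main(args: Array[String]): Unit =
    println(s"$dict $missing $maybeAnn")
}
```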
  6. :: and ::: build lists
1 :: 3 :: 5 :: Nil  // List[Int] = List(1, 3, 5)

Note that :: is right-associative: 1 :: (3 :: (5 :: Nil)).

// ::: is used to join lists
val L4 = L3 ::: List("Hadoop", "Hbase")
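Right-associativity means that :: is actually a method of the list on its right-hand side. The sketch below spells that out and also uses :: as a pattern; ConsDemo and describe are hypothetical names.

```scala
object ConsDemo {
  val viaCons: List[Int] = 1 :: 3 :: 5 :: Nil

  // The same list with the method calls written explicitly, innermost first.
  val explicit: List[Int] = Nil.::(5).::(3).::(1)

  // ::: concatenates two lists; :: prepends a single element.
  val joined: List[String] = List("Spark") ::: List("Hadoop", "Hbase")
  val prepended: List[String] = "Scala" :: joined

  // :: also works as a pattern that splits a list into head and tail.
  def describe(xs: List[Int]): String = xs match {
    case Nil          => "empty"
    case head :: tail => s"head=$head, tail has ${tail.length} element(s)"
  }

  def main(args: Array[String]): Unit =
    println(s"$viaCons $joined $prepended ${describe(viaCons)}")
}
```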

Discussion of data structures (List or Array?)

  • Prefer lists to arrays
  • The structure of a list is recursive (that is, a linked list), whereas an array is flat
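One way to see the recursive structure is that a list is either empty or a head followed by another list, so recursive functions follow its shape directly, and prepending reuses the existing list as the tail. An array, by contrast, is a flat block accessed by index. A rough sketch with made-up names:

```scala
object ListVsArrayDemo {
  // Recursion mirrors the structure: an empty list, or a head plus a smaller list.
  def sum(xs: List[Int]): Int = xs match {
    case Nil          => 0
    case head :: tail => head + sum(tail)
  }

  val base: List[Int] = List(2, 3)
  val extended: List[Int] = 1 :: base            // prepending allocates one new cell

  // The new list shares `base` as its tail instead of copying it.
  val sharesTail: Boolean = extended.tail eq base

  // An array is flat: elements are reached by index and updated in place.
  val arr: Array[Int] = Array(1, 2, 3)
  arr(1) = 20

  def main(args: Array[String]): Unit =
    println(s"${sum(extended)} $sharesTail ${arr.toList}")
}
```

This sharing is safe only because List is immutable, which is part of why it is the default choice in functional-style Scala code.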

Reference:

  • Scala distinguishes List, Array, ListBuffer, ArrayList, Set, and tuple
  • Scala Learning Note 5 (Collection Collections)