Separation of concerns

There is a famous dramatic theory of Leonid Chekov: If you are not going to fire, don’t let a loaded rifle appear.

Zhihu has a brief summary of lazy loading and Stream, please refer to: Scala Functional Programming lazy loading and Stream – Zhihu (zhihu.com)

FP is “immutable” by nature, meaning that a new copy of the data must be created for each transformation, so immutable sequence methods such as map, filter, foldLeft, foldRight, etc. always construct a new sequence. When multiple transformations are connected, the output of each transformation is passed as input to the next operation, after which it is immediately discarded.

val l = List(1.2.3.4.5).map(_ + 1)    / / the List (2,3,4,5,6)
  .filter(_ > 2)                          / / the List (3,4,5,6)
  .map(_ * 3)                             / / the List (9,12,15,18)
println(l.mkString(",")) 
Copy the code

Obviously, the immutability of functional programming comes at the expense of space. One way to avoid intermediate results is to rewrite the code using either a for or a while loop, with each iteration of an element performing all the transformations:

val l = List(1.2.3.4.5)
//
// Since Scala doesn't have a traditional for loop, the bottom layer is still flatMap, map, fiterWith.
val r = for i <- l if i + 1 > 2 yield (i + 1) * 3
println(r.mkString(","))
Copy the code

Ideally, we don’t have to do this manually and still retain the programming style of high-order composition, so this chapter introduces lazy evaluation (formally known as non-strictness evaluation) to solve the problem.

Scala already provides solutions for lazy computing, such as the lazy keyword (underlying is a thread-safe lazy singleton that can only be used for an immutable variable val), Stream, and View. They are designed to follow a theme of separation of concerns — specifically, the separation between the declaration of a computation and its actual execution, in which the program performs the necessary evaluation only when it needs to, not necessarily at the moment the data is declared. This has two benefits:

  1. Avoid potentially useless but resource-intensive calculations.
  2. Avoid lots of intermediate results.

After Scala version 2.13, streams have been marked as obsolete, and LazyList is officially recommended as an alternative. . See: Scala Standard Library 2.13.7 Scala collections. Immutable. LazyList (scala-lang.org)

// LazyList(<not computed>)
LazyList(1.2.3.4).map(_ + 1).filter(_ > 2).map(_ * 3)

/ / 9,12,15,18
println(LazyList(1.2.3.4.5).map(_ + 1).filter(_ > 2).map(_ * 3).mkString(","))
Copy the code

See my old notes on flows and views: Scala + : Type Inference, list operations and for Loop-digging (juejin. Cn).

Non-strict evaluation (lazy evaluation)

** Non-strictly evaluated ** is a property of functions. To call a function non-strictly evaluated means that the function can choose not to evaluate all arguments passed in. In Scala, this is strictly evaluated by default. For example, the following program execution throws an exception because the IllegalArgumentException at argument B is created and thrown first before the div function is called.

def div(a : Double, b : Double) :Double = a / b
div(1.throw new IllegalArgumentException("crash this."))
Copy the code

In many programming languages (including Scala), a typical example of the strict evaluation is: | | and &&. These two operations may only need to accept the first parameter to get the exact result, so they are also known as short-circuit operations. Another example of non-strict evaluation is the if conditional branch, such as the ternary operator: con? A: B, obviously, when con is true, the B branch is discarded and vice versa.

We’ve already seen how to write non-strict evaluation functions in Scala: call by name:

def div(a : => Double, b : => Double) :Option[Double] = try Some(a / b) catch case_ = >None
div(1.throw new IllegalArgumentException("crash this."))
Copy the code

Because a and B are delayed until after div is called, the try-catch block catches the exception and returns None, which is the basis for understanding what follows. Another equivalent is to pass in an empty parenthesis function of the form () = > A (which is really Function0[A]), but the call is A bit more cumbersome to write:

def div(a : () => Double, b : () => Double) :Option[Double] = try Some(a() / b()) catch case_ = >None
div(()=>1, () = >throw new IllegalArgumentException("crash this."))
Copy the code

Either way, such unevaluated blocks are called thunk. A () and b(), also called delayed evaluation (called forced evaluation in the original book), represent the moment that thunk is actually evaluated.

Inert list

Here is an example of how Scala can use lazy operations to improve efficiency by self-implementing a lazy list or Stream. Firstly, the basic definition is given:

enum _Stream[+A] :case Empty extends _Stream[Nothing]
  // Scala specifies that attributes cannot define named parameters, so empty parentheses are used instead.
  case Cons[+A](head: () => A, tail: () => _Stream[A]) extends _Stream[A]

  import _Stream.*
  def headOption: Option[A] = this match
    case Empty= >None
    case Cons(h,t) => Some(h())

object _Stream:
  def cons[A](hd: => A, tl: => _Stream[A]): _Stream[A] =
    lazy val h: A = hd
    lazy val t: _Stream[A] = tl
    Cons(() => h, () => t)

  def empty[A] : _Stream[A] = Empty
  // Scala 2: as.tail : _*
  // Scala 3: as.tail*
  // create a stream like _Stream(1,2,3,4,5).
  def apply[A] (as : A*) : _Stream[A] = if as.isEmpty then empty else cons(as.head,apply(as.tail*))
Copy the code

Due to compiler technology limitations, the attributes of Cons[+A] cannot be defined as named arguments, so empty parenthesis functions are replaced here. As mentioned above, the empty parenthesis function declaration can affect the user experience, so we also define additional lower-case “pseudo-constructors” of the same name, such as cons, empty, etc., to avoid the redundant ()=>… Writing.

// Using "fake constructors" can simplify the user experience.
//  val y = cons({println("init y(0)");10},empty)
val x = Cons(()=>{println("init x(0)");10},()=>empty)
Copy the code

Another reason to adopt “pseudo-constructors” is that using Cons(head,tail) constructors directly, while delaying loading, does not cache thunk calculations. To prove this, insert a few side effects when constructing Cons, and then call the headOption method several times. You can see that the console prints multiple lines of init x(0), which indicates that the passing call has been evaluated repeatedly.

// Lazy loading, declaring x without printing any information.
val x = Cons(()=>{println("init x(0)");10},()=>empty) 
val h1 = x.headOption		// init x
val h2 = x.headOption		// init x 
Copy the code

The Cons data constructor uses two lazy variables h and t internally. In HD, the result is cached after the TL is calculated for the first time. No matter how many times headOption is called, the console just prints a line of init y(0).

// Lazy loading, declaring y without printing any information.
val y = cons({println("init y(0)");10},empty)

val h1 = y.headOption	// init y
val h2 = y.headOption	// The result saved with lazy h is not printed.
Copy the code

Note: The creation of a pseudo-constructor like this, which is really just a generic method, is a common programming technique in Scala.

Method implementation

Start by trying other methods commonly used in generic sequences: toList, take, drop, all of which can be implemented using a combination of recursion and pattern matching, since _Stream itself is defined recursively.

def toList : List[A] =  this match
  case Empty= >Nil
  case Cons(h,t) => h() :: t().toList

def take(n : Int) : _Stream[A] = this match
  case Cons(h,t) if n > 0 => cons(h(),t().take(n- 1))
  case _ => empty

def drop(n : Int) : _Stream[A] = this match
  case Cons(_,t) if n > 0 => t().drop(n- 1)
  case_ = >this

def takeWhile(p : A= >Boolean) : _Stream[A] = this match
  case Cons(h,t) if p(h()) => cons(h(),t().takeWhile(p))
  case Cons(_,t) => t().takeWhile(p)
  case _ => empty

def forAll(p : A= >Boolean) : Boolean = this match
  case Cons(h,t) if! p(h()) =>false
  case Cons(_,t) => t().forAll(p)
  case_ = >true
Copy the code

Note that only h(), t() represent the actual evaluation. If the judgment is short-circuited, that means that the subsequent t will not be computed at all. ForAll the two branches of a case of short-circuit operator can use && and | | combined into a more compact judgment logic. Similarly, the implementation of EXISTS is given:

def forAll0(p : A= >Boolean) : Boolean = this match
  case Cons(h,t) => p(h()) && t().forAll0(p)
  case_ = >true

def exists(p : A= >Boolean) : Boolean = this match
  case Cons(h,t) => p(h()) || t().exists(p)
  case_ = >false
Copy the code

The logic of takeWhile, forAll and EXISTS are highly overlaps and can be refined into a more pure foldRight method. It is worth noting that the f function is also non-strictly evaluated on the second argument. If f doesn’t evaluate it, then you’re essentially terminating the recursion.

// z is the initial value.
def foldRight[B](z : => B)(f : (A= >,B) = >B) : B = this match {
  case Cons(h,t) => f(h(),t().foldRight(z)(f))
  case _ => z
}
Copy the code

FoldRight is very basic abstract logic because it can further reproduce map, filter, append methods:

def exists0(p : A= >Boolean) : Boolean = foldRight(false)((a,b) => p(a) || b)
def takeWhile1(p :A= >Boolean) : _Stream[A] = foldRight(empty){
  (h,t)=> if(p(h)) cons(h,t) else t
}

def map[B]( f : A= >B) : _Stream[B] = foldRight(empty){
  (h,t)=> cons(f(h),t)
}

def filter(f : A= >Boolean) : _Stream[A] = foldRight(empty){
  (h,t) => if f(h) then cons(h,t) else t
}

def append[B> :A](other : _Stream[B] ) : _Stream[B] = foldRight(other){cons(_,_)}
Copy the code

FlatMap can be combined with foldRight and append methods:

def flatMap[B](f : A => _Stream[B]) : _Stream[B] = foldRight(empty){
  (h,t) => f(h).append(t)
}
Copy the code

_Stream[+A] that implements both flatMap and map can now apply For expressions. Such as:

val value: _Stream[Int] = for {subStream <- s ; i <- subStream} yield i
Copy the code

By now, we should understand the concept of “modular programming” in the FP paradigm. Scala, on the other hand, provides For expressions that allow the user to declare logic imperatively, but actually execute it recursively — that’s the point of focusing on separation.

All of the above method implementations are incremental — they don’t generate all the answers at once until the desired elements are “observed” externally. Because of this incremental nature, we can call functions one after another without instantiating intermediate results, which is the basis for co-recursion and building infinite flows.

Why Stream is lazy

Tracing how a Stream calculation executes a filter and a map alternately can explain why the various transformations of Stream use only the necessary space and do not create a large number of intermediate results.

Makes the computation equation of reasoning ability of meromorphic functions, so here can be safely replace with the return value of function calls, makes analytical steps look like to take off the algorithm (see: Scala functional data structure and the art of recursive reference – the nuggets (juejin. Cn) | transparent part). We didn’t use any explicit for, while loops at all, but we implemented recursive logic that was equivalent to iteration. For this reason, a Stream is also described as a first-class loop.

// return List(12,14)
println(cons(1, cons(2, cons(3, cons(4, empty)))).map(_ + 10).filter(_ % 2= =0).toList)

println(cons(11, cons(2, cons(3, cons(4, empty))).map(_ + 10)).filter(_ % 2= =0).toList)

println(cons(12, cons(3, cons(4, empty)).map(_ + 10)).filter(_ % 2= =0).toList)

println(12 :: cons(13, cons(4, empty).map(_ + 10)).filter(_ % 2= =0).toList)

// According to map's definition, empty is returned by default if a match is not Cons.
// empty. Map ((a: Int) => a + 10) === empty
println(12 :: cons(14, empty.map((a: Int) => a + 10)).filter(_ % 2= =0).toList)

println(12: :14 :: empty.toList)

println(12: :14: :Nil)
Copy the code

The incremental nature of lazy lists has important implications for memory usage. For example, in this example, data 11 and 13 are filtered out during the calculation. In the case of large elements or large objects, reclaiming their memory in a timely manner can reduce the resource footprint of the program.

The streaming Calculation for improving memory efficiency, especially the I/O calculation, will be mentioned in the following chapters.

Infinite flow and corecursion

An infinite stream can be created using the properties of increments. The concept of infinite flow was also mentioned in Java 8, as a continuous faucet that can be turned on to continuously produce data. Here is an example that can generate an infinite number of 1s:

// Declare it in object _Stream.
val ones : _Stream[Int] = cons(1,ones)
// According to the uniform access principle, it does not matter whether it is defined by val or def.
// def ones1 : _Stream[Int] = cons(1,ones1)
Copy the code

Be careful with infinite streams because they can throw stack overflow exceptions.

println(ones)     // ok
println(ones.take(5).toList)    // ok
println(ones.toList)  // err : StackOverflowError
Copy the code

A slight generalization of the ones function yields the infinite stream generator constant with the given value:

def constant[A](i : A) : _Stream[A] = cons(i,constant(i))
// println(constant("java").take(5).toList)
Copy the code

To further discover the pattern, let’s implement two more infinite streams: one that generates _Stream(n, n+1…). From method and fib method generating Fibonacci sequence infinite stream:

def constant[A](i : A) : _Stream[A] = cons(i,constant(i))

def from(i : Int) :  _Stream[Int] = cons(i,constant(i + 1))

def fib(using init :(Int.Int) = (0.1)) : _Stream[Int] = cons(init._1,fib(using (init._2,init._1 + init._2)))
Copy the code

Most of the logic of these three methods is also the same, which can be further extracted from the Unfold method.

def unfold[A.S](z : S)(f : S= >Option[(A.S)]) : _Stream[A] = f(z) match 
  case Some((h,s)) => cons(h,unfold(s)(f))
  case None => empty  
Copy the code

The type parameter S represents the state. Unfold recursion is accompanied by state transitions, which we continue to discuss in the next topic.

Unfold is a typical corecursive function (or guarded recursively) guarded cursive. In ordinary recursion, the range of parameters passed in is gradually reduced to achieve convergence. Corecursion does not set convergence conditions (or critical conditions), so it relies on lazy-loaded data structures and builds infinite streams.

def constant1[A](i: A): _Stream[A] = unfold(true) (Some(i, _))

def from1(i: Int): _Stream[Int] = unfold(true) (Some(i + 1, _))

def fib1: _Stream[Int] = unfold((0.1)) { case (f0, f1) => Some((f0, (f1, f0 + f1))) }
Copy the code

Unfold doesn’t just provide unlimited streaming. Given a convergent partial function, it can also implement the map, take, takeWhile methods. Such as:

// Defined in enum _Stream[+A]
def map1[B]( f : A= >B) : _Stream[B] = unfold(this) {case Cons(h,t) => Some((f(h()),t()))
  case_ = >None
}

def take1(n: Int): _Stream[A] = unfold(this,n){
  case (Cons(h,t), m) if m > 0= >Some((h(),(t(),m- 1)))
  case_ = >None
}

def takeWhile2(f : A= >Boolean) : _Stream[A] = unfold(this) {case Cons(h,t) if f(h()) => Some((h(),t()))
  case_ = >None
}
Copy the code