The Sequence Sequence

Sequence is a container type provided by the Kotlin library. It has the same ability to perform multiple steps on collections as Iterable, but does so in a completely different way:

val map = (0.. 3).filter { println("filter:$it") it % 2 == 0 }.map { println("map:$it") it + 1 } println(map)Copy the code

The above code is used to demonstrate continuous operations on Iterable. Its output is as follows:

filter:0
filter:1
filter:2
filter:3
map:0
map:2
[1, 3]
Copy the code

Chained collection functions such as Map and filter execute immediately and create intermediate temporary sets to hold data. This doesn’t matter when there’s not much raw data. However, when the raw data volume is very large. It becomes very inefficient. In this case, Sequence can be used to improve efficiency.

val sequence = (0.. Filter {println("filter:$it") it % 2 == 0}. Map {println("map:$it") it + 1} println(" ready to execute ") println(sequence.toList())Copy the code

The above code executes as follows:

Filter :0 map:0 filter:1 filter:2 map:2 filter:3 [1, 3]Copy the code

Compare Iterable and Sequence:

Iterable is immediate and Sequence is lazy: the former requires early results, so intermediates (new sets) are created at each step of the chain. The latter will delay the calculation of the result as much as possible, and the intermediate functions processed by Sequence do not perform any calculation. Instead, they return a new Sequence, decorating the previous one with new operations, and all of these calculations are performed only during terminal operations like toList.

The way to distinguish intermediate operators from terminal operators is also simple: if the operator returns data of type Sequence, it is the intermediate operator.

Iterable takes the first operation, applies it to the entire collection, and then moves to the next operation, officially called Eager or step-by-step. ** The Sequence performs all operations on each element one by one. Is an inert sequence — take the first element and apply all the operations, then take the next element, and so on. ** This is officially called Lazy or element-by-element (Lazy/element-by-element)

Sequence inertia has several advantages:

  • They operate in the natural order of the elements;
  • Do the bare minimum;
  • There can be infinitely many elements;
  • You don’t need to create collections at every step

Sequence improves the performance of the entire collection processing chain by avoiding generating the results of intermediate steps. However, the lazy nature also imposes some running overhead. Therefore, it is important to balance the lazy overhead with the intermediate overhead and choose between Sequence and Iterable.

Order of execution

SequenceOf (1,2,3). Filter {print("F$it, "); it % 2 == 1 } .map { print("M$it, "); // Prints: F1, M1, E2, F2, F3, M3, E6, listOf(1,2,3). Filter {print("F$it, "); it % 2 == 1 } .map { print("M$it, "); it * 2 } .forEach { print("E$it, ") } // Prints: F1, F2, F3, M1, M3, E2, E6,Copy the code

Sequence operations are performed on the elements in sequence. For an element, all operations are performed in sequence. A normal collection operation, on the other hand, takes place as an operation step, and moves to the next operation after all the elements have performed the current operation.

Do the bare minimum

Imagine that we have 10 million numbers and we need to perform several transformations to extract 20. Compare the performance of sequences and different set operations using the following code:

fun main(){ val fFlow = FFlow() fFlow.demoList() fFlow.demoSequence() } fun demoSequence() { val currentTimeMillis = System.currentTimeMillis() val list = (0.. 10000000).asSequence().map { it * 2 }.map { it - 1 }.take(20).toList() Println (" demoSequence: ${system.currenttimemillis () - currentTimeMillis} : $list") } fun demoList() { val currentTimeMillis = System.currentTimeMillis() val list = (0.. 10000000).map { it * 2 }.map { it - 1 }.take(20).toList() println("demoList:${System.currentTimeMillis() - CurrentTimeMillis} : $list ")}Copy the code

The output is as follows:

[-1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37] demoSequence:20ms: [-1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37] [-1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37]Copy the code

This is what minimal operations mean. The sequence is executed in order of the elements and stops immediately after 29 elements are reached. Different sets have no notion of intermediate operations. Each operation will run through all the elements in the entire array before it moves on to the next one — that is, the first two maps will each run 10 million times.

The sequence can be infinite

Look at the following code:

var list = emptyArray<Int>()
var  i = 0
while(true){
    list[i] = i++
}
list.take(10)
Copy the code

Obviously, this code will not work because there is an infinite loop. Nor can we create a collection of infinite length. But: because sequential sequences are handled step by step as required, we can create infinite sequences:

Val noEnd = sequence {var I = 1 while (true) {yield(I) I *= 2}} noend.take (4).tolist ()Copy the code

But it’s important to note that we can do this, but we don’t really want to keep the while loop forever. We can’t use toList directly. You must provide an operator that terminates the loop, that is, not fetching all elements (infinite) — either limit their number with operations like take, or use terminal operations that do not require all elements, such as first, find, any, all, indexOf, etc.

Sequences do not create collections at every step

A normal set would create a new set after each transformation to store all the transformed elements. Each time a collection is created and data is filled in, there is a significant performance overhead. This is especially true when dealing with large or large collections. Sequence operations by element do not have this problem. No new collections are generated unless the terminal operator is called manually.

Basic use of Sequence

Sequence sequences are used in much the same way as normal Iterable, and are actually implemented internally using Iterable. Before exploring its internal implementation, WE want to demonstrate the basic usage of Sequence from the creation and basic Sequence operations.

Sequence creation

The Sequence can be created in different ways. Created by elements, Iterable, functions, and blocks of code.

Created by elements: by calling the top-level function sequenceOf:

val ints = sequenceOf(1, 2, 3, 4, 5, 6, 7)
val strings = sequenceOf("a","b","c","d","e")
Copy the code

2. To convert by Iterable with the Iterable extension function asSequence:

val ints = listOf(1, 2, 3, 4, 5, 6, 7).asSequence()
val strings = listOf("a","b","c","d","e").asSequence()
Copy the code

By generateSequence: This method has three:

generateSequence(seedFunction: () -> T? , nextFunction: (T) -> T?) : Sequence<T> generateSequence(seed: T? , nextFunction: (T) -> T?) : Sequence<T> generateSequence(nextFunction: () -> T?) : Sequence<T>Copy the code

GeneratorSequence is the final implementation, not source code analysis. Only the usage will be discussed:

  • NextFunction can be understood as an element generator. All elements in the sequence are generated by calling this function. When it returns null, the sequence stops generating (so nextFunction must return NULL at some point, Otherwise trigger for sequence elements is an unlimited number of Java. Lang. OutOfMemoryError: Java heap space).

  • The other two seedFunction and seed parameters are used to determine the initial value of the data. The difference is that one is specified directly and one is obtained by calling a function.

Use these three functions to generate sequences from 0 to 100 as follows:

val generateSequenceOne = generateSequence { if (i < 100) { i++ } else { null } } val generateSequenceTwo = GenerateSequence (0) {if (it < 100) {it+1} else {null}} val generateSequence three = generateSequence({generateSequence(0) {if (it < 100) {it+1} 0}) {if (it < 100) {it+1} else {null}}Copy the code

Generated by code blocks: with the sequence(block: suspend SequenceScope.() -> Unit) function. The change function takes a suspended function that takes a SequenceScope instance that we don’t need to create (source code analysis will cover this later). The SequenceScope provides the yield and yieldAll methods to return sequence elements to the caller and pause the sequence until the consumer requests the next element.

Use this function to generate a sequence from 0 to 100 as follows:

val ints = sequence {
    repeat(100) {
        yield(it)
    }
}
Copy the code

Sequence operation

Operations on sequences can be divided into intermediate operations and terminal operations. They only need to be distinguished in one other way:

  • The intermediate operation returns a lazily generated new sequence, while the end sequence is some other common data type;
  • Intermediate operations do not execute the code immediately, but only when the end sequence of operations is executed.

Common intermediate operations include map, fliter, first, last, and take. They sequentially provide enhancements such as data change filtering that are essentially the same as the collection operators provided by Kotlin.

Common terminal operations include toList, toMutableList, sum, and count. They not only provide sequence manipulation functions, but also trigger sequence execution.

Sequence source code analysis

I gave you a brief introduction to sequences. Let’s dive into the source code to see how Sequence is implemented.

What is Sequence?

Kotlin’s definition of the Sequence is simple:

public interface Sequence <out T> {
    public operator fun iterator(): Iterator<T>
}
Copy the code

An interface that defines a method that returns an Iterator. The interface itself defines only the ability for a Sequence to return an iterator. The specific function implementation depends on its implementation class to complete.

To summarize, a sequence is a class that provides iterators.

Analysis of how sequences are created

With the four ways of creating a sequence mentioned above, let’s take a look at its creation process.

We will use the asSequence method to generate a sequence using listOf(“a”,”b”,”c”,”d”,”e”).assequence (). The call chain is as follows:

public fun <T> Iterable<T>.asSequence(): Sequence<T> {
    return Sequence { this.iterator() }
}

public inline fun <T> Sequence(crossinline iterator: () -> Iterator<T>): Sequence<T> = object : Sequence<T> {
    override fun iterator(): Iterator<T> = iterator()
}
Copy the code

The process is simple, an extension function plus an inline function. Finally, a Sequence is created and returned using an anonymous inner class. The code is easy to understand, and in fact its implementation logic is equivalent to the following code:

val sequence = MySequence(listOf("a","b","c","d","e").iterator())

class MySequence<T>(private val iterator:Iterator<T>):Sequence<T>{
    override fun iterator(): Iterator<T> {
        return iterator
    }
}
Copy the code

SequenceOf (“a”,”b”,”c”,”d”,”e”);

public fun <T> sequenceOf(vararg elements: T): Sequence<T> = if (elements.isEmpty()) emptySequence() else elements.asSequence()
Copy the code

As you can see, this is still implemented with the help of asSequence.

Let’s take a look at the implementation of block code and generateSequence. These two methods are a little more complicated. After all, the first two methods are converted by a List, and the List itself provides an Iterator. The last two clearly need to provide new iterators. Let’s take a look at the code to see how it works:

val ints = sequence {
        repeat(100) {
            yield(it)
        }
    }
Copy the code

The sequence invocation logic is as follows:

public fun <T> sequence(@BuilderInference block: suspend SequenceScope<T>.() -> Unit): Sequence<T> = Sequence { iterator(block) } public fun <T> iterator(@BuilderInference block: suspend SequenceScope<T>.() -> Unit): Iterator<T> {val Iterator = SequenceBuilderIterator<T>() iterator.nextstep = block.createCoroutineUnintercepted(receiver = iterator, completion = iterator) return iterator } public inline fun <T> Sequence(crossinline iterator: () -> Iterator<T>): Sequence<T> = object : Sequence<T> { override fun iterator(): Iterator<T> = iterator() }Copy the code

As you can see, this method, like asSequence, ends up creating a Sequence through an anonymous inner class. The difference, however, is that this method requires the creation of a new iterator, the SequenceBuilderIterator. Using MySequence as an example, its creation process is equivalent to the following code:

fun mian(){
    create<Int> { myblock() }
}

suspend fun  SequenceScope<Int>.myblock(){
    repeat(100) {
        yield(it)
    }
}

fun <Int> create(block: suspend SequenceScope<Int>.() -> Unit):Sequence<Int>{
    val iterator = SequenceBuilderIterator<Int>()
    iterator.nextStep = block.createCoroutineUnintercepted(receiver = iterator, completion = iterator)
    return MySequence(iterator)
}
Copy the code

Of course, this is impossible because the SequenceBuilderIterator is private and we can’t access it directly. I’m going to force it down here to show you how it works.

GenerateSequence (); generateSequence ();

public fun <T : Any> generateSequence(seedFunction: () -> T? , nextFunction: (T) -> T?) : Sequence<T> = GeneratorSequence(seedFunction, nextFunction) public fun <T : Any> generateSequence(seed: T? , nextFunction: (T) -> T?) : Sequence<T> = if (seed == null) EmptySequence else GeneratorSequence({ seed }, nextFunction) public fun <T : Any> generateSequence(nextFunction: () -> T?) : Sequence<T> { return GeneratorSequence(nextFunction, { nextFunction() }).constrainOnce() }Copy the code

GeneratorSequence implements the Sequence interface and overwrites the iterator() method:

private class GeneratorSequence<T : Any>(private val getInitialValue: () -> T? , private val getNextValue: (T) -> T?) : Sequence<T> { override fun iterator(): Iterator<T> = object : Iterator<T> { var nextItem: T? = null var nextState: Int = -2 // -2 for initial unknown, -1 for next unknown, 0 for done, 1 for continue private fun calcNext() { nextItem = if (nextState == -2) getInitialValue() else getNextValue(nextItem!!) nextState = if (nextItem == null) 0 else 1 } override fun next(): T { if (nextState < 0) calcNext() if (nextState == 0) throw NoSuchElementException() val result = nextItem as T // Do not clean nextItem (to avoid keeping reference on yielded instance) -- need to keep state for getNextValue nextState = -1 return result } override fun hasNext(): Boolean { if (nextState < 0) calcNext() return nextState == 1 } } }Copy the code

To sum up, Sequence creation can be divided into three categories:

  • Create Sequence instances anonymously using the iterator provided with List.sequenceOf("a","b","c","d","e")andlistOf("a","b","c","d","e").asSequence()That’s the way it is;
  • Create a newSequenceBuilderIteratorIterator and anonymously create Sequence instances. For example, use code block creation.
  • createGeneratorSequenceIterator is created anonymously by overriding the iterator() method. The GeneratorSequence method takes this approach.

After looking at the creation method, there is nothing fancy, just a common class that provides iterators. Again, you can’t see how lazy the operation is. Let’s look at how lazy operations work.

The principle of inertia for sequences

Take the most commonly used map operator for example:

public inline fun <T, R> Iterable<T>.map(transform: (T) -> R): List<R> {// create a new ArrayList, Return mapTo(ArrayList<R>(collectionSizeOrDefault(10)), transform)} public inline fun <T, R, C: MutableCollection<in R>> Iterable<T>.mapTo(destination: C, transform: (T) -> R): C {// Iterate over the original set, perform the transformation operation, For (item in this) destination.add(transform(item)) return destination}Copy the code

As you can see, when list.map is called, a new collection is created immediately, and then the old data is iterated over and transformed. Finally, a new data is returned. This supports the theory mentioned above that the operations of ordinary sets are performed step by step and immediately.

Let’s look at the sequence map method, which has the following source code:

public fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R> { return TransformingSequence(this, transform) } internal class TransformingSequence<T, R> constructor(private val sequence: Sequence<T>, private val transformer: (T) -> R) : Sequence<R> { override fun iterator(): Iterator<R> = object : Iterator<R> {// 查 看 原 文 : Iterator<R> {// 查 看 原 文 : Iterator<R> {val Iterator = sequence. Iterator () override fun next(): R {// Note 2: At the beginning of an iteration, the iterator of the previous sequence is called up. return transformer(iterator.next()) } override fun hasNext(): Boolean { return iterator.hasNext() } } internal fun <E> flatten(iterator: (R) -> Iterator<E>): Sequence<E> { return FlatteningSequence<T, R, E>(sequence, transformer, iterator) } }Copy the code

The code is not complicated; it takes the user-supplied transformation function and sequence, creates a TransformingSequence and returns it. The TransformingSequence itself is the same as the sequence described above, except for its iterators: When fetching data via next, it does not return elements directly. Instead, the function provided by the caller is first called to perform the transformation. Return the transformed data – this is nothing new, as the normal collection map operator and RxJava map work the same way.

But here it’s a little different. There is no code in the operator that starts iteration, it just does the nesting of sequences and iterations, and does not start iteration. If the user does not call the iterator’s next function manually (the latter indirectly), the sequence will not be executed — this is how lazy execution works.

Also, since the operator returns a value of type Sequence, when you call map repeatedly, as in the following code:

(0.. 10) asSequence (). The map {add (it)}. The map {add (it)}. The map {add (it)}. ToList () / / equivalent to val sequence1 = (0.. 10).asSequence() val sequence2 = sequence1.map { it+1 } val sequence3 = sequence2.map { it+1 } sequence3.toList()Copy the code

Finally, the sequence sequence3 has the following structure: Sequence3 -> Sequence2 -> Sequence1. And they all have their own iterators. Each iterator overwrites its own transformation logic:

Override fun next(): R {return transformer(iterator.next())} override fun next(): R {return transformer(iterator.next())} override fun next(): R { return iterator.next()+1 }Copy the code

When we execute the code through Sequence3. toList, it flows like this:

public fun <T> Sequence<T>.toList(): List<T> { return this.toMutableList().optimizeReadOnlyList() } public fun <T> Sequence<T>.toMutableList(): Return toCollection(ArrayList<T>())} public fun <T, C: MutableCollection<in T>> Sequence<T>.toCollection(destination: C): C {// performs the iterator next operation // When the call (end operator) goes here, For (item in this) {destination.add(item)} return destination}Copy the code

After a few extended function calls, we finally start iterating (typical of Iterators) in toCollection, taking the Iterator instance of Sequence3 and pulling data from next. As you can see in the TransformingSequence source code above, the TransformingSequence holds an instance of the previous iterator (note 1).

And once the iteration starts, the next method of the previous iterator is called to iterate (code comment 2) before transformer operations (i.e., +1 operations). This iteration will eventually, eventually, call sequence1’s next method. Combined with the analysis of sequence creation above — the iterators held in Sequence1 are the iterators in the original data.

Then when the toList method is finally executed, it loops through the sequence3.iterator method. Within each loop, sequence2. Iterator’s next method, held by Sequence3, is first executed. Sequence2 follows sequence1’s sequence1.iterator ‘method and finally executes to the iterator next method of our original array:

The whole process is as follows:

The principle is as simple as that: intermediate operators nest iterators by nesting sequences. In this case, each iterator will be called in turn until the iterator in the original collection data is called and the element is returned. When the element returns, each iterator in turn holds the transformation operation to transform the data.

Other operators follow the same basic rules. Just add some other operations.

conclusion

  • Sequence iterators are nested and overwritten by intermediate operators to implement transformations based on element operations.
  • The intermediate operator is only responsible for creating and nesting iterators as required, not for starting iterators. This implements lazy operations without generating temporary collections;
  • The end operator is responsible for starting the iteration and performing the iteration in a nested order. Retrieve the data after the operation in turn, and create a new collection to store the final data;
  • Sequences are not a panacea because new objects are introduced. While bringing the benefits of inertia and sequential execution, these objects inevitably incur performance overhead. So, depending on your requirements, choose between collections and sequences and iterate in the appropriate way.