Source code Parsing for Kotlin Sequences

Brief description: It’s been a long time since I published an original article, and as always, today we begin the tenth lecture of the Kotlin Shashlets series, exploring the sequences in Kotlin. Sequences are essentially duplicates of streams in Java8. As you can see from the previous blog, Kotlin defines many apis for manipulating collections. Yes, these functions also work for Sequences, and sequence operations are better than collection operations in terms of performance. And as you can see from the previous source code for functional apis, they create many intermediate sets, and each operator opens up memory to store intermediate data sets, which are not needed in sequences. Let’s take a look at this blog post:

1. Why Sequences are needed?
2. What are Sequences?
3. How to create Sequences?
4. Performance comparison of sequence operations and set operations
5. Principle of performance optimization for Sequences
6, sequence (Sequences) principle source code fully analyzed

1. Why Sequences are needed?

In Kotlin we tend to deal with data sets as sets, and we use some of the functional operator apis in sets, and we rarely convert data sets to sequences to do set operations. This is because the amount of data we typically work with is small, so there’s no difference between using sets and sequences. Let’s take a look at an example and you’ll see what sequences mean.

// Instead of using Sequences, use normal collection operations
fun computeRunTime(action: (() -> Unit)? {valstartTime = System.currentTimeMillis() action? .invoke() println("the code run time is ${System.currentTimeMillis() - startTime}")}fun main(args: Array<String>) = computeRunTime {
  (0.10000000.)
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 10 }
        .run {
            println("by using list way, result is : $this")}}Copy the code

Running results:

// Convert to Sequences, using sequence operations
fun computeRunTime(action: (() -> Unit)? {valstartTime = System.currentTimeMillis() action? .invoke() println("the code run time is ${System.currentTimeMillis() - startTime}")}fun main(args: Array<String>) = computeRunTime {
    (0.10000000.)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 10 }
        .run {
            println("by using sequences way, result is : $this")}}Copy the code

Running results:

Through the implementation of the same function above, there is not only a little difference in the running time between using ordinary set operation and converting the operation into sequence, which corresponds to the performance difference of the two implementation methods in the case of large data set magnitude. This should explain why we need to use Sequences.

2. What are Sequences?

Sequence operations are also known as lazy set operations, and the power of the Sequences interface lies in the way the operations are implemented. The elements in a sequence are evaluated lazily, so sequences can be used more efficiently to chain operations (mapping, filtering, transformation, etc.) on the elements in a data set, rather than having to open up new memory to store intermediate results for each data operation, as in a normal set. In fact, the vast majority of data set operations focus on the end result rather than the intermediate process,

Sequences are another option for manipulating data sets in Kotlin, much like the new Stream in Java8, where you can convert a data set to a Stream and then perform data manipulation on the Stream (mapping, filtering, transforming, etc.). Sequences can be used to optimize collections for specific scenarios. But it’s not meant to replace sets, it’s meant to complement them.

Sequence operations fall into two broad categories:

1. Intermediate operation

An intermediate operation of a sequence is always lazy. Each intermediate operation returns a sequence that internally knows how to transform elements of the original sequence. How do we say that intermediate operations in a sequence are lazy? Here’s an example:

fun main(args: Array<String>){(0.6.)
        .asSequence()
        .map {// Map returns Sequence
      
       , so it is an intermediate operation
      
            println("map: $it")
            return@map it + 1
        }
        .filter {// Filter returns Sequence
      
       , so it is an intermediate operation
      
            println("filter: $it")
            return@filter it % 2= =0}}Copy the code

Running results:

In the above example, only the intermediate operation does not have the terminal operation. The result shows that map and filter do not output any prompt, which means that the operation of map and filter is delayed, and they will output the prompt only when the result is obtained (i.e. when the terminal operation is called).

2. Terminal operation

The end operation of the sequence performs all the delayed calculations of the original intermediate operation, and a single end operation returns a result, which can be a set, a number, or any object transformed from another set of objects. The above example plus the terminal operation:

fun main(args: Array<String>){(0.6.)
        .asSequence()
        .map {// Map returns Sequence
      
       , so it is an intermediate operation
      
            println("map: $it")
            return@map it + 1
        }
        .filter {// Filter returns Sequence
      
       , so it is an intermediate operation
      
            println("filter: $it")
            return@filter it % 2= =0
        }
        .count {//count returns an Int, so it is a terminal operation
            it < 6
        }
        .run {
            println("result is $this"); }}Copy the code

The results

Note: It is easy to determine whether the operation is intermediate or terminal by looking at the type of the value returned by the API function. If it returns a Sequence<T>, it is an intermediate operation. If it returns a specific result type, Like Int,Boolean, or any other object, then it’s an end operation

3. How to create Sequences?

The main methods for creating Sequences are:

1. Use the Iterable extension function asSequence.

// Define a declaration
public fun <T>可迭代<T>.asSequence(a): Sequence<T> {
    return Sequence { this.iterator() }
}
// Call the implementation
list.asSequence()
Copy the code

2. Use generateSequence to generate a sequence.

// Define a declaration
@kotlin.internal.LowPriorityInOverloadResolution
public fun <T : Any> generateSequence(seed: T? , nextFunction: (T)-> T?) : Sequence<T> =if (seed == null)
        EmptySequence
    else
        GeneratorSequence({ seed }, nextFunction)

// Call implementation, seed is the starting value of the sequence, nextFunction iterates over the function operation
val naturalNumbers = generateSequence(0) { it + 1 } // Use iterators to generate a sequence of natural numbers
Copy the code

ConstrainOnce, an extension of Sequence

, constrainOnce, is used to generate a Sequence that is used once.

// Define a declaration
public fun <T> Sequence<T>.constrainOnce(a): Sequence<T> {
    // as? does not work in js
    //return this as? ConstrainedOnceSequence
      
        ? : ConstrainedOnceSequence(this)
      
    return if (this is ConstrainedOnceSequence<T>) this else ConstrainedOnceSequence(this)}// Call the implementation
val naturalNumbers = generateSequence(0) { it + 1 }
val naturalNumbersOnce = naturalNumbers.constrainOnce()
Copy the code

Note: You can only iterate once, but if you fail to iterate once, IllegalStateException(“This sequence can be Consumed only once.”) is thrown.

4. Performance comparison of sequence operations and set operations

In terms of sequential performance comparisons, the following scenarios are used to give you an idea of when to use normal collection operations or sequential operations.

1. The same data operation is performed when the data magnitude is relatively large.

Using Sequences

fun computeRunTime(action: (() -> Unit)? {valstartTime = System.currentTimeMillis() action? .invoke() println("the code run time is ${System.currentTimeMillis() - startTime} ms")}fun main(args: Array<String>) = computeRunTime {
    (0.10000000.)//10000000 data level
        .asSequence()
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 100 }
        .run {
            println("by using sequence result is $this")}}Copy the code

Running results:

No Sequences are used

fun computeRunTime(action: (() -> Unit)? {valstartTime = System.currentTimeMillis() action? .invoke() println("the code run time is ${System.currentTimeMillis() - startTime} ms")}fun main(args: Array<String>) = computeRunTime {
    (0.10000000.)//10000000 data level
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 100 }
        .run {
            println("by using sequence result is $this")}}Copy the code

Running results:

2. The same data operation in the case of relatively small data magnitude.

Using Sequences

fun computeRunTime(action: (() -> Unit)? {valstartTime = System.currentTimeMillis() action? .invoke() println("the code run time is ${System.currentTimeMillis() - startTime} ms")}fun main(args: Array<String>) = computeRunTime {
    (0.1000.)//1000 data level
        .asSequence()
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 100 }
        .run {
            println("by using sequence result is $this")}}Copy the code

Running results:

No Sequences are used

fun computeRunTime(action: (() -> Unit)? {valstartTime = System.currentTimeMillis() action? .invoke() println("the code run time is ${System.currentTimeMillis() - startTime} ms")}fun main(args: Array<String>) = computeRunTime {
    (0.1000.)//1000 data level
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 100 }
        .run {
            println("by using list result is $this")}}Copy the code

Running results:

Through the above performance comparison, it is found that the performance of Sequences is better than that of ordinary data sets in the case of large data magnitude. However, using Sequences with relatively small data levels may actually perform worse than normal data sets. About the choice of sequence or set, remember the previous translation of a foreign blog, there is a detailed elaboration. Blog address

5. Principle of performance optimization for Sequences

Seeing the performance comparisons above, you can’t wait to know the principles of internal performance optimization for Sequences at this moment. Let’s take a look at the principles of internal performance optimization for Sequences. For example

fun main(args: Array<String>){(0.10.)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2= =0 }
        .count { it < 6 }
        .run {
            println("by using sequence result is $this")}}Copy the code

1. Basic principle description

Sequential operations: The basic principle is lazy evaluation, which means that no intermediate data results are produced during an intermediate operation, but only when the end operation is performed. That is, for each data element from 0 to 10 in the above example, the map operation is followed immediately by the filter operation. The next element also performs a map operation, followed immediately by a filter operation. However, the normal collection is the principle of storing the data after all the elements of the map are executed, and then performing the filter operation from the stored data set for all the elements.

Set common operations: For each operation will produce new intermediate results, also is the example of the map after the operation will be the original data set to iterate over JiCun once get the latest data on the new collection, then the filter operation, through the last time the new set of data elements, and finally get the newest data sets and there is a new collection.

2. Schematic diagram

// Use sequence
fun main(args: Array<String>){(0.100.)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2= =0 }
        .find { it > 3}}// Use normal collections
fun main(args: Array<String>){(0.100.)
        .map { it + 1 }
        .filter { it % 2= =0 }
        .find { it > 3}}Copy the code

Through the schematic diagram above, it can be found that the sequence will operate element by element, and some unnecessary operations can be removed in advance before the end operation find results, and after find finds a qualified element, many subsequent element operations can be eliminated, so as to achieve the purpose of optimization. Set normal operations, no matter which element, must go through all operations by default. In fact, there is no need to perform some operations before obtaining the results, and before obtaining the results, we can perceive whether the operation meets the conditions. If it does not meet the conditions, we can discard it in advance to avoid the loss of performance caused by unnecessary operations.

6, sequence (Sequences) principle source code fully analyzed

// Use sequence
fun main(args: Array<String>){(0.100.)
        .asSequence()
        .map { it + 1 }
        .filter { it % 2= =0 }
        .find { it > 3}}// Use normal collections
fun main(args: Array<String>){(0.100.)
        .map { it + 1 }
        .filter { it % 2= =0 }
        .find { it > 3}}Copy the code

Decompile’s source code for the example above shows that normal collection operations generate a while loop for each operation and create a new collection each time to hold the intermediate results. Sequences don’t. They share data in the same iterator no matter how many intermediate operations they do internally. Want to know how you share data in the same iterator? Move on to the internal source implementation.

Decompile the source code using collection common operations

 public static final void main(@NotNull String[] args) {
      Intrinsics.checkParameterIsNotNull(args, "args");
      byte var1 = 0;
      Iterable $receiver$iv = (Iterable)(new IntRange(var1, 100));
      // Create a new collection to store the intermediate results of the map
      Collection destination$iv$iv = (Collection)(new ArrayList(CollectionsKt.collectionSizeOrDefault($receiver$iv, 10)));
      Iterator var4 = $receiver$iv.iterator();

      int it;
      // Generates a while loop for the map operator
      while(var4.hasNext()) {
         it = ((IntIterator)var4).nextInt();
         Integer var11 = it + 1;
         // Add the map elements to the new collection
         destination$iv$iv.add(var11);
      }

      $receiver$iv = (Iterable)((List)destination$iv$iv);
      // Create a new collection to store the intermediate results of filter
      destination$iv$iv = (Collection)(new ArrayList());
      var4 = $receiver$iv.iterator();// The iterator in the new collection after getting the map
      // Generates a while loop for the filter operator
      while(var4.hasNext()) {
         Object element$iv$iv = var4.next();
         int it = ((Number)element$iv$iv).intValue();
         if (it % 2= =0) {
          // Add the filter element to the new collection
            destination$iv$iv.add(element$iv$iv);
         }
      }

      $receiver$iv = (Iterable)((List)destination$iv$iv);
      Iterator var13 = $receiver$iv.iterator();// Get the iterator in the new collection
      
      // The corresponding find operator generates a while loop, and the last operator simply iterates through the iterators in the new collection behind the filter to fetch the data that matches the criteria.
      while(var13.hasNext()) {
         Object var14 = var13.next();
         it = ((Number)var14).intValue();
         if (it > 3) {
            break; }}}Copy the code

Decompile the source code using Sequences lazy operations

1, the entire sequence operation source code

 public static final void main(@NotNull String[] args) {
      Intrinsics.checkParameterIsNotNull(args, "args");
      byte var1 = 0;
      // Use the Sequence extension function to implement fitler and map operations, and finally return a Sequence object.
      Sequence var7 = SequencesKt.filter(SequencesKt.map(CollectionsKt.asSequence((Iterable)(new IntRange(var1, 100))), (Function1)null.INSTANCE), (Function1)null.INSTANCE);
      / / take out after the middle in the sequence of operation iterator that can be found in the middle of the map, the filter operation to share the same data in an iterator, each operation will produce new iterator object, but the data was introduced into data sharing in the iterator, and finally to operation at the end of the time only need to traverse the iterator eligible elements.
      Iterator var3 = var7.iterator();
      // The corresponding find operator generates a while loop, and the last operator simply iterates through the iterators in the new collection behind the filter to fetch the data that matches the criteria.
      while(var3.hasNext()) {
         Object var4 = var3.next();
         int it = ((Number)var4).intValue();
         if (it > 3) {
            break; }}}Copy the code

2. Extract this key code and go further:

SequencesKt.filter(SequencesKt.map(CollectionsKt.asSequence((Iterable)(new IntRange(var1, 100))), (Function1)null.INSTANCE), (Function1)null.INSTANCE);
Copy the code

3. Break this code down into three parts:

// The first part
val collectionSequence = CollectionsKt.asSequence((Iterable)(new IntRange(var1, 100)))
// Part 2
val mapSequence = SequencesKt.map(collectionSequence, (Function1)null.INSTANCE)
// Part 3
val filterSequence = SequencesKt.filter(mapSequence, (Function1)null.INSTANCE)
Copy the code

4. Explain the first part of the code

The first part of the decomcompiled source code is simple, mainly by calling the extension function in Iterable<T> to convert the original data set into Sequence<T> objects.

public fun <T>可迭代<T>.asSequence(a): Sequence<T> {
    return Sequence { this.iterator() }// Pass an iterator object in the outer Iterable
      
}
Copy the code

Take it a step further:

@kotlin.internal.InlineOnly
public inline fun <T> Sequence(crossinline iterator: () -> Iterator<T>): Sequence<T> = object : Sequence<T> {
    override fun iterator(): Iterator<T> = iterator()
}
Copy the code

Sequence<T> is an interface that contains an iterator() function that returns an iterator object. The Sequence<T> is an interface that contains an iterator() function that returns an iterator object. The core iterator is still the iterator passed in from the outside, which is a bit of a trick.

5. Explain the code for Part 2:

With the first part, we successfully converted the ordinary collection to the Sequence Sequence, and now we do the map operation, which is actually done by calling the Sequence<T> extension function map

val mapSequence = SequencesKt.map(collectionSequence, (Function1)null.INSTANCE)
Copy the code

Enter the map extension function:

public fun <T, R> Sequence<T>.map(transform: (T) -> R): Sequence<R> {
    return TransformingSequence(this, transform)
}
Copy the code

You will find that internally a TransformingSequence object is returned. The object constructor receives an object of type Sequence<T>, a lambda expression of a transform, and finally a Sequence<R> object. We’ll leave it there for now, and we’ll talk more about it later.

6. Explain the code for Part 3:

In the second part, after the map operation, the Sequence object is returned. Finally, the filter operation is performed on the Sequence object, which is also an extension function of the Sequence. Finally, the Sequence object is returned.

val filterSequence = SequencesKt.filter(mapSequence, (Function1)null.INSTANCE)
Copy the code

Enter the filter extension function:

public fun <T> Sequence<T>.filter(predicate: (T) -> Boolean): Sequence<T> {
    return FilteringSequence(this.true, predicate)
}
Copy the code

You’ll see that a FilteringSequence object is returned internally. The constructor takes a Sequence<T> object, a lambda expression for the predicate, and finally returns a Sequence<T> object. We’ll leave it there for now, and we’ll talk more about it later.

7, the whole structure of Sequences source code introduction

Code structure diagram: All the corresponding operator classes are marked in the figure, which implement the Sequence

interface

First, Sequence<T> is an interface that contains only one abstract function, a function that returns an iterator object, and can be thought of as an iterator shell.

public interface Sequence<out T> {
    /** * Returns an [Iterator] that returns the values from the sequence. * * Throws an exception if the sequence is constrained to be iterated once and `iterator` is invoked the second time. */
    public operator fun iterator(a): Iterator<T>
}
Copy the code

Sequence Core class UML class diagram

Only a class diagram of a few common operators is drawn here

Note: As can be seen from the UML class diagram above, the principle of sharing data in the same iterator is actually implemented by using the state pattern of Java design pattern (object-oriented polymorphism principle). First, the iterator object returned by iterator() of Iterable<T> is used to instantiate the Sequence. Then called outside different operator, the operator corresponding to the corresponding extension functions, internal extension function for each different operation returns to realize Sequence interfaces subclass object, and the subclass is according to the implementation of different operation, change the interface of the iterator () function abstract iterator implementation, return a new iterator object, But the iterated data comes from the original iterator.

The TransformingSequence and FilteringSequence are described above.

Based on the above in-depth analysis of the entire Sequences structure, it is very simple to continue to analyze the TransformingSequence and FilteringSequence. Take the TransformingSequence as an example:

// Implement the Sequence
      
        interface, rewrite the iterator() method, rewrite the implementation of iterators
      
internal class TransformingSequence<T, R>
constructor(private val sequence: Sequence<T>, private val transformer: (T) -> R) : Sequence<R> {
    override fun iterator(a): Iterator<R> = object : Iterator<R> {// Construct a new iterator based on the data in the passed iterator.
        val iterator = sequence.iterator()// Get the iterator object in the incoming Sequence
        override fun next(a): R {
            return transformer(iterator.next())// Share the data in the same iterator with transformer.
        }

        override fun hasNext(a): Boolean {
            return iterator.hasNext()
        }
    }

    internal fun <E> flatten(iterator: (R) -> Iterator<E>): Sequence<E> {
        return FlatteningSequence<T, R, E>(sequence, transformer, iterator)
    }
}
Copy the code

9, source analysis summary

Each subclass object overrides the abstract method iterator() in the Sequence interface. The internal implementation is based on the data elements in the passed iterator object. Transforms, filters, merges, etc., and returns a new iterator object. This explains why sequences work by performing different operations on each element, rather than A normal set where all elements perform operation A and then all elements perform operation B. This is because the sequence always maintains an internal iterators, when an element is iterative, it need to be executed in sequence. A, B, C after each operation, if there is no end at this point, then the value will be stored in the C iterator, executed in sequence, waiting for the original collection of Shared data by iteration is completed, or does not meet certain conditions to terminate the iteration, Finally, fetch the data in the C iterator.

Welcome to the Kotlin Developer Association, where the latest Kotlin technical articles are published, and a weekly Kotlin foreign technical article is translated from time to time. If you like Kotlin, welcome to join us ~~~

Source code Parsing for Kotlin Sequences

1. Why Sequences are needed?

2. What are Sequences?

3. How to create Sequences?

4. Performance comparison of sequence operations and set operations

5. Principle of performance optimization for Sequences

6, sequence (Sequences) principle source code fully analyzed

Decompile the source code using collection common operations

Decompile the source code using Sequences lazy operations

Related Posts

Interpolator for Android animation

The friendliest article about RxJava

How does Retrofit work?