Deferred execution and immutability: a systematic guide to data processing with Java Stream

Recently, while writing some business logic at work, I suddenly couldn’t remember how to compute a sum with Stream.

I had to fall back on Google-driven programming. Three precious minutes later I had my answer, and it turned out to be very simple.

Since I started using JDK8, Stream has been the feature I use most, for all kinds of stream operations. After this incident, though, I suddenly felt that Stream was really unfamiliar to me.

It’s probably the same for everyone: the things we use most are the easiest to overlook, and you probably wouldn’t think to review something like Stream even when preparing for an interview.

But now that I have noticed the gap, I need to go through it again properly and fill the hole in my overall knowledge.

I spent a lot of time writing this article on Stream. I hope that you and I can relearn Stream together, both its API and its internal characteristics. The truth may be endless, but every step forward brings its own joy.

In this article, I’ve divided the content of the Stream into the following sections:

If you look at this diagram, you might be puzzled by the terms “conversion operations” and “terminal operations”. They come from how I group all Stream APIs into two classes, each with its own name (see the Java8 books listed at the bottom of this article):

Conversion operations: methods such as filter and map, which turn one Stream into another and return a Stream.
Terminal operations: methods such as count and collect, which aggregate a Stream into the result we want and return something that is not a Stream.

I further divide the conversion operations into two types, which are explained in detail later in the article. Here are the definitions first, to give you a general impression:

Stateless: the method handles each element on its own; processing one element does not depend on the rest of the result set produced by the previous method.
Stateful: the method needs to know about other elements, so its execution depends on the result set produced by the previous method.

Because there is so much to cover, I split the Stream material into two articles. This is the first part; it is dense, but the examples are simple and plentiful.

The second article covers only one topic, terminal operations, but that API is complex, so it is just as dense, again with simple and plentiful examples. The two parts are about the same length, so stay tuned.

Note: My local machine runs JDK11 and I forgot to switch to JDK8, so many of the examples use List.of(), which is not available in JDK8. For the purposes of these examples it can be replaced with JDK8’s Arrays.asList().

1. Why Stream?

It all started with the release of JDK8. At a time when functional programming languages were on the rise, Java was criticized for its verbosity (its strongly object-oriented style), and the community was calling for functional language features to improve the situation. JDK8 finally arrived in 2014.

In JDK8, I think the biggest new features are functional interfaces and lambda expressions, both borrowed from functional programming.

These two features make Java simpler and more elegant. By answering functional languages with functional features of its own, Java consolidated its leading position; it was a classic case of learning from your rivals in order to compete with them.

Stream is a JDK8 library for collections built on these two features. It lets us process data in a more concise, pipelined way using lambda expressions, and makes operations such as filtering, grouping, collecting, and reducing easy. I like to think of Stream as a best-practice showcase for functional interfaces.

1.1 Clearer code structure

Stream code has a clearer structure. To illustrate, suppose we have a very simple requirement: find all the elements greater than 2 in a collection.

Let’s first look at the code without Stream:

        List<Integer> list = List.of(1, 2, 3);
        
        List<Integer> filterList = new ArrayList<>();
        
        for (Integer i : list) {
            if (i > 2) {
                filterList.add(i);
            }
        }
        
        System.out.println(filterList);

The code above is easy to follow, so I won’t explain it. It is actually fine as it stands, because our requirement is simple. But what if there are more requirements?

Each additional requirement adds another condition to the if. In real development our objects often have many fields, so there may be four or five conditions, and the code might end up like this:

        List<Integer> list = List.of(1, 2, 3);

        List<Integer> filterList = new ArrayList<>();

        for (Integer i : list) {
            if (i > 2 && i < 10 && (i % 2 == 0)) {
                filterList.add(i);
            }
        }

        System.out.println(filterList);

With that many conditions the code starts to look messy. Even that is tolerable; the real problem is that projects often have many similar requirements that differ in only one condition, so you copy a big chunk of code, tweak it, and ship it, and the codebase ends up full of duplicated code.

With Stream, everything becomes clear and easy to understand:

        List<Integer> list = List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

In this code you only need to focus on what matters most: the filter conditions. The method name filter tells you plainly that it applies a filter condition, and the name collect shows that it is a collector that gathers the final results into a List.

You may also have noticed that there is no loop in the code above. Why not?

Because Stream loops for us implicitly. This is called internal iteration, as opposed to external iteration.

So even though you don’t write a loop, one still runs.
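The duplicated-loop problem described earlier largely goes away once the condition itself becomes a parameter. The following is only a minimal sketch: the class name FilterDemo and the helper filterBy are hypothetical names made up for illustration, not part of the JDK.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class FilterDemo {

    // Hypothetical helper: the filter condition is passed in as a Predicate,
    // so similar requirements no longer need a copied loop.
    static List<Integer> filterBy(List<Integer> source, Predicate<Integer> condition) {
        return source.stream()
                .filter(condition)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 3, 4, 12);

        // Two requirements that differ only in their condition.
        System.out.println(filterBy(list, i -> i > 2));               // [3, 4, 12]
        System.out.println(filterBy(list, i -> i > 2 && i % 2 == 0)); // [4, 12]
    }
}
```

Each new requirement is then a one-line lambda rather than another copy of the loop.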

1.2 No need to worry about variable state

Stream is designed to be immutable, and this immutability has two aspects:

Each conversion operation returns a new Stream rather than changing the one it was called on, so streams behave as immutable values, much like strings.
A Stream stores only a reference to its source collection. Operations such as map produce new values computed from the original elements, so they do not add, remove, or replace anything in the source collection.

The first aspect is what makes chained calls possible, which is how we normally use streams. The second is a hallmark of functional programming: no state changes.

Whatever operations you chain on a Stream, the original collection is left as it was, and the result is computed from the original collection’s elements. (This assumes your lambdas don’t mutate the element objects themselves, for example by calling a setter inside peek or forEach.)

So with Stream we don’t have to worry about side effects on the original collection.
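A small sketch of the second point, under the assumption that the elements are immutable values such as Integer. The class name ImmutabilityDemo and the helper timesTen are hypothetical, chosen only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class ImmutabilityDemo {

    // Returns a new list with every element multiplied by 10.
    // The source list is only read, never modified.
    static List<Integer> timesTen(List<Integer> source) {
        return source.stream()
                .map(i -> i * 10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A mutable list, to show that nothing prevents modification
        // except the fact that the stream pipeline does not attempt it.
        List<Integer> source = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> result = timesTen(source);

        System.out.println(source); // [1, 2, 3]
        System.out.println(result); // [10, 20, 30]
    }
}
```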

For more on functional programming, see Ruan Yifeng’s article “A Preliminary Study of Functional Programming”.

1.3 Deferred execution and optimization

The Stream is executed only when a terminal operation is encountered, such as:

        List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .peek(System.out::println);

This code never prints anything, because the pipeline is never run. The peek method can be thought of as a forEach that passes the elements along; here I use it to print the elements in the Stream.

Because filter and peek are both conversion operations, they do not trigger execution.

It does run if we add a count method at the end:

        List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .peek(System.out::println)
                .count();

The count method is a terminal operation that counts the elements in the Stream and returns a long.

This behavior, where a Stream does nothing until a terminal operation is called, is known as deferred (lazy) execution.
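Deferred execution can be made visible by recording when the filter actually runs. This is a minimal sketch; the class name LazyDemo and the log list are hypothetical and exist only for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyDemo {

    // Records what happens before and after the terminal operation is called.
    static List<String> run() {
        List<String> log = new ArrayList<>();

        Stream<Integer> pipeline = List.of(1, 2, 3).stream()
                .filter(i -> {
                    log.add("filter " + i);
                    return i > 2;
                });

        log.add("pipeline built"); // no element has been filtered yet

        long count = pipeline.count(); // terminal operation: the filter runs now
        log.add("count = " + count);
        return log;
    }

    public static void main(String[] args) {
        // Prints: pipeline built, filter 1, filter 2, filter 3, count = 1
        run().forEach(System.out::println);
    }
}
```

The "pipeline built" entry comes first, which shows that building the chain of conversion operations does no work by itself.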

Stream also optimizes the stateless methods in its API with a technique called loop merging, covered in Section 3.

2. Creating a Stream

This section introduces some common ways to create a Stream. Broadly, there are two:

Created through the Stream interface
Created through the collection class library

We’ll also cover parallel streams and concatenation, both of which create streams but with different characteristics.

2.1 Created through the Stream interface

The Stream is an interface that defines several static methods that provide the API for creating the Stream:

    public static<T> Stream<T> of(T... values) {
        return Arrays.stream(values);
    }

The first is the of method, which takes a generic varargs parameter and creates a Stream of that type. If the arguments are primitives, they are autoboxed into their wrapper types:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

        Stream<Double> doubleStream = Stream.of(1.1d, 2.2d, 3.2d);

        Stream<String> stringStream = Stream.of("1", "2", "3");

You can also create an empty Stream directly with another static method, empty(). Its element type is inferred from the context; in this example it is Object:

        Stream<Object> empty = Stream.empty();

Another static method, generate(), creates a Stream with an unlimited number of elements:

    public static<T> Stream<T> generate(Supplier<? extends T> s) {
        Objects.requireNonNull(s);
        return StreamSupport.stream(
                new StreamSpliterators.InfiniteSupplyingSpliterator.OfRef<>(Long.MAX_VALUE, s), false);
    }

Judging by its parameter, it takes a functional interface, Supplier. A Supplier is an interface for producing objects; you can think of it as an object factory, and generate() puts the objects produced by that factory into the Stream:

        Stream<String> generate = Stream.generate(() -> "Supplier");

        Stream<Integer> generateInteger = Stream.generate(() -> 123);

For convenience I wrote the Supplier as a lambda here. You can also pass in an explicit Supplier object; either way, the Stream obtains each element by calling the Supplier’s get() method.
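Because a generated Stream is infinite, it is normally combined with a short-circuiting operation such as limit() before any terminal operation. A minimal sketch, assuming a sequential stream; the class name GenerateDemo, the helper firstN, and the counting Supplier are hypothetical choices made for this example.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenerateDemo {

    // Takes the first n values from an infinite stream backed by a counting Supplier.
    static List<Integer> firstN(int n) {
        AtomicInteger counter = new AtomicInteger();

        // Infinite stream: each element comes from one call to the Supplier.
        Stream<Integer> numbers = Stream.generate(counter::incrementAndGet);

        return numbers
                .limit(n) // without limit(), collect() would never finish
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(5)); // [1, 2, 3, 4, 5]
    }
}
```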

2.2 Created through the collection class library

The second approach is more common than the first: we usually call stream() on an existing collection instead of building a Stream by hand:

        Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();
        
        Stream<String> stringStreamList = List.of("1", "2", "3").stream();

In Java8, the stream() method was added to Collection, the top-level collection interface, so every Collection implementation can create a stream:

        Stream<Integer> listStream = List.of(1, 2, 3).stream();
        
        Stream<Integer> setStream = Set.of(1, 2, 3).stream();

Looking at the source code, stream() essentially creates the stream by calling a stream utility class, StreamSupport:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

2.3 Create parallel streams

In the above example, all streams are serial streams. In some scenarios, to maximize performance on a multicore CPU, we can use parallel streams that perform parallel operations using the fork/join framework introduced in JDK7. We can create parallel streams as follows:

        Stream<Integer> integerParallelStream = Stream.of(1, 2, 3).parallel();

        Stream<String> stringParallelStream = Stream.of("1", "2", "3").parallel();

        Stream<Integer> integerParallelStreamList = List.of(1, 2, 3).parallelStream();

        Stream<String> stringParallelStreamList = List.of("1", "2", "3").parallelStream();

As you can see, the static methods of Stream offer no way to create a parallel stream directly; we call parallel() after the Stream has been constructed. Calling parallel() does not build a new parallel Stream object from scratch. Instead, it sets a parallel flag on the original Stream object.

We can also create a parallel stream directly from the Collection interface by calling parallelStream(), the counterpart of stream(). As I said, the only difference between the two is a parameter:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

In general, though, we rarely need parallel streams. With only a few thousand elements in the Stream there is usually little performance gain, because splitting the elements up and distributing them across CPU cores has a cost of its own.

Parallel streams exploit multi-core CPUs, but they work by splitting the data and handing the pieces to different cores. Data backed by an array is easy to split, whereas a linked list or a hash-based structure is clearly harder to divide.

As a rough rule of thumb, only when the Stream holds 10,000 or more elements does a parallel stream bring a noticeable performance boost.

Finally, a parallel stream can easily be converted back to a serial stream with sequential():

        Stream.of(1, 2, 3).parallel().sequential();
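The parallel flag mentioned above can be observed with isParallel(), a method every Stream inherits from BaseStream. The sketch below only inspects the flag and does not measure performance; the class name ParallelFlagDemo and the helper flags are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class ParallelFlagDemo {

    // Reads the parallel flag before parallel(), after parallel(), and after sequential().
    static boolean[] flags() {
        Stream<Integer> stream = Stream.of(1, 2, 3);
        boolean initially = stream.isParallel();

        Stream<Integer> parallel = stream.parallel();
        boolean afterParallel = parallel.isParallel();

        Stream<Integer> sequential = parallel.sequential();
        boolean afterSequential = sequential.isParallel();

        return new boolean[] {initially, afterParallel, afterSequential};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(flags())); // [false, true, false]
    }
}
```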

2.4 Concatenating Streams

If you construct two streams in two places and want to combine them, use concat():

        Stream<Integer> concat = Stream
                .concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6));

If two streams with different element types are concatenated, type inference automatically infers a common supertype of the two types:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

        Stream<String> stringStream = Stream.of("1", "2", "3");

        Stream<? extends Serializable> stream = Stream.concat(integerStream, stringStream);
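To see what a concatenated Stream contains, it can be collected into a List. A minimal sketch with a hypothetical class ConcatDemo and helper concat; the elements of the first stream come first, followed by those of the second.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConcatDemo {

    // Concatenates the streams of two lists and collects the result.
    static List<Integer> concat(List<Integer> first, List<Integer> second) {
        return Stream.concat(first.stream(), second.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of(1, 2, 3), List.of(4, 5, 6))); // [1, 2, 3, 4, 5, 6]
    }
}
```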

3. Stateless methods of Stream conversion operations

Stateless method: the method handles each element on its own; processing one element does not depend on the rest of the result set produced by the previous method.

There are three stateless APIs that we use most often in Stream:

map(): takes a Function object, which lets you apply a custom operation to each element; the Stream then holds the results of that operation.
filter(): takes a Predicate object, whose result is a boolean, and keeps only the elements for which it returns true. As the name suggests, we use it for filtering.
flatMap(): like map(), it takes a Function object, but this Function returns a Stream. It can be used to merge the elements of multiple streams into one.

Let’s look at an example of the map() method:

        Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();

        Stream<Integer> mapStream = integerStreamList.map(i -> i * 10);

If we have a List and want to multiply each element by 10, we can use the code above. Here i is the name of the variable that holds each element, and the logic after -> is the operation to perform on that element: a concise way of passing in a piece of code to execute. The call returns a new Stream containing the results of the operation.

To help you understand better, I have drawn a simple picture here:

Here’s an example of the filter() method:

        Stream<Integer> integerStreamList = List.of(10, 20, 30).stream();

        Stream<Integer> filterStream = integerStreamList.filter(i -> i >= 20);

In this code, i >= 20 is evaluated for each element, and the elements for which it returns true are kept in a new Stream, which is returned.

I also have a simple diagram here:

flatMap() was described above, but the description is a bit abstract, and I had to look through many examples before I really understood it.

According to the official documentation, this method is for flattening one-to-many relationships:

        List<Order> orders = List.of(new Order(), new Order());

        Stream<Item> itemStream = orders.stream()
                .flatMap(order -> order.getItemList().stream());

Here I use orders to illustrate: each order contains a list of items, and if I want to build one new list of items from both orders, I need flatMap().

In the code above, each order is mapped to a Stream over its item list. We have only two orders, so two such streams are produced. flatMap() then pulls the elements out of those two streams and puts them into a single new Stream.

As usual, here’s a simple illustration:

In the illustration I use cyan for a Stream. In the final output you can see that flatMap() merges the two streams into one, which is useful in scenarios like the order example above.
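The order example can be turned into something runnable with a stand-in Order type. Everything here is an assumption made for illustration: the nested Order class is hypothetical, and plain strings stand in for Item objects.

```java
import java.util.List;
import java.util.stream.Collectors;

class FlatMapDemo {

    // Hypothetical minimal Order type; item names stand in for Item objects.
    static class Order {
        private final List<String> itemList;

        Order(List<String> itemList) {
            this.itemList = itemList;
        }

        List<String> getItemList() {
            return itemList;
        }
    }

    // Flattens the item lists of all orders into one list.
    static List<String> allItems(List<Order> orders) {
        return orders.stream()
                .flatMap(order -> order.getItemList().stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(List.of("pen", "ink")),
                new Order(List.of("paper")));

        System.out.println(allItems(orders)); // [pen, ink, paper]
    }
}
```

With map() instead of flatMap(), the result would be a Stream of streams, one per order, rather than a single Stream of items.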

There is also a much less commonly used stateless method, peek():

    Stream<T> peek(Consumer<? super T> action);

peek takes a Consumer object as its argument, which returns nothing. We can use peek to do things like print elements:

        Stream<Integer> peekStream = integerStreamList.peek(i -> System.out.println(i));

However, I don’t recommend it unless you know how it behaves, because in some cases it won’t run, for example:

        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(System.out::println)
                .count();

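The API documentation also notes that this method exists mainly for debugging. In my experience, peek only runs when the Stream actually has to produce the elements for the terminal operation. Since JDK 9, count() is allowed to skip the pipeline entirely when it can compute the number of elements directly from the source.

In the example above, count only needs the number of elements, and map cannot change that number, so on my JDK11 peek is never executed. It would be executed if collect were used instead.

It also runs if the pipeline contains a step that can change the number of elements, such as filter, or if the terminal operation is one of the match methods (anyMatch and similar), because the elements then have to be evaluated.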
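The difference between the two terminal operations can be checked by letting peek record what it sees. This is a sketch with a hypothetical class PeekDemo; the collect() variant is deterministic, while the count() variant depends on the JDK version, so no expected output is stated for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class PeekDemo {

    // Ends with collect(): every element has to be produced, so peek sees all of them.
    static List<Integer> seenWithCollect() {
        List<Integer> seen = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(seen::add)
                .collect(Collectors.toList());
        return seen;
    }

    // Ends with count(): whether peek runs here depends on the JDK version,
    // so this sketch makes no claim about the result.
    static List<Integer> seenWithCount() {
        List<Integer> seen = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(seen::add)
                .count();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithCollect()); // [10, 20, 30]
        System.out.println(seenWithCount());   // JDK-dependent
    }
}
```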

3.1 Primitive type Streams

The previous section covered the three most common stateless methods of Stream. Stream has several more stateless methods, which are counterparts of map() and flatMap(). They are:

mapToInt
mapToLong
mapToDouble
flatMapToInt
flatMapToLong
flatMapToDouble

As the names suggest, these six methods merely change the return type of map() or flatMap(), so there is no need to cover each one separately. The key point for all of them is the return value:

mapToInt returns an IntStream
mapToLong returns a LongStream
mapToDouble returns a DoubleStream
flatMapToInt returns an IntStream
flatMapToLong returns a LongStream
flatMapToDouble returns a DoubleStream

Java provides a wrapper class for each of the eight primitive types, and since JDK5 autoboxing and unboxing convert between a primitive and its wrapper automatically, without you noticing; the compiler inserts the wrapper’s conversion methods for you.

For instance, earlier I used this example:

        Stream<Integer> integerStream = Stream.of(1, 2, 3);

I passed primitive values when creating this Stream, and they were automatically boxed as Integer. We tend to overlook the cost of this boxing and unboxing. To avoid that cost when using a Stream, we can switch to the Streams designed for primitive types:

IntStream: for int; short, char, and boolean values are also stored in it
LongStream: for long
DoubleStream: for double; float values are also stored in it

Each of these interfaces also has an of method for constructing the Stream, as in the previous example, but without any autoboxing.

So the six methods above simply swap an ordinary Stream for one of these primitive streams, which can be more efficient when needed.

A primitive stream has largely the same API as Stream, so once you understand how to use Stream, the primitive streams work the same way.

Note: IntStream, LongStream, and DoubleStream are all interfaces, but they do not extend the Stream interface; they extend BaseStream instead.
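Primitive streams also add a few numeric conveniences such as sum(), which happens to answer the question this article opened with. A minimal sketch; the class name PrimitiveStreamDemo and its helper methods are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class PrimitiveStreamDemo {

    // Sums a list by switching to an IntStream, which works on int values directly.
    static int sum(List<Integer> numbers) {
        return numbers.stream()
                .mapToInt(Integer::intValue)
                .sum();
    }

    // Goes the other way: boxed() turns an IntStream back into a Stream<Integer>.
    static List<Integer> range(int fromInclusive, int toExclusive) {
        return IntStream.range(fromInclusive, toExclusive)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3)));      // 6
        System.out.println(IntStream.of(1, 2, 3).sum()); // 6
        System.out.println(range(1, 4));                 // [1, 2, 3]
    }
}
```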

3.2 Loop merge of stateless methods

With that in mind, let’s look again at an example from earlier in this article:

        List<Integer> list = List.of(1, 2, 3).stream()
                .filter(i -> i > 2)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

I used the filter method three times in this example. Do you think the Stream will loop through the filter three times?

If I change one of the filters to map, how many times do you think it’s going to loop?

        List<Integer> list = List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .filter(i -> i < 10)
                .filter(i -> i % 2 == 0)
                .collect(toList());

Intuition says that every element must first go through map, and only then through the two filters, so the loop would run three times.

But look back at the definition of a stateless method and you will see that all three operations can be done in a single loop. Each filter depends only on map’s result for the current element, not on the complete result set after map has processed every element, so as long as map is applied before filter for each element, one pass is enough. This optimization is called loop merging.

All stateless methods can be executed within the same loop, and they can also be spread across multiple CPU cores easily with parallel streams.
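The single pass can be observed by logging inside each lambda. The sketch below assumes a sequential stream; the class name LoopMergeDemo and the log list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LoopMergeDemo {

    // Records the order in which map and filter are applied to the elements.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> {
                    log.add("map " + i);
                    return i * 10;
                })
                .filter(i -> {
                    log.add("filter " + i);
                    return i > 10;
                })
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // Prints: map 1, filter 10, map 2, filter 20, map 3, filter 30
        trace().forEach(System.out::println);
    }
}
```

The entries alternate between map and filter, so each element travels through the whole chain before the next one starts, instead of map finishing all elements first.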

4. Stateful methods for Stream conversion operations

Stateful methods are just as easy to use as stateless ones. The name alone tells you what each one does:

Method	Result
distinct()	Removes duplicate elements.
sorted()	Sorts the elements. It has two overloads; pass in a Comparator if you need a custom order.
limit(long maxSize)	Takes only the first maxSize elements.
skip(long n)	Skips the first n elements and keeps the rest.
takeWhile(Predicate predicate)	New in JDK9. Keeps elements while the predicate is true and stops at the first element for which it is false, returning the elements before that point.
dropWhile(Predicate predicate)	New in JDK9. Drops elements while the predicate is true and stops dropping at the first element for which it is false, returning that element and everything after it.
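The methods in the table can be tried side by side on one small list. This sketch needs JDK 9 or later because of takeWhile and dropWhile; the class name StatefulDemo, the NUMBERS constant, and the apply helper are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StatefulDemo {

    static final List<Integer> NUMBERS = List.of(3, 1, 3, 2, 5, 4);

    // Applies one conversion operation to a fresh stream of NUMBERS and collects the result.
    static List<Integer> apply(UnaryOperator<Stream<Integer>> operation) {
        return operation.apply(NUMBERS.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(apply(s -> s.distinct()));                        // [3, 1, 2, 5, 4]
        System.out.println(apply(s -> s.sorted()));                          // [1, 2, 3, 3, 4, 5]
        System.out.println(apply(s -> s.sorted(Comparator.reverseOrder()))); // [5, 4, 3, 3, 2, 1]
        System.out.println(apply(s -> s.limit(2)));                          // [3, 1]
        System.out.println(apply(s -> s.skip(4)));                           // [5, 4]
        System.out.println(apply(s -> s.takeWhile(i -> i != 2)));            // [3, 1, 3]
        System.out.println(apply(s -> s.dropWhile(i -> i != 2)));            // [2, 5, 4]
    }
}
```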

These are all stateful methods: what they do with one element depends on other elements produced by the previous step. Sorting, for example, needs the whole result set of the previous step before it can emit anything.

limit and takeWhile are also short-circuiting methods, which makes them more efficient, because the desired elements may have been selected before the internal loop has run to completion.

Therefore stateful methods cannot be merged into a single loop the way stateless methods can. A stateful method such as sorted needs its own internal pass, so the order in which you chain the operations affects both the result and the performance of the program. Keep this in mind during development.
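How sorted() interrupts the single pass can be seen by logging on both sides of it. This is a sketch under the assumption of a sequential stream; the class name BarrierDemo and the log list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class BarrierDemo {

    // Records when elements reach the steps before and after sorted().
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(3, 1, 2).stream()
                .peek(i -> log.add("before " + i))
                .sorted()
                .peek(i -> log.add("after " + i))
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // Prints: before 3, before 1, before 2, after 1, after 2, after 3
        trace().forEach(System.out::println);
    }
}
```

All the "before" entries appear ahead of the first "after" entry, because sorted() has to receive every element before it can pass any of them on. This is one reason to place filter() ahead of sorted() where possible, so that fewer elements need sorting.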

5. Summary

This article gives an overview of Stream and describes its two major characteristics:

Immutable: operations do not affect the original collection, and each call returns a new Stream.
Deferred execution: a Stream does nothing until a terminal operation is called.

It also divides the Stream API into conversion operations and terminal operations and explains all the common conversion operations. The next article will focus on terminal operations.

One interesting thing I noticed while looking at the Stream source code is that in the ReferencePipeline class (the implementation class for Stream), the order of its methods goes from top to bottom: stateless methods → stateful methods → aggregate methods.

Well, that’s the end of this article. I think you now have a fairly clear picture of Stream as a whole and a good grasp of the conversion API. After all, there isn’t that much of it 😂. Java8 has plenty of other powerful features, which we’ll talk about next time ~

I also referred to the following books while writing this article:

Java SE 8 for the Really Impatient
Java 8 Functional Programming
Java 8 in Action

All three books are very good. The first is by the author of Core Java; if you want a complete picture of what changed in JDK8, read this one.

The second is more of a booklet, only a hundred or so pages long, and is devoted mainly to functional ideas.

If you can only read one, I recommend the third, which has a 9.2 rating on Douban. Its content and quality are excellent.