In the previous article we introduced the Stream API in Java 8. We mentioned that the Stream API can greatly improve the productivity of Java programmers, allowing them to write efficient, clean, and concise code.

So how does the Stream API actually perform, and does the cleaner code come at a performance cost? In this article we take a look at the performance of the Stream API.


To make the test results reliable, we run the JVM in -server mode, use test data on the order of gigabytes, and run the tests on a common commercial server configured as follows:



I. Test method and data

Performance testing is not easy, and Java performance testing is even harder because the virtual machine has a significant impact on performance. The JVM affects performance in two ways:

  1. Impact of GC. GC behavior is hard to control in Java, so to make the runs more deterministic we manually specified the CMS collector and a fixed-size 10 GB heap. The JVM arguments are -XX:+UseConcMarkSweepGC -Xms10g -Xmx10g.

  2. Just-in-time (JIT) compilation. The JIT compiler compiles hot code into native code while the JVM is running. During testing, we warm up the program first so that the test function is JIT-compiled before measurement. The relevant JVM parameter is -XX:CompileThreshold=10000.

Parallel streams execute on the ForkJoinPool.commonPool() thread pool. To control the degree of parallelism, we use the Linux taskset command to restrict the number of cores available to the JVM.
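To see how the core restriction reaches the stream machinery, the following minimal sketch prints the processor count the JVM reports and the parallelism of the common pool. The class name ParallelismCheck is hypothetical and not part of the original test code. By default the common pool targets one thread fewer than the reported processor count (with a minimum of one), because the calling thread also takes part in the work; the exact values printed depend on the machine and on any affinity mask in effect.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical helper, not from the original article's test suite.
public class ParallelismCheck {

    /** Target parallelism of the pool that parallel streams run on. */
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    /** A trivial parallel pipeline: sums 1..n on the common pool. */
    static long parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        // Processor count as seen by the JVM (may reflect a taskset mask on Linux).
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors:    " + cores);
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        System.out.println("parallel sum of 1..1000: " + parallelSum(1_000));
    }
}
```

Running this under different taskset masks is one way to confirm that the JVM actually sees the intended number of cores before starting a measurement.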

The test data is randomly generated by the program. To smooth out jitter from any single run, each test is run four times and the average is taken as the running time.
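The warm-up and averaging steps described above can be sketched as a small timing helper. This is only an illustration under stated assumptions: the class TimingHarness, the method averageNanos, and the warm-up count of 20 are hypothetical and are not the original test code; only the four measured runs come from the text.

```java
import java.util.Random;

// Hypothetical timing helper illustrating warm-up plus averaging.
public class TimingHarness {

    // Results are written here so the JIT cannot discard the measured work as dead code.
    static volatile long sink;

    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * mean wall-clock time in nanoseconds over measuredRuns timed runs.
     */
    static double averageNanos(Runnable task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run(); // gives the JIT a chance to compile the hot code
        }
        long total = 0;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.run();
            total += System.nanoTime() - start;
        }
        return (double) total / measuredRuns;
    }

    public static void main(String[] args) {
        // Randomly generated test data (fixed seed for repeatability).
        int[] data = new Random(42).ints(1_000_000).toArray();
        Runnable task = () -> {
            long sum = 0;
            for (int v : data) {
                sum += v;
            }
            sink = sum;
        };
        double avg = averageNanos(task, 20, 4); // 20 warm-up runs is an assumed value
        System.out.printf("average over 4 runs: %.3f ms%n", avg / 1e6);
    }
}
```

For serious measurements a dedicated benchmarking framework is usually a safer choice than a hand-written loop, but the sketch shows the idea the article relies on.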

II. Primitive type iteration

Test: Find the minimum value in an integer array. Compare the external iteration performance of the for loop with the internal iteration performance of the Stream API.
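The three variants being compared might look roughly like the sketch below. The original IntTest program is not reproduced in this article, so the class and method names here are hypothetical, and the sketch assumes a non-empty array.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the three variants; not the original IntTest.
public class IntMinSketch {

    /** External iteration: a plain for loop. */
    static int minForLoop(int[] data) {
        int min = Integer.MAX_VALUE;
        for (int value : data) {
            if (value < min) {
                min = value;
            }
        }
        return min;
    }

    /** Internal iteration with a serial primitive stream. */
    static int minSerialStream(int[] data) {
        return Arrays.stream(data).min().getAsInt();
    }

    /** Internal iteration with a parallel primitive stream. */
    static int minParallelStream(int[] data) {
        return Arrays.stream(data).parallel().min().getAsInt();
    }

    public static void main(String[] args) {
        int[] data = new Random().ints(10_000_000).toArray();
        System.out.println(minForLoop(data));
        System.out.println(minSerialStream(data));
        System.out.println(minParallelStream(data));
    }
}
```

All three methods return the same value for the same input; only the iteration strategy differs.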

The test program is IntTest, and the results are shown below:

The figure plots each running time as a ratio, with the external iteration time of the for loop as the baseline. The analysis is as follows:

  1. For primitive-type Streams, serial iteration is significantly slower than external iteration (about twice the running time).

  2. Stream parallel iterations perform better than both serial and external iterations.

Parallel iteration performance depends on the number of available cores. The parallel iteration in the figure above uses all 12 cores. To investigate how the number of cores affects performance, we separately test parallel Stream iteration with different numbers of cores:

Analysis, for basic types:

  1. The parallel Stream API performs poorly in single-core scenarios, worse than the serial Stream API;

  2. As the number of cores used increases, Stream parallelism becomes progressively better than external iteration using the for loop.

The above two tests show that, for simple iteration over primitive types, serial Stream iteration is slower than external iteration, but parallel Stream iteration is faster when multiple cores are available.

III. Object iteration

Now let's look at how iteration over objects performs.

Test: Find the smallest element (in natural order) in a list of strings. Compare the performance of external iteration with a for loop against internal iteration with the Stream API.
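A minimal sketch of this comparison is shown below. The original StringTest program is not reproduced here, so the class and method names are hypothetical, the random data generation is an assumption, and the list is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the comparison; not the original StringTest.
public class StringMinSketch {

    /** External iteration: a for loop using String.compareTo. */
    static String minForLoop(List<String> list) {
        String min = list.get(0);
        for (String s : list) {
            if (s.compareTo(min) < 0) {
                min = s;
            }
        }
        return min;
    }

    /** Internal iteration with a serial object stream. */
    static String minSerialStream(List<String> list) {
        return list.stream().min(Comparator.naturalOrder()).get();
    }

    /** Internal iteration with a parallel object stream. */
    static String minParallelStream(List<String> list) {
        return list.parallelStream().min(Comparator.naturalOrder()).get();
    }

    public static void main(String[] args) {
        Random random = new Random();
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            list.add(Long.toString(random.nextLong())); // assumed data shape
        }
        System.out.println(minForLoop(list));
        System.out.println(minSerialStream(list));
        System.out.println(minParallelStream(list));
    }
}
```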

The test program is StringTest, and the results are shown below:

The results are analyzed as follows:

  1. For object-type Streams, serial iteration is still slower than external iteration (about 1.5 times the running time), but the gap is smaller than for primitive types.

  2. Stream parallel iterations perform better than both serial and external iterations.

Let's examine parallel Stream iteration separately:

Analysis, for object types:

  1. The parallel Stream API performs worse than external iteration with a for loop in single-core cases;

  2. With the increase of the number of cores used, the parallel effect of Stream gradually becomes better, and the effect brought by multiple cores is obvious.

The above two tests show that, for simple iteration over object types, serial Stream iteration is slower than external iteration, but parallel Stream iteration is faster when multiple cores are available.

IV. Reduction of complex objects

According to the results of experiment 1 and experiment 2, the serial execution of Stream is much worse than the external iteration. Before we jump to conclusions, let’s look at more complicated operations.

Test: Given a list of orders, compute the total transaction volume of each user. Compare the performance of a manual implementation using external iteration with the Stream API.

We simplify each order to a tuple and represent it with an Order object. The test program is ReductionTest, and the test results are shown below:
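The manual and Stream-based reductions can be sketched as follows. The original ReductionTest program and the exact fields of Order are not reproduced in this article, so everything here is an assumption for illustration: the class name ReductionSketch, the userName and price fields, and the choice of Collectors.groupingBy with summingDouble.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; not the original ReductionTest.
public class ReductionSketch {

    /** Assumed shape of an order: only the fields needed for the reduction. */
    static final class Order {
        private final String userName;
        private final double price;

        Order(String userName, double price) {
            this.userName = userName;
            this.price = price;
        }

        String getUserName() { return userName; }
        double getPrice() { return price; }
    }

    /** Manual reduction with external iteration and a HashMap. */
    static Map<String, Double> sumManual(List<Order> orders) {
        Map<String, Double> totals = new HashMap<>();
        for (Order order : orders) {
            totals.merge(order.getUserName(), order.getPrice(), Double::sum);
        }
        return totals;
    }

    /** Serial Stream reduction: group by user, then sum the prices. */
    static Map<String, Double> sumSerialStream(List<Order> orders) {
        return orders.stream().collect(
                Collectors.groupingBy(Order::getUserName,
                        Collectors.summingDouble(Order::getPrice)));
    }

    /** Parallel Stream reduction: same collector, parallel source. */
    static Map<String, Double> sumParallelStream(List<Order> orders) {
        return orders.parallelStream().collect(
                Collectors.groupingBy(Order::getUserName,
                        Collectors.summingDouble(Order::getPrice)));
    }

    public static void main(String[] args) {
        List<Order> orders = new ArrayList<>();
        orders.add(new Order("alice", 10.0));
        orders.add(new Order("bob", 5.5));
        orders.add(new Order("alice", 2.5));
        System.out.println(sumManual(orders));
        System.out.println(sumSerialStream(orders));
        System.out.println(sumParallelStream(orders));
    }
}
```

With floating-point prices, the three variants can differ in the last digits on large inputs because the additions happen in a different order, so results are best compared with a small tolerance.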

Analysis, for complex reduction operations:

  1. The Stream API generally performs better than external manual iteration, and parallel streams perform better.

Next we examine how the number of cores affects parallel performance. The test results are as follows:

Analysis, for complex reduction operations:

  1. In the single-core case, parallel Stream reduction is slower than both serial reduction and manual reduction, making it the slowest of the three.

  2. With the increase of the number of cores used, the parallel effect of Stream gradually becomes better, and the effect brought by multiple cores is obvious.

The above two experiments show that for complex reduction operations, Stream serial reduction is better than manual reduction, and parallel reduction is better in multi-core cases. It is reasonable to expect similar performance from the Stream API for other complex operations.

V. Conclusion

The results of the above three experiments can be summarized as follows:

  1. For simple operations, such as the simplest traversal, the serial Stream API performs significantly worse than explicit iteration, but the parallel Stream API can take advantage of multiple cores.

  2. For complex operations, the Stream serial API performs as well as the manual implementation, and far better when executed in parallel.

So, from a performance standpoint:

  1. For simple operations, a manual implementation using external iteration is recommended.

  2. For complex operations, the Stream API is recommended.

  3. In multi-core cases, the parallel Stream API is recommended to take advantage of the extra cores.

  4. In single-core cases, the parallel Stream API is not recommended.

In terms of brevity, the Stream API lets you write much shorter code. Even from a performance standpoint, using the Stream API wherever possible has a further advantage: whenever the Java Stream library is upgraded, the code benefits from the improvements without any changes.