Use JMH for Java microbenchmarks

In Java, there are often several ways to write the same piece of code, and when we are not sure which performs better, we tend to just run each version repeatedly and time it. However, as the JVM runs, the JIT compiler keeps compiling and optimizing code as the invocation count grows, which makes it questionable how many repetitions are needed before the results stabilize. Experienced developers therefore loop tens of thousands of times before the real measurement and mark that code with a warm-up comment.

That's right! That approach works, but the result is easily biased. And what if, every time performance matters, you have to hand-write warm-up logic for the scenario? Once the warm-up is complete, how many iterations should the formal measurement run? Does every test result have to be printed with System.out?
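For reference, the hand-rolled approach described above usually looks something like this minimal sketch (the method under test and the iteration counts are illustrative assumptions, not part of any real benchmark):

```java
public class NaiveBenchmark {

    // The code whose performance we want to measure (illustrative).
    static int workUnderTest(int x) {
        return Integer.toBinaryString(x).length();
    }

    // Warm up first so the JIT has a chance to compile the hot path,
    // then time a fixed number of formal iterations.
    static double measureOpsPerSecond(int warmupIterations, int measuredIterations) {
        for (int i = 0; i < warmupIterations; i++) {
            workUnderTest(i); // warm-up: result intentionally ignored
        }
        long start = System.nanoTime();
        int sink = 0;
        for (int i = 0; i < measuredIterations; i++) {
            sink += workUnderTest(i); // accumulate so the call is not optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sink == -1) System.out.println(); // keep the sink live
        return measuredIterations / (elapsed / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        System.out.printf("%.0f ops/s%n", measureOpsPerSecond(100_000, 1_000_000));
    }
}
```

Every ad-hoc benchmark ends up re-implementing this boilerplate, and it is easy to get wrong; JMH handles warm-up, dead-code elimination, and reporting for you.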

JMH (the Java Microbenchmark Harness) is expected to ship as part of Java 9, but there is no need to wait for Java 9: we can easily use it today to simplify our testing. It takes care of JVM warm-up and code-optimization pitfalls, making the whole benchmarking process much easier.

Getting started

First, add the dependencies to the project. The latest versions of the jmh-core and jmh-generator-annprocess dependencies can be found in the Maven repository.

<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.19</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.19</version>
</dependency>

Create a Helloworld class with an empty method m(). Annotating it with @Benchmark declares it a microbenchmark method; JMH will generate the benchmark code at compile time and run it.

public class Helloworld {

    @Benchmark
    public void m() {

    }
}

A main entry is added to start the test.

public class HelloworldRunner {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("Helloworld")
                .exclude("Perf")
                .warmupIterations(10)
                .measurementIterations(10)
                .forks(3)
                .build();
        new Runner(opt).run();
    }
}

A quick introduction to HelloworldRunner: it is the entry point and completes the configuration of the JMH test. By default, JMH scans for all @Benchmark methods and may run tests that you don't need, so include and exclude are used to filter which benchmarks run.

warmupIterations(10) means 10 warm-up iterations, and measurementIterations(10) means 10 formal measurement iterations. Each run therefore goes through 10 warm-up iterations followed by 10 measurement iterations, and every iteration repeatedly invokes the @Benchmark method.

forks(3) means three forked test runs. Because a single run cannot reliably represent the result, the test is executed in three separate JVM forks, each of which is warmed up and then formally measured.

We run HelloworldRunner, and after a while, the test results are as follows:

Result "com.alibaba.microbenchmark.test.Helloworld.m":
  3084697483.521 ±(99.9%) 27096926.646 ops/s [Average]
  (min, avg, max) = (2951123277.601, 3084697483.521, 3121456015.904), stdev = 40557407.239
  CI (99.9%): [3057600556.875, 3111794410.166] (assumes normal distribution)

# Run complete. Total time: 00:01:02

Benchmark      Mode  Cnt           Score          Error  Units
Helloworld.m  thrpt   30  3084697483.521 ± 27096926.646  ops/s

You can see the score is about 3 billion, but what does that 3 billion mean? Look closely at the Mode column: the type is thrpt, which is short for Throughput. It represents the number of operations completed per second.

Test type

The test mode used so far is throughput, the number of calls completed in one second. But what if you want to know how long a single call takes?

That is simply 1 / throughput.
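As a quick arithmetic sketch, the throughput score from the Helloworld run above can be converted into an average time per operation like this:

```java
public class ThroughputToTime {

    // average time per op in nanoseconds = 1e9 / throughput in ops/s
    static double nanosPerOp(double opsPerSecond) {
        return 1_000_000_000.0 / opsPerSecond;
    }

    public static void main(String[] args) {
        double score = 3_084_697_483.521; // Helloworld.m throughput from the report above
        System.out.printf("%.4f ns/op%n", nanosPerOp(score)); // roughly 0.32 ns per call
    }
}
```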

JMH provides the following types of support:

Type            Description
Throughput      Operations per unit of time, usually per second
AverageTime     Average time taken per operation
SampleTime      Samples the time of operations randomly during the test
SingleShotTime  Measures the elapsed time of a single invocation
All             As the name implies, all of the above modes; commonly used in internal testing

Using these modes is as simple as adding the @BenchmarkMode annotation, for example:

@Benchmark
@BenchmarkMode({Mode.Throughput, Mode.SingleShotTime})
public void m() {

}

Fork strategy

JMH also supports per-method fork configuration via the @Fork annotation, for example:

@Benchmark
@Fork(value = 1, warmups = 2)
@BenchmarkMode(Mode.Throughput)
public void init() {

}

In the init() test, there are two warm-up forks followed by one measured fork. However, if there are many test methods, you are advised to configure forks through Options instead. For details, see HelloworldRunner.
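For completeness, the warm-up and measurement settings used in HelloworldRunner can also be expressed per class with JMH's @Warmup and @Measurement annotations. A configuration sketch (it requires the JMH dependency on the classpath; the class name is illustrative):

```java
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Warmup;

@Warmup(iterations = 10)       // same as warmupIterations(10) in the Options builder
@Measurement(iterations = 10)  // same as measurementIterations(10)
@Fork(3)                       // same as forks(3)
public class AnnotatedHelloworld {

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void m() {
    }
}
```

Annotations keep the settings next to the benchmark itself, while Options centralizes them for many benchmark classes at once.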

Example: Circular microbenchmarks

One day someone in the group chat compared two loop styles, a reverse loop for (int i = length; i > 0; i--) and a forward loop for (int i = 0; i < length; i++), and was a little confused about which is faster. After consulting Wen Shao, the answer given was that i > 0 is cheaper than i < length, so the reverse loop has an advantage. Let's run a benchmark for this scenario.

The first is a forward loop with a million iterations.

public class CountPerf {

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void count() {
        for (int i = 0; i < 1_000_000; i++) {

        }
    }
}

And then the reverse loop, again 1 million times.

public class CountPerf {

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    public void count() {
        for (int i = 1_000_000; i > 0; i--) {

        }
    }
}

Finally, the test entry point. We run 3 forks, each with 10 warm-up iterations followed by 10 formal measurement iterations; the test type is throughput.

public class BenchmarkRunner {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("Perf")
                .exclude("Helloworld")
                .warmupIterations(10)
                .measurementIterations(10)
                .forks(3)
                .build();
        new Runner(opt).run();
    }
}

The test results are as follows. The data shows that, overall, the reverse loop slightly outperforms the forward loop.

Result "com.alibaba.microbenchmark.forward.CountPerf.count":
  3017436523.994 ±(99.9%) 74706077.393 ops/s [Average]
  (min, avg, max) = (2586477493.002, 3017436523.994, 3090537220.013), stdev = 111816548.191
  CI (99.9%): [2942730446.601, 3092142601.387] (assumes normal distribution)

# Run complete. Total time: 00:02:05

Benchmark                        Mode  Cnt           Score          Error  Units
c.a.m.backward.CountPerf.count  thrpt   30  3070589161.097 ± 30858669.885  ops/s
c.a.m.forward.CountPerf.count   thrpt   30  3017436523.994 ± 74706077.393  ops/s

Optimized Hessian2 microbenchmark

By default, HSF uses Hessian2 for serialized transmission, and Hessian2 carries the type meta-information with every transmission, which incurs some resource overhead in real scenarios. HSF 2.2 will use an optimized Hessian2 for serialization. It differs from plain Hessian2 in that it caches the meta-information at the long-connection level and sends only the data content each time; since only the data content is sent, the resource cost is lower. We benchmarked both Hessian2 and the optimized Hessian2. The results are as follows:

Benchmark                                   Mode  Cnt       Score      Error  Units
c.a.m.h.hessian.DeserialPerf.deserial      thrpt   60  147255.638 ± 1057.106  ops/s
c.a.m.h.hessian.SerialPerf.serial          thrpt   60  146336.439 ± 1199.087  ops/s
c.a.m.h.optihessian.DeserialPerf.deserial  thrpt   60  327482.489 ± 3366.174  ops/s
c.a.m.h.optihessian.SerialPerf.serial      thrpt   60  176988.488 ± 1233.302  ops/s

The optimized Hessian2 outperforms plain Hessian2 in serialization throughput, at about 177,000 ops/s versus 146,000 ops/s, while in deserialization it unexpectedly reaches about 327,000 ops/s, more than double Hessian2.

References

  • Microbenchmarking with Java
  • JMH Samples