The Z Garbage Collector Algorithm

Introduction

ZGC was first released as an experimental feature in JDK 11 and became a production-ready feature in JDK 15, which was released on September 15, 2020.

ZGC is a scalable low-latency garbage collector. Its design goals are GC pause times of no more than 10 milliseconds, support for heaps ranging from megabytes to terabytes, and a throughput reduction of no more than 15%.

JVM garbage collectors

Over the years, the JVM has introduced a number of interesting garbage collection algorithms. The most important ones are listed below:

  • Serial (low memory footprint): Uses a single thread, suits single-processor machines, and is optimized for small memory footprints (for example, embedded systems).
  • Parallel (throughput collector): Minor collections are performed in parallel to reduce garbage collection overhead. Suitable for applications with medium to large data sets running on multiprocessor hardware.
  • CMS (Concurrent Mark Sweep collector): Keeps garbage collection pauses short. Designed for applications with a relatively large set of long-lived data.
  • G1 (throughput/latency balance): Garbage-First is a server-style garbage collector for multiprocessor machines with large memory. It attempts to meet GC pause time targets with high probability while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents pauses that are proportional to heap or live-data size.
  • ZGC (low latency)

Serial and Parallel are stop-the-world collectors. CMS was deprecated in JDK 9, where G1 became the default collector.

It is important to note that ZGC does all of its heavy lifting concurrently with the application, which the other algorithms cannot do (see below for details).


Learn more about ZGC

ZGC is a concurrent low-latency collector: marking, compaction, reference processing, relocation set selection, StringTable cleaning, JNI WeakRef cleaning, JNI GlobalRefs scanning, and class unloading all run concurrently with the application. Only thread stack scanning does not. This makes the algorithm well suited to low-latency workloads.

ZGC pause times do not increase with heap or live-set size. They increase only with the size of a subset of the root set (the number of Java threads your application is using), because thread stacks are still scanned in a stop-the-world phase. Starting with JDK 16, however, thread stack scanning is also done concurrently, so the application keeps running while the stacks are scanned.

Note: the original author wrote this article for JDK 15, and I have added the changes made to ZGC in JDK 16.

From an algorithmic point of view, ZGC is a concurrent collector that does all the heavy lifting while Java threads continue to execute. It is a region-based collector: the heap is divided into smaller regions, and compaction focuses on a subset of those regions, the ones with the most garbage. It is NUMA-aware, which reduces latency because each CPU can allocate from its local memory. It uses colored pointers and load barriers, which are described in more detail in the following sections. And it is a single-generation collector: it does not have the young and old generations of earlier collectors.
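The region-based idea can be illustrated with a small sketch. The `Region` class and the `selectRelocationSet` method are hypothetical names made up for this example, not HotSpot code. The sketch only assumes that the collector knows each region's size and live bytes, and that it prefers to compact the regions with the most garbage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RelocationSetSketch {
    // Hypothetical heap region: total size and bytes that are still reachable.
    static final class Region {
        final String name;
        final long sizeBytes;
        final long liveBytes;

        Region(String name, long sizeBytes, long liveBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.liveBytes = liveBytes;
        }

        long garbageBytes() {
            return sizeBytes - liveBytes;
        }
    }

    // Pick up to `count` regions with the most garbage as compaction candidates.
    static List<Region> selectRelocationSet(List<Region> regions, int count) {
        List<Region> sorted = new ArrayList<>(regions);
        Comparator<Region> byGarbage = Comparator.comparingLong(Region::garbageBytes);
        sorted.sort(byGarbage.reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    public static void main(String[] args) {
        List<Region> heap = new ArrayList<>();
        heap.add(new Region("A", 2_000_000, 1_900_000)); // mostly live
        heap.add(new Region("B", 2_000_000, 100_000));   // mostly garbage
        heap.add(new Region("C", 2_000_000, 1_000_000)); // half garbage

        for (Region r : selectRelocationSet(heap, 2)) {
            System.out.println(r.name + " garbage=" + r.garbageBytes());
        }
    }
}
```

Compacting only the emptiest regions means the collector copies few live objects while reclaiming a lot of space. The real selection heuristics in ZGC are more involved than this sort.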

ZGC phases

The ZGC GC cycle is divided into three phases.

In the first phase, Pause Mark Start, ZGC traverses the object graph to mark objects as live or garbage. This phase also includes remapping the live data.

The second phase is Pause Mark End, during which reference pre-cleaning is done. Class unloading and relocation set selection are also completed in this phase.

Pause Relocate Start is the final phase, where the heavy work of compacting the heap is done.

Colored pointers

This is the core design concept in ZGC. The algorithm uses otherwise unused bits of the 64-bit object pointer to store metadata: the finalizable, remapped, marked1, and marked0 bits. The following figure shows the 64-bit object pointer and the meaning of each bit.
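The idea of storing metadata in pointer bits can be sketched in plain Java using a `long` as a stand-in for a 64-bit pointer. This is only an illustration: the class and method names are hypothetical, and the bit positions (a 42-bit object offset followed by four metadata bits) are an assumption based on the layout used by early ZGC versions. The exact layout differs between JDK releases.

```java
class ColoredPointerSketch {
    // Assumed layout: low 42 bits hold the object offset, the next 4 bits hold metadata.
    static final int OFFSET_BITS = 42;
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;

    static final long MARKED0     = 1L << OFFSET_BITS;       // bit 42
    static final long MARKED1     = 1L << (OFFSET_BITS + 1); // bit 43
    static final long REMAPPED    = 1L << (OFFSET_BITS + 2); // bit 44
    static final long FINALIZABLE = 1L << (OFFSET_BITS + 3); // bit 45

    // Return the pointer with its metadata bits replaced by the given color.
    static long withColor(long pointer, long color) {
        return (pointer & OFFSET_MASK) | color;
    }

    // Strip the metadata bits and return only the object offset.
    static long offset(long pointer) {
        return pointer & OFFSET_MASK;
    }

    // Check whether a given metadata bit is set.
    static boolean hasColor(long pointer, long color) {
        return (pointer & color) != 0;
    }

    public static void main(String[] args) {
        long pointer = withColor(0x1234L, REMAPPED);
        System.out.println("offset    = 0x" + Long.toHexString(offset(pointer)));
        System.out.println("remapped? = " + hasColor(pointer, REMAPPED));
        System.out.println("marked0?  = " + hasColor(pointer, MARKED0));
    }
}
```

Because the color lives in the pointer itself, the collector can tell the state of a reference without touching the object it points to.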

Load barrier

A load barrier is a piece of code injected by the Java just-in-time (JIT) compiler at strategic points. Its purpose is to check whether a loaded object reference has a bad color. The load barrier code runs whenever a thread loads an object reference from the heap.
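Conceptually, the check looks something like the sketch below. It is a simulation in plain Java, not the code the JIT compiler actually emits. The class name, the `load` method, the bit positions, and the use of an `AtomicLong` as a stand-in for a heap slot are all assumptions made for illustration. The sketch also assumes that the barrier repairs the slot after fixing the reference (often described as self-healing), so later loads take the fast path.

```java
import java.util.concurrent.atomic.AtomicLong;

class LoadBarrierSketch {
    // Assumed layout: 42-bit offset, then one bit per color.
    static final long OFFSET_MASK = (1L << 42) - 1;
    static final long MARKED0  = 1L << 42;
    static final long MARKED1  = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long COLOR_MASK = MARKED0 | MARKED1 | REMAPPED;

    // The color the collector currently considers good; it changes between GC phases.
    static volatile long goodColor = REMAPPED;

    // Simulated "load a reference from a heap slot" with a barrier check.
    static long load(AtomicLong slot) {
        long ref = slot.get();
        long badMask = COLOR_MASK & ~goodColor;
        if (ref != 0 && (ref & badMask) != 0) {
            // Slow path: a real collector would mark or relocate the object here.
            long healed = (ref & OFFSET_MASK) | goodColor;
            // Write the good reference back so the next load is fast.
            slot.compareAndSet(ref, healed);
            return healed;
        }
        return ref; // fast path: the color is already good
    }

    public static void main(String[] args) {
        AtomicLong slot = new AtomicLong(0x1000L | MARKED0); // stale color
        long ref = load(slot);
        System.out.println("good color after load: " + ((ref & goodColor) != 0));
        System.out.println("slot was healed:       " + (slot.get() == ref));
    }
}
```

The fast path is a single mask-and-test, which is why the barrier is cheap in the common case where references already have the good color.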

Tuning options

From JDK 11 to JDK 14, if you want to use ZGC, you must unlock the experimental options:

-XX:+UnlockExperimentalVMOptions -XX:+UseZGC

If it is JDK 15 or later, just specify the following:

-XX:+UseZGC

ZGC is designed to be easy to tune. Here is a list of ZGC-specific options:

To see pause times and other figures about the behavior of the algorithm, we can print garbage collector logs. For basic logging with ZGC, simply add the following options:

-XX:+UseZGC -Xmx<size> -Xlog:gc

If you want to print a garbage collector log with more details, do the following:

-XX:+UseZGC -Xmx<size> -Xlog:gc*
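Besides the log output, the same kind of numbers can be read from inside the application through the standard `java.lang.management` API. The sketch below is one possible way to do it. The class and method names are made up for this example, and the collector names it prints depend on which collector and JDK version you run.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStatsSketch {
    // Build one summary line per garbage collector bean exposed by the JVM.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

Running it with -XX:+UseZGC and again with another collector is a quick way to confirm which collector is actually in use.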

Let’s look at some other interesting tuning options.

Set heap size

One of the most important tuning options in ZGC is the maximum heap size (-Xmx<size>). We have to find the right value for our application: we don't want to waste memory, but we do want to leave enough space for live objects and for allocations to continue while the GC runs. The following is an example:

-XX:+UseZGC -Xmx<size>
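To confirm that the heap limit was picked up, the application can ask the runtime for it. This is a minimal sketch with a made-up class name. `Runtime.maxMemory()` reports the maximum amount of memory the JVM will attempt to use, which should roughly match the -Xmx value.

```java
class HeapSizeCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```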

Set the number of concurrent GC threads

Although ZGC has heuristics that select this number automatically, it is sometimes worth setting the number of concurrent GC threads explicitly (-XX:ConcGCThreads=<number>), depending on the application. This option determines how much CPU time the GC gets, so choose the value carefully: too many threads take CPU time away from the application, while too few may let garbage accumulate faster than it is collected.
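To see which value is in effect, a HotSpot JVM exposes its flags through `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below reads the `ConcGCThreads` flag, which is the HotSpot option for the number of concurrent GC threads. The class and method names are made up for this example, and the code assumes a HotSpot-based JVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class GcThreadsCheck {
    // Read the current value of a HotSpot VM flag by name.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
        System.out.println("ConcGCThreads:         " + vmOption("ConcGCThreads"));
    }
}
```

Comparing the two numbers gives a rough idea of how much of the machine the collector may use while it runs concurrently.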

Return unused memory to the operating system

By default, ZGC uncommits unused memory and returns it to the operating system. This is useful for applications and environments where memory footprint is a concern. To disable this behavior, use -XX:-ZUncommit.

-XX:+UseZGC -Xmx<size> -XX:-ZUncommit
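One way to observe the effect is to watch the committed heap size, which is the memory the JVM currently holds from the operating system. The sketch below uses the standard `MemoryMXBean`. The class name is made up for this example, and how quickly the committed size shrinks depends on the collector's settings.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class CommittedHeapCheck {
    // Snapshot of current heap usage as reported by the JVM.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage usage = heapUsage();
        long mib = 1024L * 1024L;
        System.out.println("used      = " + usage.getUsed() / mib + " MiB");
        System.out.println("committed = " + usage.getCommitted() / mib + " MiB");
        System.out.println("max       = " + usage.getMax() / mib + " MiB");
    }
}
```

Printing these values periodically, with and without -XX:-ZUncommit, shows whether the committed size drops after the application goes idle.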

Enable large pages on Linux

Large pages generally improve performance with no real downside. The only catch is that setting them up requires root privileges, which is why they are not enabled by default and may not be available for your application. Review the documentation to configure them correctly. Once the required setup is done, the options are as follows:

-XX:+UseZGC -Xms16G -Xmx16G -XX:+UseLargePages

Enable transparent huge pages on Linux

Transparent huge pages are an alternative to the explicit large pages described above, but they are not recommended for latency-sensitive applications.

-XX:+UseZGC -... -XX:+UseLargePages -XX:+UseTransparentHugePages

In this case, I strongly recommend experimenting with your own application and watching for latency spikes. If they occur, this option may not be a good choice for you.

Enable NUMA support

ZGC enables NUMA support by default. This directs Java heap allocations to NUMA-local memory. The JVM may disable it automatically. If you need to override this behavior explicitly, use the options -XX:+UseNUMA or -XX:-UseNUMA.

-XX:+UseZGC -Xmx<size> -XX:+UseNUMA

or

-XX:+UseZGC -Xmx<size> -XX:-UseNUMA

More detailed information is available on the ZGC wiki page.