The Java language’s garbage collection mechanism (GC) greatly improves the efficiency of Java development, but the pauses it causes often seriously affect the response time of Java services. The Z Garbage Collector (ZGC), introduced in Java 11 with millisecond-level pauses, has improved response-time optimization for Java services and is one of the reasons many businesses have switched from Java 8 to Java 11.

Alibaba Dragonwell 11, maintained by Alibaba, is a free, production-ready distribution of OpenJDK 11 that provides long-term support, performance enhancements, and security fixes. Since its first release, Dragonwell 11 has shipped Java 11’s experimental ZGC feature, in response to the needs of Alibaba’s internal businesses and cloud customers. As ZGC moved into production, Alibaba’s businesses and cloud customers, while enjoying the response-time gains, also ran into some practical problems. For this reason, Alibaba has released Dragonwell 11.0.11.7, in which ZGC is upgraded from an experimental feature (as it is in OpenJDK 11) to a production-ready feature, backed by Dragonwell 11’s long-term quality and stability support.

This article opens the Alibaba Dragonwell ZGC technology-sharing series on ZGC in Alibaba Dragonwell 11. The three-part series covers the basic concepts of GC and ZGC practice at scale in part 1, the principles and tuning of ZGC in part 2, and Dragonwell’s work to make ZGC production-ready in part 3.

Java GC overview

Garbage collection (GC) is the Java language’s automatic memory management mechanism: it automatically reclaims garbage objects (objects that can no longer be referenced) and frees their memory for later use. With GC, Java developers can focus on business logic, creating objects with new statements without writing code to destroy them, which improves both development efficiency and code quality.

GC performance metrics

There is no such thing as a free lunch: GC brings convenience, but also significant side effects. For Java services, users typically care about two metrics: throughput, measured in QPS (queries per second), and response time (RT). GC generally has a negative impact on both. GC pauses prolong RT, especially the RT of long-tail requests, RT P99/P999 (the RT at the 99th/99.9th percentile when requests are sorted from fastest to slowest). To keep the collection algorithm correct, GC often needs to suspend all Java threads executing business logic so that it does not race with Java threads that are allocating objects. During the pause, Java threads cannot respond to any request, which lengthens business RT.
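To make the P99/P999 definition concrete, here is a minimal Java sketch that computes tail latencies with the nearest-rank method (one common percentile definition; monitoring systems may interpolate differently). The class name and the latency sample are hypothetical: 1,000 requests at 2 ms, five of which are delayed to 200 ms, as if by a long GC pause.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentiles over a sample of request latencies.
class LatencyPercentiles {

    // Returns the latency at percentile p (0 < p <= 100): sort ascending
    // (fastest first) and take the element at 1-based rank ceil(p * n / 100).
    static long percentile(long[] latenciesMicros, double p) {
        if (latenciesMicros.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = latenciesMicros.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] rt = new long[1000];
        Arrays.fill(rt, 2_000);          // 2 ms for most requests
        for (int i = 0; i < 5; i++) {
            rt[i] = 200_000;             // 5 requests stuck behind a 200 ms pause
        }
        System.out.println("P99  = " + percentile(rt, 99) + " us");   // prints "P99  = 2000 us"
        System.out.println("P999 = " + percentile(rt, 99.9) + " us"); // prints "P999 = 200000 us"
    }
}
```

Five slow requests out of 1,000 (0.5%) leave P99 untouched but dominate P999, which is why long GC pauses show up first in the far tail.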

GC also reduces throughput (the upper limit on QPS): GC threads consume extra CPU resources, which reduces the CPU share available to Java threads. GC is often thought to go hand in hand with pauses, but this view is not entirely correct: modern Java GCs can run concurrent GC threads alongside Java threads.
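One way to observe GC’s cost from inside a running service is the standard java.lang.management API. The sketch below (the class name GcOverhead is hypothetical) sums the accumulated collection time reported by each GarbageCollectorMXBean and divides it by JVM uptime. Treat the result as a rough indicator only: bean names and what "collection time" covers depend on the collector (for concurrent collectors it may include concurrent work rather than only pauses), and it does not directly measure the CPU consumed by concurrent GC threads.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: a rough GC-time indicator built on the standard JMX GC beans.
class GcOverhead {

    // Sums the accumulated collection time (ms) reported by every GC bean.
    // Beans return -1 when the value is undefined, so only positive values are added.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Ratio of reported collection time to JVM uptime, clamped to [0, 1].
    static double gcTimeFraction() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) totalGcTimeMillis() / uptimeMillis);
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("reported GC time / uptime = %.4f%n", gcTimeFraction());
    }
}
```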

Silky smooth ZGC experience

The Java language provides several types of GC for different requirements. These GCs have different characteristics in terms of throughput and response time:

  • Parallel GC: high throughput, long GC pauses;

  • G1 GC: the default GC in Java 11, with a default pause-time target of 200 ms (the CMS GC is deprecated in Java 11);

  • Shenandoah GC: short GC pauses, average throughput.

ZGC is a new-generation GC introduced in OpenJDK 11, with pause times of no more than 10 ms and support for multi-terabyte heaps.

GCs before Java 11 often required pauses of 100 ms or more, hurting metrics such as RT P99 and making Java services feel as if they were bumping along a potholed road. ZGC’s millisecond-level pauses can push RT P99 down even further, giving Java services a silky-smooth experience. In most cases, ZGC only needs the heap size and the number of concurrent GC threads to be adjusted, so tuning is easy and saves a lot of effort.
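Why are heap size and concurrent GC thread count the main knobs? A deliberately simplified model helps: if the application’s allocations use up the free heap before a concurrent GC cycle finishes, allocating threads have to wait for memory (the Allocation Stall discussed later in this article). A larger heap adds headroom, and more concurrent GC threads shorten the cycle. The sketch below assumes each cycle starts with only the live set occupying the heap and reclaims nothing until it ends; real ZGC triggers cycles with its own heuristics, and the class name and all numbers are hypothetical.

```java
// Back-of-envelope model (simplified assumptions, hypothetical names):
// does a concurrent GC cycle finish before allocations exhaust the free heap?
class AllocationStallEstimate {

    // maxHeapMb:      heap size (-Xmx), in MB
    // liveMb:         live data that survives a GC cycle, in MB
    // allocRateMbPs:  application allocation rate, in MB/s
    // gcCycleSeconds: duration of one concurrent GC cycle, in seconds
    // Returns true if allocations would use up the free headroom before the cycle completes.
    static boolean stallLikely(double maxHeapMb, double liveMb,
                               double allocRateMbPs, double gcCycleSeconds) {
        if (allocRateMbPs <= 0) {
            return false; // nothing is being allocated, so nothing can stall
        }
        double headroomMb = maxHeapMb - liveMb;            // memory available for new objects
        double secondsToFill = headroomMb / allocRateMbPs; // time until that headroom is gone
        return gcCycleSeconds >= secondsToFill;
    }

    public static void main(String[] args) {
        // 16 GB heap, 10 GB live, 1 GB/s allocation, 8 s cycle: 6 GB headroom lasts 6 s
        System.out.println(stallLikely(16 * 1024, 10 * 1024, 1024, 8.0)); // prints "true"
        // 32 GB heap: 22 GB headroom lasts 22 s, longer than the cycle
        System.out.println(stallLikely(32 * 1024, 10 * 1024, 1024, 8.0)); // prints "false"
    }
}
```

Under this model, the other lever besides heap size is a shorter cycle, for example more concurrent GC threads when spare CPU is available.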

ZGC in practice

This section first describes the scenarios ZGC is suited for, which we evaluated before moving services onto it. After fully evaluating each business’s characteristics, we ran the suitable businesses on ZGC and achieved RT improvements. However, ZGC in OpenJDK 11 is still experimental, and we encountered some problems in practice. We documented these issues as risk items and set out to address them in Dragonwell 11.

ZGC application scenarios

ZGC achieves excellent millisecond-level pauses, but as a side effect it can reduce throughput (the ZGC project homepage cites a throughput loss of up to 15%). There are three main reasons:

1. Java 11’s ZGC is a single-generation GC, so every ZGC cycle has to process long-lived objects (objects that survive multiple GC rounds), whereas the GCs available before it are generational and do not need to process long-lived objects in every GC.

2. ZGC runs concurrent GC threads, which take CPU time away from Java threads.

3. ZGC’s read barrier (described later in this series) adds overhead to every load of an object reference from the heap. In addition, because ZGC does not support compressed pointers (compressed oops), it cannot benefit from them on small heaps under 32 GB.
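The 32 GB figure for compressed pointers follows from simple arithmetic: in HotSpot, a compressed oop is a 32-bit value scaled by the object alignment, which is 8 bytes by default, so it can address at most 2^32 × 8 = 2^35 bytes, i.e. 32 GiB. The sketch below only performs that calculation; the class name is hypothetical, and in practice the JVM may turn compressed oops off for heaps somewhat below this bound.

```java
// Hypothetical helper: the arithmetic behind the ~32 GB compressed-oops limit.
class CompressedOopsLimit {

    // A compressed oop is a 32-bit value scaled by the object alignment,
    // so it can address (2^oopBits) * alignmentBytes bytes of heap.
    static long maxAddressableBytes(int oopBits, int alignmentBytes) {
        return (1L << oopBits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long bytes = maxAddressableBytes(32, 8);          // 8-byte alignment is HotSpot's default
        System.out.println(bytes / (1L << 30) + " GiB"); // prints "32 GiB"
    }
}
```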

Based on the ZGC characteristics above, the author summarizes the scenarios where ZGC is a good fit, for readers who are considering switching to it:

1. Java services with strict requirements on long-tail metrics such as RT P99/P999: these services usually need real-time responses and are very sensitive to the slowest 1% or 0.1% of requests;

2. Machines with ample memory and CPU: abundant resources allow a larger heap and more concurrent GC threads;

3. Tolerance for lower throughput: after weighing the trade-off, the business considers RT P99/P999 more important than QPS;

4. Relatively few long-lived objects: Java 11’s ZGC is not generational, so it cannot handle such objects efficiently.

In addition, if your Java service still runs on Java 8, you also need to factor in the cost of migrating to Java 11.

ZGC at scale

Many Java businesses within Alibaba have strict requirements on long-tail request RT. To break through the RT bottleneck caused by GC pauses, these businesses are gradually upgrading to Java 11 and choosing ZGC as their GC. The following cases show how Alibaba’s internal businesses improved RT with ZGC. The Concurrent Mark/Relocate phases mentioned in this section will be discussed in the second article of this series.

1. High-performance database: Lindorm is Alibaba’s high-performance NoSQL database, a branch of HBase. Lindorm has run stably on ZGC for nearly two years, including through the Double 11 shopping festival. Lindorm’s ZGC pauses are around 5 ms, and the maximum pause does not exceed 8 ms. ZGC significantly improved RT and latency-spike metrics for online clusters: average RT improved by 15% to 20%, and P999 RT dropped to less than half of its original value. During Double 11 in 2019, on Ant’s cluster running ZGC, Lindorm’s RT P999 dropped from 12 ms to 5 ms. The following figure shows Lindorm’s GC pause times on ZGC, in microseconds.

2. Message queue: RocketMQ supports using a distributed file system as storage instead of relying on the local file system, to improve elastic scaling. RocketMQ originally used G1 GC, but GC pauses of more than 200 ms could not be brought down even after extensive tuning. Investigation showed that the main cause was that accessing the distributed file system requires calling a C library through JNA, and JNA relies on Finalizer to release objects in native memory; such objects are reclaimed only after at least two G1 GC cycles. The large number of these objects (about 500,000 per GC) caused long pauses. Because ZGC reclaims these native-memory objects in a concurrent phase, it avoids the long pauses. By simply setting the ZGC heap size and the number of concurrent GC threads, RocketMQ brought all of its online GC pauses below 2 ms, greatly reducing access latency spikes. The following figure shows RT metrics for RocketMQ on ZGC.

3. Risk-control calls: some online applications are sensitive to the latency of risk-control calls. These applications set very short service-invocation timeouts (< 50 ms), while a Young GC in the current risk-control system takes around 60 ms, so whenever a GC occurs, the business call times out. In the red envelope business, for example, after a timeout the red envelope cannot be issued, or it may be claimed by bonus hunters (the “wool party”), which hurts the business. Improving availability therefore requires ZGC. In production, ZGC pauses are kept within 10 ms, which meets the needs of these RT-sensitive applications. Because the risk-control system caches a large number of objects, the Concurrent Mark phase takes a long time, which limits throughput. To make ZGC run more smoothly, the risk-control system raised its QPS ceiling by reducing the number of long-lived cached objects.
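The Finalizer issue in the RocketMQ case can be illustrated with a minimal sketch of a finalizer-based native-memory wrapper, in the style the text attributes to JNA. NativeBuffer and its counter are hypothetical stand-ins; no real native memory is allocated. Once such an object becomes unreachable, one GC cycle discovers it and queues it for the JVM’s Finalizer thread; only after finalize() has run can a later cycle reclaim the object, which is why these objects survive at least two GC cycles. Note that finalize() has been deprecated since Java 9.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical finalizer-based wrapper; a counter stands in for native memory.
class NativeBuffer {
    // Simulated native bytes currently allocated (a real wrapper would hold a native pointer).
    static final AtomicLong NATIVE_BYTES_IN_USE = new AtomicLong();

    private final long size;
    private boolean freed;

    NativeBuffer(long size) {
        this.size = size;
        NATIVE_BYTES_IN_USE.addAndGet(size); // stands in for a native malloc
    }

    // Releases the simulated native memory at most once.
    synchronized void free() {
        if (!freed) {
            freed = true;
            NATIVE_BYTES_IN_USE.addAndGet(-size); // stands in for a native free
        }
    }

    // Run by the Finalizer thread some time after a GC finds this object unreachable.
    // The object's heap space can only be reclaimed by a later GC cycle, after this returns.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() {
        free();
    }

    @SuppressWarnings({"deprecation", "removal"})
    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            new NativeBuffer(1024); // unreachable immediately; never freed explicitly
        }
        System.out.println("native bytes before GC: " + NATIVE_BYTES_IN_USE.get());
        System.gc();              // a hint only; the JVM may ignore it
        System.runFinalization(); // also a best-effort request
        System.out.println("native bytes after GC:  " + NATIVE_BYTES_IN_USE.get());
    }
}
```

The numbers printed by main vary from run to run, since both GC and finalization timing are up to the JVM; that unpredictability is exactly what makes finalizer-based cleanup costly at 500,000 objects per GC.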
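The risk-control timeout problem can be reproduced in miniature: if the callee stalls for longer than the caller’s timeout, the call fails no matter how fast the rest of the request is. The sketch below uses CompletableFuture.orTimeout (Java 9+) and simulates the callee’s stall with Thread.sleep; TimeoutVsPause is a hypothetical stand-in for a real RPC client, and a real GC pause would stop every application thread in the callee rather than a single task.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation: a call whose latency includes a stall (e.g., a GC pause on the callee).
class TimeoutVsPause {

    // Returns true if the caller's timeout fires before the simulated call completes.
    static boolean callTimesOut(long stallMillis, long timeoutMillis) {
        CompletableFuture<String> call = CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(stallMillis); // stands in for the callee being paused
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "ok";
        });
        try {
            call.orTimeout(timeoutMillis, TimeUnit.MILLISECONDS).join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof TimeoutException;
        }
    }

    public static void main(String[] args) {
        // A 60 ms Young GC pause against a 50 ms timeout: the call is expected to time out.
        System.out.println("60 ms pause, 50 ms timeout, timed out: " + callTimesOut(60, 50));
        // A pause under 10 ms leaves ample margin.
        System.out.println(" 8 ms pause, 50 ms timeout, timed out: " + callTimesOut(8, 50));
    }
}
```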

Risks of ZGC in OpenJDK 11

Early on, we adopted OpenJDK 11’s ZGC, which was only experimental. Since that release, ZGC’s stability and functionality have been steadily enhanced, and by OpenJDK 15 ZGC had become a production-ready feature. However, OpenJDK 11 is a long-term support (LTS) release, whereas the already released OpenJDK 12 to 16 are not, so it is difficult to deploy OpenJDK 15/16 directly in production just to try out a production-ready ZGC.

The practice above shows that for Java heaps ranging from 10 GB to hundreds of GB, ZGC does keep pauses at or below 10 ms. However, all of these businesses reported “poor QPS performance”, meaning that ZGC does not perform well in throughput-oriented scenarios. In such scenarios, ZGC’s collection cannot keep up with the allocation rate, and an Allocation Stall occurs: the thread that is currently allocating objects is blocked while waiting for ZGC to free up space. Lindorm also reported that on small heaps, ZGC’s overall performance is not even as good as G1 GC’s. Beyond these, the practice above ran into several problems with the experimental ZGC in OpenJDK 11:

1. Crashes without warning: Lindorm found that two variables referred to the same object, yet the code judged them unequal; RocketMQ also found that the program crashed without warning at runtime. Close inspection revealed that the read barrier could be separated from its load operation, with a GC pause possibly occurring in between. This was fixed in JDK 14.

2. OOM: the risk-control business noticed that ZGC may throw an OOM (OutOfMemoryError) during Concurrent Relocate. ZGC normally reserves space for Concurrent Relocate, but the JDK 11 ZGC code does not guarantee that this space is sufficient, so if objects are relocated quickly, an OOM may be thrown. This was fixed in OpenJDK 16.

3. Page cache flush affects RT: ZGC divides the heap into small, medium, and large ZPages (objects of different sizes are allocated in different types of ZPage). If the allocation rates of objects of different sizes are unstable, for example when the number of medium-sized objects suddenly rises, small/large ZPages must be converted into medium ZPages, which takes a long time. Lindorm found that this phenomenon can seriously affect RT. OpenJDK 15’s ZGC mitigates it.
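To make the page-type split concrete, the sketch below maps an object size to a ZPage type. The thresholds are assumptions based on our reading of JDK 11’s ZGC on x86_64 (2 MB small pages holding objects up to 256 KB, 32 MB medium pages holding objects up to 4 MB, and a dedicated large page for each bigger object); check the ZGC source of your JDK build before relying on them, and note that the class name is hypothetical. Under these assumptions, a sudden surge of objects between 256 KB and 4 MB is what drives up demand for medium pages.

```java
// Hypothetical classifier: which ZPage type an object of a given size would land in,
// using assumed JDK 11 x86_64 defaults (verify against your JDK's ZGC source).
class ZPageClass {
    static final long KB = 1024;
    static final long MB = 1024 * KB;

    static final long SMALL_PAGE = 2 * MB;   // assumed small page size
    static final long MEDIUM_PAGE = 32 * MB; // assumed medium page size
    // Assumed rule: each page type holds objects up to 1/8 of its page size.
    static final long SMALL_OBJECT_LIMIT = SMALL_PAGE / 8;   // 256 KB
    static final long MEDIUM_OBJECT_LIMIT = MEDIUM_PAGE / 8; // 4 MB

    enum PageType { SMALL, MEDIUM, LARGE }

    static PageType pageTypeFor(long objectBytes) {
        if (objectBytes <= SMALL_OBJECT_LIMIT) {
            return PageType.SMALL;
        }
        if (objectBytes <= MEDIUM_OBJECT_LIMIT) {
            return PageType.MEDIUM;
        }
        return PageType.LARGE; // a large page holds a single object
    }

    public static void main(String[] args) {
        for (long size : new long[] {64, 256 * KB, 1 * MB, 8 * MB}) {
            System.out.println(size + " bytes -> " + pageTypeFor(size));
        }
    }
}
```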

All of the above are urgent problems for using ZGC in production. Alibaba Dragonwell 11 is downstream of OpenJDK 11 and inherits all of its features, including ZGC. In a later article, we’ll share how Dragonwell 11 makes ZGC production-ready. In the meantime, you can try enabling ZGC on Dragonwell 11.

Enabling ZGC on Dragonwell 11

Java developers need to update the JDK to Alibaba Dragonwell 11.0.11.7 or later. To enable ZGC, simply add -XX:+UseZGC to the Java startup options. See also the Dragonwell ZGC tuning options.

  • Dragonwell11 download:

    github.com/alibaba/dra…

  • Dragonwell ZGC tuning options:

    github.com/alibaba/dra…
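After adding the startup options, for example java -XX:+UseZGC -Xmx<heap size> -XX:ConcGCThreads=<n> (heap size and thread count depend on the workload), it helps to confirm from inside the process which settings actually took effect. The sketch below reads HotSpot flags through com.sun.management.HotSpotDiagnosticMXBean, which HotSpot-based JDKs such as OpenJDK 11 provide; the class name ZgcFlagCheck is hypothetical. Note that upstream OpenJDK 11 also requires -XX:+UnlockExperimentalVMOptions before -XX:+UseZGC; whether Dragonwell 11.0.11.7 still requires it is not stated here, so check the Dragonwell documentation linked above.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper: read the effective value of HotSpot VM flags at runtime.
class ZgcFlagCheck {

    // Returns the flag's current value as a string, or null if it is unavailable
    // (non-HotSpot JVM, or a flag this JVM version does not know, e.g. UseZGC on JDK 8).
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean hotspot =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (hotspot == null) {
                return null;
            }
            VMOption option = hotspot.getVMOption(name);
            return option.getValue();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseZGC", "MaxHeapSize", "ConcGCThreads"}) {
            System.out.println(flag + " = " + vmOption(flag));
        }
    }
}
```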

In the following articles, we will introduce the principles and tuning techniques of ZGC, and show how Alibaba Dragonwell 11 solves some of the problems encountered in production by making ZGC production-ready.

About the author: Tang Hao joined the Alibaba Cloud programming language and compiler team in 2019 and currently works on JVM memory management optimization.

Dragonwell has now joined the OpenAnolis community (Java Language and Virtual Machine SIG), and Anolis OS 8 supports Dragonwell cloud-native Java. You are welcome to join the community SIG and take part in the community.

SIG website: openanolis.cn/SIG/Java/do…


Join the OpenAnolis (Dragon Lizard) community

Join the WeChat group: add the community assistant, Xiao Long of the Dragon Lizard community (WeChat: Openanolis_ASSIS), with the note [Dragon], and you will be added to the group. Join the DingTalk group: scan the DingTalk group’s QR code. Developers and users are welcome to join the OpenAnolis community to promote its development and build an active, healthy open source operating system ecosystem together!

About the OpenAnolis community

OpenAnolis is a non-profit open source community formed by enterprises, universities, research institutions, non-profit organizations, and individuals on the basis of voluntarism, equality, open source, and collaboration. The OpenAnolis (Dragon Lizard) community was founded in September 2020 to build an open source, neutral, and open Linux upstream distribution community and innovation platform.

The short-term goal is to develop Anolis OS as an alternative to CentOS and rebuild a distribution compatible with major international Linux vendors. The medium- and long-term goal is to explore and build a future-oriented operating system, establish a unified open source operating system ecosystem, incubate innovative open source projects, and help the open source ecosystem thrive.

Join us to build an open source operating system for the future!

https://openanolis.cn