
What is ZGC

ZGC is a new, experimental, low-latency garbage collector added in JDK 11. In JDK 11 it only supports Linux/x86-64. ZGC is based on a Region memory layout and is (for the time being) non-generational. It uses read barriers, colored pointers (also called dyed pointers), and multi-mapping of memory to implement a concurrent mark-compact algorithm, with low latency as its primary goal. Its design objectives are:

Pause times do not exceed 10ms;

Pause times do not increase with the heap size or the size of the live set;

Heap sizes from 8MB to 4TB are supported (16TB in the future).

ZGC principle

ZGC is fully concurrent. Like ParNew (used together with CMS) and G1, it uses a mark-copy algorithm, but with a major improvement: its marking, relocation, and remapping phases are almost entirely concurrent, which is the most important reason ZGC achieves pause times below 10ms. ZGC has only three STW phases: initial mark, remark, and initial relocation. Initial mark and initial relocation each only need to scan the GC Roots, so their duration is proportional to the number of GC Roots and is usually very short. The remark pause is also very short, at most 1ms; if it would take longer than 1ms, ZGC returns to the concurrent marking phase. In other words, almost all ZGC pauses depend only on the size of the GC Roots set, and pause times do not grow with the heap size or the size of the live set. By contrast, G1's relocation (evacuation) phase is entirely STW, and its pause time grows with the amount of live objects.

ZGC core technology

ZGC uses colored pointers and read barriers to guarantee correct object access while objects are being relocated concurrently. The general idea is as follows. “Concurrent” in concurrent relocation means that while a GC thread is moving an object, application threads keep accessing that object. If an object has been moved but references to it have not yet been updated, an application thread could access the old address and read stale data. In ZGC, whenever an application thread loads a reference to an object, a “read barrier” is triggered. If the object has been moved, the read barrier updates the loaded pointer to the object’s new address, so the application thread always accesses the new address. How does the JVM know that an object has been moved? It uses the reference address itself, that is, the colored pointer. The details of colored pointers and read barriers are described below.
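The read barrier idea can be illustrated with a toy Java model. All class and method names here are hypothetical and this is not HotSpot code: each reference carries a "color", the GC bumps the good color when relocation starts, and the barrier takes a slow path only for references with a stale color. On that slow path it looks up where the object went, points the reference at the new copy, and heals the reference so later loads take the fast path.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: "addresses" are plain longs and the heap is a map.
// Names are hypothetical; this is not how HotSpot represents objects.
class LoadBarrierSketch {
    // Hypothetical "good color": bumped when relocation starts,
    // which makes every previously loaded reference look stale.
    static int goodColor = 0;
    // Forwarding information: old address -> new address of a moved object.
    static final Map<Long, Long> forwarding = new HashMap<>();
    static final Map<Long, String> heap = new HashMap<>();

    // A field slot holding a colored reference, so the barrier can heal it in place.
    static final class Ref {
        long address;
        int color;

        Ref(long address, int color) {
            this.address = address;
            this.color = color;
        }
    }

    // Read barrier: runs every time the application loads a reference.
    static String load(Ref ref) {
        if (ref.color != goodColor) {            // slow path: pointer has a stale color
            Long moved = forwarding.get(ref.address);
            if (moved != null) {
                ref.address = moved;             // point at the object's new copy
            }
            ref.color = goodColor;               // self-heal: next load takes the fast path
        }
        return heap.get(ref.address);            // fast path: use the pointer as-is
    }

    static void startRelocation() {
        goodColor++;
    }

    // GC thread moves an object and records where it went.
    static void relocate(long from, long to) {
        heap.put(to, heap.remove(from));
        forwarding.put(from, to);
    }

    public static void main(String[] args) {
        heap.put(100L, "A");
        Ref field = new Ref(100L, goodColor);
        startRelocation();
        relocate(100L, 200L);              // A moves while the application still holds 100
        System.out.println(load(field));   // A
        System.out.println(field.address); // 200
    }
}
```

The point of the color check is that the common case (reference already good) costs only a comparison; the forwarding lookup happens once per stale reference.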

Colored (dyed) pointers

Garbage collection in the JVM involves marking objects: only marked objects are considered live, and unmarked objects are reclaimed by the GC. ZGC implements object marking with the colored pointer technique. (By contrast, traditional collectors record marks in object headers, and G1 records them in a data structure kept separate from the objects, a marking bitmap.)

ZGC divides the virtual address space as follows: 0 to 4TB corresponds to the Java heap, 4TB to 8TB is the M0 address space, 8TB to 12TB is the M1 address space, 12TB to 16TB is reserved and unused, and 16TB to 20TB is the Remapped address space.

When an application creates an object, it first requests a virtual address in the heap space, but that virtual address is not mapped to real physical memory. ZGC also assigns the object a virtual address in each of the M0, M1, and Remapped address spaces. These three virtual addresses map to the same physical address, but only one of the three spaces is in effect at any given time. ZGC keeps three virtual address spaces as a “space for time” trade-off to reduce GC pause times; the “space” here is virtual address space, not physical memory.
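To make the multi-mapping concrete, here is a small Java sketch that computes the three virtual addresses for one heap offset using the layout above (M0 starting at 4TB, M1 at 8TB, Remapped at 16TB). The class and method names are hypothetical; the sketch only does the address arithmetic, while in ZGC it is the JVM that maps all three views onto the same physical memory.

```java
// Hypothetical helper; it only computes addresses and does not map any memory.
class ZViewsSketch {
    static final long OFFSET_MASK = (1L << 42) - 1;   // heap offset: bits 0-41 (4TB)
    static final long M0_VIEW = 1L << 42;             // 4TB-8TB
    static final long M1_VIEW = 1L << 43;             // 8TB-12TB
    static final long REMAPPED_VIEW = 1L << 44;       // 16TB-20TB

    // The same heap offset seen through each of the three views.
    static long[] views(long offset) {
        if ((offset & ~OFFSET_MASK) != 0) {
            throw new IllegalArgumentException("offset must be below 4TB");
        }
        return new long[] {M0_VIEW | offset, M1_VIEW | offset, REMAPPED_VIEW | offset};
    }

    public static void main(String[] args) {
        for (long address : views(0x1000L)) {
            // Three different virtual addresses that all share one offset;
            // in ZGC they are backed by the same physical page.
            System.out.printf("0x%016x -> offset 0x%x%n", address, address & OFFSET_MASK);
        }
    }
}
```

The first line printed is `0x0000040000001000 -> offset 0x1000`; the other two differ only in the view bit.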

Corresponding to this address space layout, ZGC actually uses only bits 0-41 of the 64-bit pointer for the address, while bits 42-45 store metadata, bit 46 is unused, and bits 47-63 are fixed at 0. Colored pointers have the following advantages:

Once the live objects in a Region have been moved out, the Region can be released and reused immediately, without waiting for every reference to that Region in the heap to be corrected.

Colored pointers can significantly reduce the number of memory barriers needed during garbage collection. Memory barriers, especially write barriers, are usually set up to record changes to object references; if this information is kept directly in the pointers, some dedicated recording work can be eliminated.

Colored pointers can serve as an extensible storage structure, recording more data related to object marking and relocation to enable further performance improvements in the future.
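The pointer layout can be decoded with plain bit masks. In this sketch bits 0-41 hold the offset, bits 42-45 hold metadata, and bits 46-63 must be zero. Naming bit 42 Marked0, bit 43 Marked1, bit 44 Remapped, and bit 45 Finalizable follows my reading of the JDK 11 HotSpot source for Linux/x86-64; treat those names and the helper class as assumptions, since the layout changed in later JDKs.

```java
// Hypothetical decoder for the JDK 11-style colored pointer layout.
class ColoredPointerSketch {
    static final long OFFSET_MASK = (1L << 42) - 1;   // bits 0-41: address
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE; // bits 42-45
    static final long MUST_BE_ZERO = ~(OFFSET_MASK | METADATA_MASK);              // bits 46-63

    static String describe(long pointer) {
        if ((pointer & MUST_BE_ZERO) != 0) {
            throw new IllegalArgumentException("bits 46-63 must be zero");
        }
        StringBuilder sb = new StringBuilder("offset=0x" + Long.toHexString(pointer & OFFSET_MASK));
        if ((pointer & MARKED0) != 0) sb.append(" Marked0");
        if ((pointer & MARKED1) != 0) sb.append(" Marked1");
        if ((pointer & REMAPPED) != 0) sb.append(" Remapped");
        if ((pointer & FINALIZABLE) != 0) sb.append(" Finalizable");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(MARKED1 | 0x2000L)); // offset=0x2000 Marked1
    }
}
```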

Read barrier

Compared with colored pointers, read barriers are easier to understand. Traditional collectors use write barriers to solve the problem of missed marks during object marking; this involves tri-color marking and the missed-mark problem. ZGC uses read barriers instead: before an object is accessed, the barrier only needs to check the reference's flag bits to see whether the object has been moved, without waiting for the GC process to finish, which greatly reduces pause time. The read barrier adds little overhead, because with colored pointers the check is a test of bits in the pointer itself rather than a lookup in the object or in a separate data structure.
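A rough sketch of why that check is cheap: at any moment exactly one color is "good", every other metadata bit is "bad", and the fast path is a single AND plus a compare on the pointer value. HotSpot's ZGC uses a similar good/bad mask idea, but the class, fields, and the simplified slow path below are hypothetical; a real slow path would mark or remap the object rather than just re-color the pointer.

```java
// Hypothetical sketch of a colored-pointer load barrier check; not HotSpot code.
class BadMaskSketch {
    static final long OFFSET_MASK = (1L << 42) - 1;   // bits 0-41: address
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long METADATA_MASK = MARKED0 | MARKED1 | REMAPPED; // finalizable bit omitted

    // Exactly one color is "good" at a time; the other metadata bits are "bad".
    static long goodMask = REMAPPED;
    static long badMask = goodMask ^ METADATA_MASK;

    // The GC flips the good color at phase changes (e.g. Marked0 when marking starts).
    static void flip(long newGoodMask) {
        goodMask = newGoodMask;
        badMask = goodMask ^ METADATA_MASK;
    }

    // Fast path: one AND and one compare on the pointer value; null (0) is always good.
    static boolean isGoodOrNull(long pointer) {
        return (pointer & badMask) == 0;
    }

    // Slow path placeholder: only re-colors the pointer with the current good color.
    static long slowPath(long pointer) {
        return (pointer & OFFSET_MASK) | goodMask;
    }

    static long loadBarrier(long pointer) {
        return isGoodOrNull(pointer) ? pointer : slowPath(pointer);
    }

    public static void main(String[] args) {
        long p = REMAPPED | 0x3000L;
        System.out.println(isGoodOrNull(p));                  // true
        flip(MARKED0);                                        // a new marking phase begins
        System.out.println(isGoodOrNull(p));                  // false
        System.out.println(Long.toHexString(loadBarrier(p))); // 40000003000
    }
}
```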

ZGC execution flow

1. Initial mark (STW): find the root nodes

2. Concurrent mark: find garbage

3. Handle edge cases (STW)

4. Relocation

5. Remapping (not a separate step: part of the remapping is done together with the marking phase of the next GC, and the rest is done with the help of application threads)

The execution process is as follows:

1. In the first GC cycle, ZGC stops the world (STW) and finds all root nodes from the thread stacks and the constant pool.

2. Application threads resume, and the root nodes are distributed among the GC threads. Each GC thread only marks from the root nodes it is responsible for.

3. Reachable objects are found concurrently. If a field of an object is a reference and its pointer is not yet in the “marking started” state, that pointer is set to the “marking started” state.

4. Once an object has been fully marked, its pointer is set to the “marking completed” state.

5. In the relocation phase, live objects are copied to new addresses, and the mapping from each old address to its new address (A -> A') is recorded in the Forwarding Tables. At this point, pointers to the object are still in the “relocation started” state. If the application accesses A through an original pointer, the pointer is updated to the new address according to the mapping, and its state is changed to “relocation completed”. This is one way the application helps ZGC complete the GC process.

6. Once relocation is complete, the GC cycle is essentially finished.

7. In the second and subsequent GC cycles, when an object is reached from a root node through a pointer that is still in the “relocation started” state, the pointer is corrected according to the Forwarding Tables mapping.

Note that the concurrent marking phase of one GC cycle is also the remapping phase of the previous GC cycle.
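The interplay between the forwarding table, application threads, and the next cycle's marking can be sketched in a few lines of Java. This is a toy model with hypothetical names: relocation records old -> new, an application access heals only the reference it touches, and the next cycle's marking visits the remaining live references and remaps them, after which the forwarding information is no longer needed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of remapping; names are hypothetical, not HotSpot code.
class RemapSketch {
    static final class Ref {
        long address;

        Ref(long address) {
            this.address = address;
        }
    }

    // Forwarding table built during relocation: old address -> new address.
    static final Map<Long, Long> forwarding = new HashMap<>();

    static void relocate(long from, long to) {
        forwarding.put(from, to);
    }

    static void remap(Ref ref) {
        Long to = forwarding.get(ref.address);
        if (to != null) {
            ref.address = to;
        }
    }

    // Application access (via the read barrier) heals just this one reference.
    static long appAccess(Ref ref) {
        remap(ref);
        return ref.address;
    }

    // Next cycle's concurrent marking visits every live reference and remaps stale ones.
    static void nextCycleMark(List<Ref> liveRefs) {
        for (Ref ref : liveRefs) {
            remap(ref);
        }
        forwarding.clear(); // no stale references remain, so the table can be dropped
    }

    public static void main(String[] args) {
        Ref a1 = new Ref(100L);
        Ref a2 = new Ref(100L);             // two fields pointing at object A
        relocate(100L, 200L);
        appAccess(a1);                      // application heals a1 only
        System.out.println(a1.address + " " + a2.address); // 200 100
        nextCycleMark(List.of(a1, a2));
        System.out.println(a1.address + " " + a2.address + " " + forwarding.size()); // 200 200 0
    }
}
```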

Important ZGC configuration parameters

Example of configuring important parameters:

-Xms10G -Xmx10G

-XX:ReservedCodeCacheSize=256m -XX:InitialCodeCacheSize=256m

-XX:+UnlockExperimentalVMOptions -XX:+UseZGC

-XX:ConcGCThreads=2 -XX:ParallelGCThreads=6

-XX:ZCollectionInterval=120 -XX:ZAllocationSpikeTolerance=5

-XX:+UnlockDiagnosticVMOptions -XX:-ZProactive

-Xlog:safepoint,classhisto*=trace,age*,gc*=info:file=/opt/logs/logs/gc-%t.log:time,tid,tags:filecount=5,filesize=50m

-Xms -Xmx: the minimum and maximum heap size. Both are set to 10GB here, so the heap stays at 10GB.

-XX:ReservedCodeCacheSize -XX:InitialCodeCacheSize: set the size of the CodeCache, where JIT-compiled code is stored. 64MB or 128MB is usually sufficient for a service.

-XX:+UnlockExperimentalVMOptions -XX:+UseZGC: enable ZGC (UseZGC is an experimental option in JDK 11, so experimental options must be unlocked first).

-XX:ConcGCThreads: the number of concurrent GC threads. The default is 12.5% of the number of cores, which is 1 on an 8-core CPU. More threads make GC faster, but they consume CPU while the application is running and reduce throughput.

-XX:ParallelGCThreads: the number of threads used in the STW phases. The default is 60% of the number of cores.

-XX:ZCollectionInterval: the interval between timer-triggered ZGC cycles, in seconds.

-XX:ZAllocationSpikeTolerance: the correction factor of ZGC's adaptive trigger algorithm. The default is 2; the larger the value, the earlier ZGC is triggered.

-XX:+UnlockDiagnosticVMOptions -XX:-ZProactive: whether to enable proactive collection. It is enabled by default; this configuration disables it (ZProactive is a diagnostic option, so diagnostic options must be unlocked first).

-Xlog: sets the content, format, location, and size of the GC logs.
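To confirm that flags like these actually took effect, one option is to ask the running JVM through the standard management API: `ManagementFactory.getGarbageCollectorMXBeans()` lists the active collectors, and `com.sun.management.HotSpotDiagnosticMXBean.getVMOption(name)` returns a flag's current value and origin. The class and helper names below are my own, and I assume some experimental or diagnostic flags may not be exposed on every JDK, so the helper returns null in that case. The collector bean names also vary between JDK versions; when ZGC is active they contain "ZGC".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for checking GC configuration at runtime.
class GcConfigCheck {
    // Names of the active collectors; with ZGC enabled one of them contains "ZGC".
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Returns "value (origin)" for a VM flag, or null if this JVM does not expose it.
    static String flag(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = hotspot.getVMOption(name);
            return option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            return null; // unknown or unexposed flag
        }
    }

    public static void main(String[] args) {
        System.out.println("Collectors: " + collectorNames());
        for (String f : new String[] {"UseZGC", "ConcGCThreads", "ParallelGCThreads", "ReservedCodeCacheSize"}) {
            System.out.println(f + " = " + flag(f));
        }
    }
}
```

Running it with the options above should show the configured values with origin `VM_CREATION` for flags set on the command line.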

ZGC triggers

The core feature of ZGC is concurrency: new objects keep being allocated while a GC cycle runs, so a cycle has to start early enough to finish before the heap fills up. ZGC has several GC triggering mechanisms, summarized as follows:

Warm-up rule: triggered shortly after the service starts; it generally needs no attention. The keyword in the log is Warmup.

External trigger: triggered by an explicit call to System.gc() in code. The keyword in the log is System.gc().

Metadata allocation trigger: occurs when the metadata area (Metaspace) runs low; it generally needs no attention. The keyword in the log is Metadata GC Threshold.

Blocking allocation request trigger: when the heap fills up before the GC has had time to reclaim memory, some threads block on allocation. This kind of trigger should be avoided. The keyword in the log is Allocation Stall.

Allocation-rate-based adaptive algorithm: the most important trigger. Simply put, ZGC uses the recent object allocation rate and GC duration to estimate the memory usage at which the next GC should start. The ZAllocationSpikeTolerance parameter controls this threshold; its default is 2, and the larger the value, the earlier GC is triggered. We solved some problems by tuning this parameter. The keyword in the log is Allocation Rate.

Fixed interval: controlled by ZCollectionInterval; suitable for sudden traffic surges. When traffic changes smoothly, the adaptive algorithm may not trigger a GC until heap usage reaches 95% or more. When traffic surges, the adaptive algorithm may trigger too late, causing some threads to block. Tuning this parameter addresses traffic-surge scenarios such as timed promotions and flash sales. The keyword in the log is Timer.

Proactive trigger rule: similar to the fixed-interval rule, except that the interval is not fixed but calculated by ZGC itself. Our service already uses the fixed-interval trigger, so we disabled this rule with -XX:-ZProactive to keep GC from running too often and affecting service availability. The keyword in the log is Proactive.
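The two tunable rules can be sketched as simple predicates. The allocation-rate rule below follows my reading of the JDK 11 HotSpot source (zDirector.cpp): take a pessimistic allocation rate (average times ZAllocationSpikeTolerance plus about 3.3 standard deviations), compute how long the free memory lasts at that rate, subtract a pessimistic GC duration and the sampling interval, and start a GC once the remaining time reaches zero. It ignores the heap reserve, and the constant 3.290527 is taken from that reading; treat the names, units, and details as assumptions that may differ across JDK versions. The timer rule assumes an interval of 0 means disabled.

```java
// Simplified, hypothetical model of two ZGC trigger rules; not HotSpot code.
class TriggerRulesSketch {
    // About 3.3 standard deviations: roughly a 1-in-1000 chance a sample falls outside.
    static final double ONE_IN_1000 = 3.290527;

    // "Allocation Rate" rule: start GC if, at a pessimistic allocation rate,
    // the heap would fill up before a pessimistically long GC cycle could finish.
    static boolean allocationRateRule(double freeBytes,
                                      double allocRateAvg, double allocRateSd,
                                      double gcSecondsAvg, double gcSecondsSd,
                                      double spikeTolerance, double sampleIntervalSeconds) {
        double maxAllocRate = allocRateAvg * spikeTolerance + allocRateSd * ONE_IN_1000;
        double secondsUntilOom = freeBytes / (maxAllocRate + 1.0); // +1 B/s avoids division by zero
        double maxGcSeconds = gcSecondsAvg + gcSecondsSd * ONE_IN_1000;
        double secondsUntilGc = secondsUntilOom - maxGcSeconds - sampleIntervalSeconds;
        return secondsUntilGc <= 0;
    }

    // "Timer" rule: an interval of 0 is assumed to disable the rule.
    static boolean timerRule(double collectionIntervalSeconds, double secondsSinceLastGc) {
        return collectionIntervalSeconds > 0 && secondsSinceLastGc >= collectionIntervalSeconds;
    }

    public static void main(String[] args) {
        double mb = 1024.0 * 1024.0;
        // 2GB free, 200MB/s allocation, GC cycles take about 3s, rule checked every 0.1s
        System.out.println(allocationRateRule(2048 * mb, 200 * mb, 0, 3.0, 0, 2.0, 0.1)); // false
        System.out.println(allocationRateRule(2048 * mb, 200 * mb, 0, 3.0, 0, 5.0, 0.1)); // true
        System.out.println(timerRule(120, 90));  // false
        System.out.println(timerRule(120, 125)); // true
    }
}
```

With the same heap and allocation rate, raising the spike tolerance from 2 to 5 flips the decision to "start now", which is the "larger value, earlier GC" behavior described above.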

Problems with ZGC

ZGC did not become production-ready until JDK 15, which is not a long-term support (LTS) release, so it is hard to deploy in production. In JDK 11, ZGC is experimental and only supports Linux/x86-64. To use it on Arm, macOS, or Windows, you need a newer JDK such as JDK 15.

The following problems can be observed while ZGC is running. Single-generation GC has low throughput: the most significant issue is that the Concurrent Mark phase must mark the entire heap, which takes a long time, so collection cannot keep up with allocation. This leads to:

Allocation Stall: a new ZGC cycle has to be started, and application threads that try to allocate are blocked during that cycle.

In the worst case, memory runs out during the Concurrent Relocate phase, resulting in an OOM.

High CPU usage, because GC threads run concurrently with the application.

Because ZGC uses colored pointers, it does not support UseCompressedOops (unlike Shenandoah GC), which hurts performance for small heaps (32GB or less). Since JDK 15, however, UseCompressedClassPointers can stay enabled while UseCompressedOops is disabled, which mitigates this performance drawback to some extent.

Apart from the ZGC pause phases, object allocation latency is also affected by the following factors. Page Cache Flush slows allocation: ZGC divides the heap into pages of different sizes (the equivalent of G1's Regions): small, medium, and large pages, with objects of different sizes allocated to different page types. When no cached page of the required size is available, cached pages of other sizes have to be flushed, for example converting small or large pages into a medium page, which is time-consuming.

Only a single shared Medium Page: with many application threads, if several threads allocate medium-sized objects at the same time and the current Medium Page does not have enough free space, each of them requests a new Medium Page at the same time. The redundant allocations then have to be undone, which delays allocation and may also cause the Page Cache Flush described above.

The reported RSS can be very high, up to 3 times Xmx, because of ZGC's multi-mapping mechanism: the same physical memory is mapped into three virtual address views.
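The 32GB figure for UseCompressedOops comes from simple arithmetic: a compressed oop is a 32-bit value that the JVM scales by the object alignment (8 bytes by default), so it can address at most 2^32 × 8 bytes. A colored pointer, by contrast, needs bit 45 for metadata and therefore at least 46 bits, which cannot fit in 32. The class below is only a hypothetical illustration of that arithmetic.

```java
// Hypothetical illustration of the compressed-oops heap limit vs. colored pointer width.
class CompressedOopsLimit {
    // A compressed oop is a 32-bit value scaled by the object alignment.
    static long maxHeapWithCompressedOops(int objectAlignmentBytes) {
        return (1L << 32) * objectAlignmentBytes;
    }

    // Number of bits needed to represent a pointer value.
    static int bitsNeeded(long pointer) {
        return Long.SIZE - Long.numberOfLeadingZeros(pointer);
    }

    public static void main(String[] args) {
        long bytes = maxHeapWithCompressedOops(8);
        System.out.println(bytes / (1L << 30) + " GB");  // 32 GB
        System.out.println(bitsNeeded(1L << 45) + " bits for a pointer with metadata bit 45 set"); // 46 bits
    }
}
```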

Conclusion

Having studied ZGC, I now have a deeper understanding of its colored pointers and read barriers, and more confidence in using ZGC in production. Only with a clear understanding of its principles and components can I use it with confidence.