Previous articles

JVM (1) Runtime data areas

JVM (2) The Java object creation process

JVM (3) Memory allocation during object creation

JVM garbage collection algorithms

Introduction to garbage collectors

JVM (6) Garbage collector: CMS

JVM (7) Garbage collector: G1

The previous article looked at how G1 works; in this article we start on ZGC.

One, overview

ZGC is a low-latency garbage collector that joined the garbage collector family in JDK 11. In that release it is experimental, so for production use a later version of the JDK is recommended (ZGC was declared production-ready in JDK 15).
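To confirm which collector a JVM is actually running, one option is to list the GC MXBeans. The sketch below is only an illustration (the class name `ShowCollectors` is made up for this article). On JDK 11 to 14, ZGC has to be switched on with `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ShowCollectors {
    // Returns the names of the collectors reported by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Example launch on JDK 11:
        //   java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC ShowCollectors
        System.out.println(collectorNames());
    }
}
```

The exact bean names differ between JDK versions, so treat the printed list as a hint rather than a stable identifier.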

Two, the working principle

Like ParNew (the young-generation collector paired with CMS) and G1, ZGC uses a mark-copy algorithm, but with a major improvement: ZGC is almost entirely concurrent in the mark, relocate, and remap phases, which is the key to its extremely short pause times.

The ZGC garbage collection cycle is shown below:

  • Concurrent Mark: As in G1, concurrent marking traverses the object graph to perform reachability analysis. It is preceded and followed by short pauses similar to G1's initial mark and final mark (they have different names in ZGC). The Marked0 and Marked1 bits of the colored pointer are updated during the marking phase. The pause time has nothing to do with heap size, only with the number of GC Roots. Summary: the marking phase has two brief STW pauses. All of ZGC's STW pauses are brief, and most of the work, including the most time-consuming parts (concurrent marking and concurrent relocation), runs concurrently with the application threads.

  • Concurrent Prepare for Relocate: This phase uses specific criteria to decide which Regions will be collected and groups them into the relocation set. ZGC scans all Regions in every collection, trading the cost of a wider scan for the savings of not maintaining remembered sets as G1 does. The relocation set only determines that the live objects in its Regions will be copied to other Regions; unlike G1's collection set, it is not used for benefit-prioritized incremental collection. Class unloading and weak reference handling, first supported by ZGC in JDK 12, are also done in this phase.

  • Concurrent Relocate: Relocation is the core phase of ZGC. It copies the live objects in the relocation set to new Regions and maintains a forwarding table (Forward Table) for each Region in the relocation set, recording the mapping from old objects to new ones. Thanks to colored pointers, ZGC can tell from the reference alone whether an object is in the relocation set. If an application thread concurrently accesses an object in the relocation set, the access is intercepted by the preset memory barrier, which immediately forwards it to the new copy according to the Region's forwarding table and, at the same time, updates the reference so that it points directly to the new object. This is what ZGC calls the “self-healing” ability of pointers. Because of self-healing, ZGC's colored pointers are slow only on the first access to an old object, whereas Shenandoah's Brooks forwarding pointer adds a cost to every access. Once all live objects in a Region of the relocation set have been copied, the Region can be released immediately and reused for new allocations, but its forwarding table must be kept, because stale references that still need forwarding may remain.

For example, because GC threads and application threads run concurrently during marking and relocation, the following situation can occur: object A holds a reference to object B, and B is currently being marked or moved. To ensure that the application thread gets the right object B, the pointer to B is read through a load barrier, which guarantees that the data is read correctly during GC.

  • Concurrent Remap: Remapping fixes all references in the heap that still point to old objects in the relocation set. Because ZGC references are self-healing, this work is not urgent. ZGC cleverly merges the concurrent remap phase into the concurrent mark phase of the next garbage collection cycle, which has to traverse all objects anyway, and thus saves one traversal of the object graph.
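The interplay between the forwarding table and self-healing described in the phases above can be sketched as a toy model. This is not how HotSpot implements it: the heap is a plain map, addresses are arbitrary numbers, and every name (`SelfHealingSketch`, `relocate`, `load`) is hypothetical. Real ZGC also decides from the pointer's color, not from a table lookup, whether the slow path is needed.

```java
import java.util.HashMap;
import java.util.Map;

public class SelfHealingSketch {
    // Toy heap: address -> payload. Real ZGC works on raw memory, not a map.
    static final Map<Long, String> heap = new HashMap<>();
    // Forwarding table of one relocated Region: old address -> new address.
    static final Map<Long, Long> forwarding = new HashMap<>();
    static int slowPathHits = 0;

    // A field holding a reference (just an address in this toy model).
    static final class Field {
        long ref;
        Field(long ref) { this.ref = ref; }
    }

    // GC side: move the object, free the old slot, keep the forwarding entry.
    static void relocate(long oldAddr, long newAddr) {
        heap.put(newAddr, heap.remove(oldAddr));
        forwarding.put(oldAddr, newAddr);
    }

    // Application side: every reference load goes through this barrier.
    static String load(Field f) {
        Long newAddr = forwarding.get(f.ref);
        if (newAddr != null) {   // stale reference into the relocation set
            slowPathHits++;
            f.ref = newAddr;     // self-healing: fix the field itself
        }
        return heap.get(f.ref);
    }

    public static void main(String[] args) {
        heap.put(0x1000L, "B");
        Field aToB = new Field(0x1000L);     // object A's reference to B
        relocate(0x1000L, 0x2000L);
        System.out.println(load(aToB));      // B (slow path, field is healed)
        System.out.println(load(aToB));      // B (fast path)
        System.out.println(slowPathHits);    // 1
    }
}
```

Only the first load pays for the forwarding lookup; afterwards the field already holds the new address, which is why the remap phase can be deferred.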

To summarize, the STW phases are initial mark, remark, and initial relocate. Initial mark and initial relocate each only need to scan the GC Roots, so their processing time is proportional to the number of GC Roots and is generally very short. The STW pause of the remark phase is also very short, 1 ms at most; if it would exceed 1 ms, ZGC re-enters the concurrent marking phase. In other words, almost all ZGC pauses depend only on the size of the GC Roots set, and pause time does not grow with the heap size or the size of the live object set. By comparison, G1's relocation phase is entirely STW, and its pause time grows with the size of the live object set.

Three, key technologies

1. Colored pointers and LVB

These two techniques solve the problem of accessing objects correctly during concurrent relocation.

In case this is confusing, here is the object access problem during concurrent relocation. “Concurrent” means that application threads keep accessing objects while GC threads are relocating them. If an object has been moved but its address has not yet been updated, an application thread may access the old address and cause an error.

Colored pointers are a technique for storing information in the pointer itself; in ZGC, this means storing the state of an object in the pointer that refers to it. ZGC divides the 64-bit virtual address space into multiple subspaces (which is why it only supports 64-bit systems), as shown below:

The mark information is stored in the pointer that references the object: the upper 18 bits are unused, the next 4 bits store the mark information, and the lower 42 bits store the address of the object (42 bits can address 4 TB). The 4 mark bits are the important part:

  • Marked0/Marked1: mark bits, indicating whether the object is reachable (live).
  • Remapped: indicates that the reference has been remapped, that is, it no longer points into the relocation set.
  • Finalizable: indicates whether the object can only be reached through finalize().

The two mark bits, Marked0 and Marked1, are used alternately in successive collection cycles. The mark bit of the previous cycle is no longer valid in the current cycle and is reset to 0. For example, if Marked0 is used in one cycle, live objects are marked 01; in the next cycle Marked1 is used and live objects are marked 10.
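The bit layout above can be expressed with a few masks. The sketch below assumes the address occupies bits 0 to 41 and the four mark bits follow in the order Marked0, Marked1, Remapped, Finalizable (bits 42 to 45); that ordering and all names here are assumptions made for illustration.

```java
public class ColoredPointerSketch {
    // Assumed layout (bit 0 is the least significant bit):
    // bits 0-41 address, 42 Marked0, 43 Marked1, 44 Remapped, 45 Finalizable.
    static final int  ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    static final long MARKED0      = 1L << 42;
    static final long MARKED1      = 1L << 43;
    static final long REMAPPED     = 1L << 44;
    static final long FINALIZABLE  = 1L << 45;
    static final long COLOR_MASK   = MARKED0 | MARKED1 | REMAPPED | FINALIZABLE;

    // Strip the color bits to get the plain object address.
    static long address(long pointer) {
        return pointer & ADDRESS_MASK;
    }

    // Replace whatever color the pointer carries with the given one.
    static long withColor(long pointer, long color) {
        return (pointer & ~COLOR_MASK) | color;
    }

    static boolean has(long pointer, long color) {
        return (pointer & color) != 0;
    }

    public static void main(String[] args) {
        long addr = 0x1234_5678L;
        long cycle1 = withColor(addr, MARKED0);      // marked in one cycle
        long cycle2 = withColor(cycle1, MARKED1);    // re-marked in the next cycle
        System.out.println(has(cycle1, MARKED0));    // true
        System.out.println(has(cycle2, MARKED0));    // false: the old mark is cleared
        System.out.println(address(cycle2) == addr); // true
        System.out.println(ADDRESS_MASK + 1);        // 4398046511104 bytes = 4 TB
    }
}
```

Because the color lives in the pointer, switching from Marked0 to Marked1 invalidates every mark from the previous cycle without touching the objects themselves.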

LVB is essentially a read barrier. You can think of it as AOP around an object read: something extra is done before the read completes. To quote rbig, LVB does several things in ZGC. In the marking phase, it marks the pointer and “fixes” the pointer in the heap to its newly marked value. In the relocation phase, it updates the pointer being read to the object's new address and writes the “fixed” pointer back to the original field in the heap. This way, even if the GC moves an object, the read barrier finds and corrects the pointer, and application code always holds an up-to-date, valid pointer without needing stop-the-world, that is, coarse-grained synchronization between the GC and the application. This is known as self-healing.
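The “AOP around a read” idea can be sketched as a function that every reference load passes through. The model below covers only the marking half of the barrier and assumes the same bit positions as the layout described earlier. `goodColor`, `loadReference`, and the other names are hypothetical, and null references are ignored for brevity.

```java
import java.util.HashSet;
import java.util.Set;

public class LoadBarrierSketch {
    static final long MARKED0    = 1L << 42;
    static final long MARKED1    = 1L << 43;
    static final long COLOR_MASK = 0xFL << 42;   // the four mark bits

    // The color that counts as "good" in the current cycle; the GC flips it.
    static long goodColor = MARKED0;
    // Stand-in for the GC's marking data structures.
    static final Set<Long> markedAddresses = new HashSet<>();

    static final class Field {
        long ref;
        Field(long ref) { this.ref = ref; }
    }

    // Load barrier: runs around every reference load, like an AOP advice.
    static long loadReference(Field f) {
        long ref = f.ref;
        if ((ref & COLOR_MASK) == goodColor) {
            return ref;                     // fast path: pointer is already good
        }
        long addr = ref & ~COLOR_MASK;
        markedAddresses.add(addr);          // slow path: mark the object ...
        long healed = addr | goodColor;     // ... re-color the pointer ...
        f.ref = healed;                     // ... and heal the field in the heap
        return healed;
    }

    public static void main(String[] args) {
        Field f = new Field(0x1000L | MARKED1);   // color left over from the last cycle
        long first = loadReference(f);            // takes the slow path
        long second = loadReference(f);           // takes the fast path
        System.out.println(first == (0x1000L | MARKED0));    // true
        System.out.println(first == second);                 // true
        System.out.println(markedAddresses.contains(0x1000L)); // true
    }
}
```

The fast path is a single mask-and-compare, which is what keeps the cost of the barrier low once pointers have been healed.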

2. NUMA-aware support

Non-Uniform Memory Access (NUMA) is a memory architecture designed for multi-processor or multi-core computers.

Most of today's multi-socket servers use a NUMA architecture. For example, on a server with two CPU sockets (24 cores) and 64 GB of memory, the 12 cores on one CPU can access their 32 GB of local memory much faster than the other 32 GB of remote memory. ZGC supports NUMA by default: when creating an object, it preferentially allocates memory local to the CPU on which the thread is running, which can significantly improve performance.

Four, partitioning and generations revisited

1. Partitioning technology in ZGC

ZGC also partitions the heap, but unlike G1 it divides regions (called Pages in ZGC terminology) into the following types by size:

  • Small region: fixed at 2 MB, used for small objects under 256 KB.
  • Medium region: fixed at 32 MB, used for objects of at least 256 KB and under 4 MB.
  • Large region: variable in size, but always a multiple of 2 MB, with a minimum of 4 MB. It stores objects of 4 MB or more. Large regions are not relocated, because copying large objects is expensive.
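The size classes above amount to a simple selection rule. The following sketch follows the thresholds as stated in this article; the treatment of objects exactly at a boundary, rounding a large region up to the next multiple of 2 MB, and all names are assumptions for illustration.

```java
public class PageSizeSketch {
    static final long KB = 1024;
    static final long MB = 1024 * KB;
    static final long SMALL_OBJECT_LIMIT  = 256 * KB; // below this: small region
    static final long MEDIUM_OBJECT_LIMIT = 4 * MB;   // below this: medium region
    static final long SMALL_PAGE  = 2 * MB;
    static final long MEDIUM_PAGE = 32 * MB;

    enum PageType { SMALL, MEDIUM, LARGE }

    // Pick the region type for an object of the given size in bytes.
    static PageType pageTypeFor(long objectSize) {
        if (objectSize < SMALL_OBJECT_LIMIT)  return PageType.SMALL;
        if (objectSize < MEDIUM_OBJECT_LIMIT) return PageType.MEDIUM;
        return PageType.LARGE;
    }

    // Size of the region that would hold the object.
    static long pageSizeFor(long objectSize) {
        switch (pageTypeFor(objectSize)) {
            case SMALL:  return SMALL_PAGE;
            case MEDIUM: return MEDIUM_PAGE;
            default:
                // Large regions: round up to the next multiple of 2 MB.
                return (objectSize + SMALL_PAGE - 1) / SMALL_PAGE * SMALL_PAGE;
        }
    }

    public static void main(String[] args) {
        System.out.println(pageTypeFor(100 * KB));   // SMALL
        System.out.println(pageTypeFor(1 * MB));     // MEDIUM
        System.out.println(pageTypeFor(5 * MB));     // LARGE
        System.out.println(pageSizeFor(5 * MB) / MB); // 6
    }
}
```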

2. Generations and partitioning

I won't go into the details of partitioning and generations here; if they are unfamiliar, see the previous articles. Partitioning and generations do not conflict at all, which many people misunderstand. They are two completely different things with the same purpose: avoiding the increased latency and reduced throughput caused by scanning too much memory at once. So why does ZGC not use generations? From the material I checked, generations were already being planned for ZGC, so their absence was purely a matter of the designers' time constraints.

Five, common parameters