Generational collection algorithm

The design principle of the generational collection algorithm is that the collector divides the Java heap into different regions and allocates objects to those regions according to their age (the number of garbage collections an object has survived). If most objects in a region are so ephemeral that they rarely survive a collection, then grouping them together lets the collector reclaim a large amount of space at low cost: each collection focuses on keeping the small number of survivors alive rather than marking the large number of objects to be reclaimed. If the remaining objects are hard to kill, then grouping them together lets the virtual machine collect that region far less frequently, which pays off both in the time cost of garbage collection and in the efficient use of memory space.
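The age-based promotion idea above can be sketched in a few lines. This is a toy model, not HotSpot internals; all names are illustrative, though the promotion threshold of 15 matches HotSpot's default `-XX:MaxTenuringThreshold`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class GenerationalHeap {
    static final int PROMOTION_AGE = 15;  // HotSpot's default MaxTenuringThreshold

    static class Obj {
        int age = 0;
        boolean reachable;
        Obj(boolean reachable) { this.reachable = reachable; }
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    // New objects start in the young generation.
    void allocate(Obj o) { young.add(o); }

    // One young-generation collection: dead objects are dropped cheaply,
    // survivors age, and sufficiently old survivors are promoted.
    void minorGc() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.reachable) { it.remove(); continue; }   // reclaim the ephemeral majority
            if (++o.age >= PROMOTION_AGE) {                // long-lived: move to old gen
                it.remove();
                old.add(o);
            }
        }
    }
}
```

Usage: allocate into `young`, run `minorGc()` repeatedly, and long-lived objects migrate to `old`, which a real collector would then scan far less often.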

Mark-sweep algorithm

The algorithm is divided into two stages, "mark" and "sweep": first, all objects to be reclaimed are marked; after marking completes, all marked objects are reclaimed in one pass. Alternatively, it can work the other way round: mark the surviving objects and reclaim everything left unmarked. The marking process is the process of determining whether an object is garbage. As shown in the figure below:
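The two stages can be sketched over a toy object graph. This is an illustrative model only: marking walks everything reachable from the roots, and sweeping keeps exactly the marked objects where they are (which is why fragmentation arises).

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweep {
    static class Node {
        boolean marked;
        List<Node> refs = new ArrayList<>();
    }

    // Phase 1, "mark": everything reachable from a root is not garbage.
    static void mark(Node n) {
        if (n == null || n.marked) return;
        n.marked = true;
        for (Node r : n.refs) mark(r);
    }

    // Phase 2, "sweep": reclaim every object left unmarked.
    // Survivors stay in place, so free space ends up fragmented.
    static List<Node> collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) mark(root);
        List<Node> survivors = new ArrayList<>();
        for (Node n : heap) {
            if (n.marked) {
                n.marked = false;       // reset the mark for the next cycle
                survivors.add(n);
            }
        }
        return survivors;
    }
}
```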

Mark-copy algorithm

The mark-copy algorithm is often referred to simply as the copy algorithm. In 1969 Fenichel proposed a garbage collection algorithm called Semispace Copying. It divides available memory into two equally sized halves and uses only one half at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up in one pass. If most objects in memory are alive, the algorithm incurs a large copying overhead; but when most objects are reclaimable, only the minority of live objects need to be copied. And because each collection frees an entire half of the region at once, memory allocation never has to deal with the complications of space fragmentation: the allocator simply moves the heap-top pointer and hands out memory in order. This implementation is simple and efficient, but its drawback is equally obvious: the cost of this copying collection is that the available memory shrinks to half of the original, which wastes a great deal of space. The execution of the mark-copy algorithm is shown in the figure.
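A minimal semispace sketch, under toy assumptions (each half is modeled as a fixed-size slot array, liveness is handed in directly rather than computed by tracing): allocation is a bump pointer in the active half; a collection copies the survivors into the spare half and swaps the roles.

```java
import java.util.Arrays;
import java.util.Set;

class Semispace {
    Object[] active, spare;   // the two equally sized halves
    int top = 0;              // bump pointer: allocation is just top++

    Semispace(int halfCapacity) {
        active = new Object[halfCapacity];
        spare = new Object[halfCapacity];
    }

    // Allocation never searches for a hole: no fragmentation in the active half.
    boolean allocate(Object o) {
        if (top == active.length) return false;   // a real VM would trigger GC here
        active[top++] = o;
        return true;
    }

    // Copy the live objects into the spare half, then swap the halves.
    void collect(Set<Object> liveSet) {
        int newTop = 0;
        for (int i = 0; i < top; i++) {
            if (liveSet.contains(active[i])) spare[newTop++] = active[i];
        }
        Arrays.fill(active, null);    // clean up the used half in one pass
        Object[] t = active;
        active = spare;
        spare = t;
        top = newTop;
    }
}
```

Note how `collect` does work proportional to the number of survivors, matching the text's point that the algorithm is cheap precisely when most objects die young.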

In most Java virtual machines, this collection algorithm is preferentially used to reclaim the young generation. The young generation is divided into one large Eden space and two small Survivor spaces, and each allocation uses only Eden and one of the Survivor spaces. When garbage collection occurs, the surviving objects in Eden and the in-use Survivor space are copied to the other Survivor space, and then Eden and the used Survivor space are cleaned up in one pass.

The HotSpot virtual machine's default size ratio of Eden to a Survivor space is 8:1, which means the available memory in the young generation is 90% of its total capacity (80% Eden plus 10% of one Survivor), and only the other Survivor space is "wasted". Of course, the figure that about 98% of objects are collectable was measured only under "ordinary scenarios"; nobody can guarantee that no more than 10% of objects will survive each collection. When a Survivor space is insufficient to hold the objects that survive a Minor GC, the collector must rely on another memory region (usually the old generation) through the allocation guarantee (Handle Promotion).
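The 90% figure follows directly from the ratio. Assuming the 8:1 ratio is set via HotSpot's `-XX:SurvivorRatio=8` flag (Eden : one Survivor), the young generation has 8 + 1 + 1 = 10 equal parts, of which Eden plus one Survivor are usable:

```java
class SurvivorRatioMath {
    // Fraction of the young generation that is usable at any one time,
    // given Eden : Survivor = survivorRatio : 1 and two Survivor spaces.
    static double usableFraction(int survivorRatio) {
        double parts = survivorRatio + 2;        // Eden + Survivor-from + Survivor-to
        return (survivorRatio + 1) / parts;      // Eden + the one in-use Survivor
    }
}
```

With the default ratio of 8 this yields 9/10 = 90%, i.e. only one 10% Survivor space sits idle.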

If the other Survivor space does not have enough room to hold the objects that survived the last young-generation collection, those objects go directly into the old generation via the allocation guarantee mechanism, which is safe for the virtual machine.

Mark-compact algorithm

When the object survival rate is high, the mark-copy algorithm must perform many copy operations and its efficiency drops. Moreover, wasting 50% of the space, plus needing extra space for allocation guarantees to cope with the extreme case in which 100% of the objects in the used half survive, means this algorithm cannot be used directly for the old generation. In 1974 Edward Lueders proposed a "mark-compact" algorithm aimed at this case. Its marking phase is the same as in the mark-sweep algorithm, but the subsequent step does not directly clean up the reclaimable objects; instead, all living objects are moved toward one end of the memory space, and the memory beyond that boundary is then cleaned up directly. A schematic diagram of the mark-compact algorithm is shown in the figure. The essential difference between mark-sweep and mark-compact is that the former is a non-moving collection algorithm while the latter is a moving one. Whether or not to move the surviving objects is a risky decision with both advantages and disadvantages:
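The compaction step can be sketched as a single sliding pass. This toy version models the heap as a slot array and takes liveness as given instead of running the marking phase; a real collector would additionally have to fix up every reference to each moved object, which is exactly the cost discussed next.

```java
import java.util.Set;

class MarkCompact {
    // heap[i] == null models a free slot; non-null models an object.
    // Returns the new allocation boundary.
    static int compact(Object[] heap, Set<Object> liveSet) {
        int free = 0;                               // next slot at the "low" end
        for (int i = 0; i < heap.length; i++) {
            Object o = heap[i];
            if (o != null && liveSet.contains(o)) {
                heap[free++] = o;                   // slide the survivor toward one end
                // (a real collector must also update every reference
                //  pointing at the moved object)
            }
        }
        for (int i = free; i < heap.length; i++) {
            heap[i] = null;                         // clean up beyond the boundary
        }
        return free;
    }
}
```

After the pass, everything below the returned boundary is live and contiguous, so subsequent allocation can again be a simple bump pointer.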

If live objects are moved, especially in the old generation where a large number of objects survive every collection, then moving them and updating all the references to them is an extremely heavy operation, and the moves must pause the user application for their entire duration, a drawback the designer has to weigh carefully. A pause like this was vividly described by early virtual machine designers as "Stop The World".

However, if, as in the mark-sweep algorithm, the live objects are not moved or compacted at all, then the space fragmentation caused by live objects scattered across the heap can only be worked around with more complex memory allocators and memory-access machinery, for example by solving the allocation problem with a "segregated free list" (much as storing a large file on a hard disk does not require physically contiguous space: it can be stored on and read from fragmented regions via the disk's allocation tables). Memory access is one of the most frequent operations a user program performs, and adding extra burden at this point has a direct impact on the application's throughput.
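A sketch of the segregated-free-list idea mentioned above (illustrative only; real allocators add block splitting, coalescing, and per-size-class tuning): free blocks are kept in size-ordered buckets, so a non-moving collector can hand out a fragment of a suitable size without scanning the whole heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

class SegregatedFreeList {
    // key = block size in bytes; value = start addresses of free blocks of that size
    final Map<Integer, Deque<Long>> buckets = new TreeMap<>();

    // Return a block to the free list when an object is swept.
    void free(long addr, int size) {
        buckets.computeIfAbsent(size, k -> new ArrayDeque<>()).push(addr);
    }

    // First-fit over size-ordered buckets: address of a free block of at
    // least `size` bytes, or -1 on failure (which would trigger a GC).
    long allocate(int size) {
        for (Map.Entry<Integer, Deque<Long>> e : buckets.entrySet()) {
            if (e.getKey() >= size && !e.getValue().isEmpty()) {
                return e.getValue().pop();
            }
        }
        return -1;
    }
}
```

Even this tiny version shows the cost the text describes: every allocation is a search, rather than the single pointer bump a compacted heap allows.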

From the above two points, there are disadvantages either way: moving objects makes memory collection more complicated, while not moving them makes memory allocation more complicated. Judged by garbage collection pause time, not moving objects gives shorter pauses, or even none at all; but judged by overall program throughput, moving objects is more cost-effective. Throughput here is essentially the combined efficiency of the mutator (which can be understood as the user program that uses garbage collection) and the collector. Even though not moving objects can make the collector itself faster, total throughput still falls, because memory allocation and access, which are far more frequent than garbage collection, become slower. In the HotSpot virtual machine, the throughput-focused Parallel Scavenge collector is based on the mark-compact algorithm, while the latency-focused CMS collector is based on the mark-sweep algorithm.

In addition, there is a "middle ground" solution that avoids placing too great an extra burden on memory allocation and access: let the virtual machine use the mark-sweep algorithm most of the time, temporarily tolerating the memory fragments, and only when the degree of fragmentation grows large enough to affect object allocation run a mark-compact collection to obtain a tidy memory space. This is exactly what the mark-sweep-based CMS collector mentioned earlier does when faced with too much space fragmentation.

HotSpot collection algorithm implementation

Root node enumeration

Both the reference counting algorithm and the reachability analysis algorithm can be used to determine whether an object should be reclaimed. In the reachability analysis algorithm, the GC Roots are used as the starting points to search downward along the reference chains; if no reference chain reaches an object, that object is judged reclaimable.

Objects that can serve as GC Roots are found among global references (such as constants and class static fields) and execution contexts (such as the local variable tables in stack frames). Many applications today have hundreds of megabytes in the method area alone, and checking every reference one by one is obviously time-consuming.

In the HotSpot implementation, space is traded for time using a set of data structures called OopMaps. Once class loading completes, HotSpot knows what type of data lies at what offset within each object, and during JIT compilation it records, at specific positions (safe points), which stack slots and registers hold references. This way the GC can obtain this information directly instead of scanning all of them.

Safe points

It would take a great deal of extra space to generate an OopMap for every instruction. Instead, the program does not Stop The World for GC at arbitrary places; it can only pause to start GC once it reaches a safe point. Safe points should be chosen so that GC neither waits too long nor is triggered too frequently; they are typically placed at instructions that may let the program run for a long time, such as method calls, loop jumps, and exception jumps.

Another problem is how to get all the threads to come to a halt near a safe point when GC occurs. There are two options:

  • Preemptive suspension: when GC occurs, all threads are paused first, and any thread found not yet at a safe point is resumed and allowed to run on to one. Almost no virtual machine responds to GC this way.
  • Voluntary suspension: when GC needs to interrupt all threads, it does not operate on the threads directly but simply sets an interrupt flag. The polling locations coincide with the safe points, and each thread actively polls this flag as it executes, suspending itself when the flag is set.
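The voluntary-suspension scheme can be sketched with a shared flag. This is an illustrative model only, not HotSpot's actual mechanism (HotSpot polls a dedicated memory page and uses memory protection to trap threads): a worker checks the flag at each safe point and parks itself until the GC releases it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class SafepointPoll {
    static final AtomicBoolean safepointRequested = new AtomicBoolean(false);

    // Called by a worker thread at each safe point
    // (e.g. at a method call or loop back-edge).
    static void pollSafepoint(CountDownLatch reached) throws InterruptedException {
        if (safepointRequested.get()) {
            reached.countDown();                  // report "stopped at a safe point"
            while (safepointRequested.get()) {
                Thread.sleep(1);                  // wait until GC clears the flag
            }
        }
    }
}
```

The GC side sets `safepointRequested`, waits for every worker to report in, does its work, then clears the flag so the workers resume on their own.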

Safe regions

Safe points neatly solve the problem of triggering GC while the program is running, but when a program is not executing, that is, when a thread has been allocated no CPU time (for example, it is sleeping or blocked), the thread cannot respond to the JVM's interrupt request, and the JVM obviously cannot wait for it to be scheduled again and reach a safe point before starting GC. Safe regions exist to solve this problem.

A safe region is a stretch of code within which references do not change, so it is safe to start GC anywhere inside it. When a thread executes code in a safe region, it first identifies itself as being in the safe region; from then on, if the JVM initiates a GC, it need not set the interrupt flag for threads that have declared themselves inside safe regions. When a thread is about to leave a safe region, it checks whether the GC's root-node enumeration (or the entire GC process) has completed; if not, it must wait until it receives the signal that it may leave before exiting the safe region.

Remembered sets and card tables

To solve the problem of cross-generational references between objects, the garbage collector establishes in the young generation a data structure called a Remembered Set, which avoids adding the entire old generation to the GC Roots scan. In fact, this problem is not limited to the young and old generations: all garbage collectors with Partial GC behavior, such as the G1, ZGC, and Shenandoah collectors, face the same problem. It is therefore worth clarifying the principle and implementation of remembered sets, which will help when later chapters introduce some of the newest collectors.

A remembered set is an abstract data structure used to record the set of pointers from a non-collected region into the collected region. If we ignore efficiency and cost, the simplest implementation could use an array of all objects in the non-collected region that contain cross-generational references, as shown in the code:

class RememberedSet {
	Object[] set = new Object[OBJECT_INTERGENERATIONAL_REFERENCE_SIZE];
}

This kind of implementation, which records every object containing a cross-generational reference, is quite expensive in both space and maintenance. In a garbage collection scenario, the collector only needs to know, through the remembered set, whether a given non-collected region contains pointers into the collected region; it does not need every detail of those cross-generational pointers. So when implementing a remembered set, the designer can choose a coarser recording granularity to save on storage and maintenance costs. Here are some of the available precision options (others outside this range exist as well):

  • Word precision: each record is accurate to one machine word (the processor's addressing width, commonly 32 or 64 bits, which determines the length of a pointer to a physical memory address); that word contains a cross-generational pointer.
  • Object precision: each record is accurate to one object, which has fields containing cross-generational pointers.
  • Card precision: each record is accurate to one area of memory, within which some objects contain cross-generational pointers.

The third option, card precision, is implemented with what is called a "Card Table", which is also the most common implementation form of a remembered set; some material even conflates the two directly. As the earlier definition noted, a remembered set is an "abstract" data structure: abstract means only the behavioral intent is defined, not a concrete implementation of that behavior. The card table is one concrete implementation of a remembered set: it fixes the recording precision and the mapping between the remembered set and heap memory. For the relationship between card tables and remembered sets, readers may like the Java-language analogy of the relationship between HashMap and Map.

The simplest form of a card table can be just a byte array, and that is exactly what the HotSpot virtual machine does. The following line of pseudocode is HotSpot's default card table marking logic:

CARD_TABLE[this address >> 9] = 0;

Each element of the byte array CARD_TABLE corresponds to a memory block of a particular size in the region the table covers; this block is called a Card Page. A card page's size is generally 2^N bytes; from the code above you can see that the card page HotSpot uses is 2^9, i.e. 512 bytes (shifting the address right by 9 bits is equivalent to dividing it by 512). If the starting address of the region covered by the card table is 0x0000, then elements 0, 1, and 2 of CARD_TABLE correspond to the card pages at addresses 0x0000 to 0x01FF, 0x0200 to 0x03FF, and 0x0400 to 0x05FF respectively. A diagram of the card table and card pages:

The memory of a card page usually contains more than one object. As long as one (or more) object field within the card page holds a cross-generational pointer, the corresponding card table element is marked 1, and the element is said to be dirty; otherwise it is marked 0. When garbage collection occurs, the collector only has to pick out the dirty elements in the card table to learn which card pages contain cross-generational pointers, and it adds those pages to the GC Roots to be scanned together.
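The address-to-card mapping above can be checked with a few lines. This is a toy sketch using the 1-means-dirty convention of the prose (HotSpot's own snippet happens to write 0 for dirty); all names are illustrative.

```java
class CardTableDemo {
    static final int CARD_SHIFT = 9;           // 2^9 = 512-byte card pages
    final byte[] cardTable;

    CardTableDemo(long heapBytes) {
        cardTable = new byte[(int) (heapBytes >> CARD_SHIFT)];  // one byte per card page
    }

    // Which card table element covers this address?
    static int cardIndex(long address) {
        return (int) (address >> CARD_SHIFT);  // equivalent to address / 512
    }

    // A store into any object on the page dirties that single byte.
    void markDirty(long address) {
        cardTable[cardIndex(address)] = 1;
    }
}
```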

Write barriers

The HotSpot virtual machine maintains the card table's state through a Write Barrier technique. A write barrier can be seen as an AOP-style aspect, at the virtual machine level, around the action of assigning to a reference-type field: when a reference is assigned, an Around-style notification is generated so the program can perform extra actions, which means both the moments before and after the assignment are covered by the write barrier. The part applied before the assignment is called the pre-write barrier, and the part applied after it the post-write barrier. Write barriers are used by many HotSpot collectors, but until the G1 collector appeared, the other collectors used only post-write barriers.

Once write barriers are applied, the virtual machine generates extra instructions for every assignment operation. Once the collector puts the card table update into the write barrier, every reference update incurs this extra overhead, regardless of whether it is actually an old-generation reference to a young-generation object. However, this cost is still far lower than scanning the entire old generation during a Minor GC.
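The combination can be sketched as follows. This is an illustrative model (slot indices stand in for heap addresses; names are made up): the post-write barrier unconditionally dirties the card for the location being written, exactly as described above.

```java
class WriteBarrierDemo {
    static final byte[] CARD_TABLE = new byte[1024];
    static final Object[] HEAP = new Object[1 << 15];  // slots standing in for fields

    // The "store" the compiler would emit, with the barrier appended after it.
    static void storeReference(int slot, Object ref) {
        HEAP[slot] = ref;             // the assignment itself
        CARD_TABLE[slot >> 9] = 1;    // post-write barrier: dirty the card,
                                      // whether or not this is old -> young
    }
}
```

Every store pays the one-byte write, but a Minor GC then only scans the dirty cards instead of the whole old generation.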

Besides the overhead of the write barrier itself, card tables face the problem of "False Sharing" in highly concurrent scenarios. False sharing is a common concern when dealing with the low-level details of concurrency: modern CPU cache systems store data in units of cache lines, and when multiple threads modify independent variables that happen to share the same cache line, they interfere with one another (write-backs, invalidations, or synchronization), causing performance degradation. One simple remedy is to not mark the card unconditionally, but to check first and write only when the card is not already dirty:

if (CARD_TABLE[this address >> 9] != 0) CARD_TABLE[this address >> 9] = 0;

In JDK 7 and later, the HotSpot virtual machine added a new parameter, -XX:+UseCondCardMark, to decide whether to enable conditional card table updating. Enabling it adds the overhead of one extra comparison but avoids the false-sharing problem. Both choices carry some performance cost, so whether to enable it should be decided by testing against the application's actual workload.

Concurrent reachability analysis

The garbage collectors of mainstream programming languages basically all rely on the reachability analysis algorithm to determine whether objects are alive, and in theory reachability analysis must be performed on a snapshot that guarantees consistency, which means the user threads must be frozen for the entire analysis. For root-node enumeration, since GC Roots are still very few objects compared with the entire Java heap, and thanks to optimizations such as OopMap, the pause is very short and relatively constant (it does not grow with heap capacity). But continuing from the GC Roots to traverse the whole object graph is different: the pause time of this step is necessarily proportional to the size of the Java heap. The bigger the heap, the more objects it stores, the more complex the object graph, and the longer the pause needed to mark them all, which sounds only natural.

Note that the "marking" phase is a characteristic common to all tracing garbage collection algorithms. If this phase's pause time grows in proportion to the heap, its impact spreads to almost every garbage collector; conversely, if this portion of the pause time can be cut, the benefit is likewise systemic. To eliminate or reduce these user-thread pauses, we must first understand why traversing the object graph has to be done on a snapshot that guarantees consistency. To explain this clearly, we introduce tri-color Marking as a tool to assist the derivation, marking the objects encountered while traversing the object graph in one of the following three colors, according to whether they have been visited:

  • White: the object has not yet been visited by the garbage collector. Obviously, at the start of reachability analysis all objects are white; any object still white when the analysis ends is unreachable.
  • Black: the object has been visited by the garbage collector, and all of its references have been scanned. A black object has been fully scanned and is safe to keep alive; if some other reference points at a black object, it need not be scanned again. A black object cannot point directly at a white object (without passing through a gray one).
  • Gray: the object has been visited by the garbage collector, but at least one of its references has not yet been scanned.
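The three-color rules can be made concrete with an explicit gray worklist. This is an illustrative sketch over a string-keyed toy graph: white means unvisited, gray objects form the wavefront, and an object turns black only once all its outgoing references have been scanned.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TriColor {
    enum Color { WHITE, GRAY, BLACK }

    static Map<String, Color> mark(Map<String, List<String>> graph, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String n : graph.keySet()) color.put(n, Color.WHITE); // all white at the start
        Deque<String> grayStack = new ArrayDeque<>();
        for (String r : roots) {
            color.put(r, Color.GRAY);
            grayStack.push(r);
        }
        while (!grayStack.isEmpty()) {
            String n = grayStack.pop();
            for (String succ : graph.getOrDefault(n, List.of())) {
                if (color.get(succ) == Color.WHITE) {   // advance the gray wavefront
                    color.put(succ, Color.GRAY);
                    grayStack.push(succ);
                }
            }
            color.put(n, Color.BLACK);                  // all references scanned
        }
        return color;   // anything still white is unreachable
    }
}
```

With no mutator running concurrently, this terminates with every reachable object black and every unreachable one white, which is exactly the consistent-snapshot case described next.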

As for the scanning process of reachability analysis, readers can picture it as a gray wavefront moving across the object graph from black toward white. If the user threads were frozen and only the collector thread were working, there would be no problem. But what if the user threads run concurrently with the collector? The collector colors the object graph while the user threads modify references, that is, modify the structure of the object graph, and this can have two kinds of consequence. One is to mistakenly mark a dead object as alive; that is not ideal, but it is actually tolerable: it merely produces a little floating garbage that escapes this collection and will be cleaned up next time. The other is to mistakenly mark a living object as dead, which is a fatal consequence: the program is certain to go wrong. The following shows how such a fatal error can arise.

In 1994, Wilson proved theoretically that the "object disappearance" problem, in which an object that should be black is mislabeled white, arises if and only if the following two conditions hold simultaneously:

  • The mutator inserts one or more new references from black objects to the white object;
  • The mutator deletes all direct or indirect references from gray objects to that white object.

Therefore, we only need to break either of these two conditions to solve the problem of object disappearance during concurrent scanning. This leads to two solutions: Incremental Update and Snapshot At The Beginning (SATB).

Incremental update breaks the first condition: when a black object inserts a new reference to a white object, that newly inserted reference is recorded, and once the concurrent scan finishes, the black objects in these records are taken as roots and re-scanned. Simplified, this means a black object turns gray again as soon as a new reference to a white object is inserted into it.

The snapshot-at-the-beginning approach breaks the second condition: when a gray object is about to delete a reference to a white object, the reference about to be deleted is recorded, and after the concurrent scan, the gray objects in these records are taken as roots and re-scanned. Simplified, this means that regardless of whether references are deleted, the search proceeds over a snapshot of the object graph as it was at the moment scanning began.

Both the recording of reference insertions and of reference deletions is done through write barriers. Both the incremental update and the snapshot-at-the-beginning solutions have practical applications in the HotSpot virtual machine: for example, CMS does its concurrent marking based on incremental update, while G1 and Shenandoah are based on snapshot-at-the-beginning.
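The two barriers can be contrasted in one sketch. This is an illustrative model (single reference field, explicit logs, made-up names), not CMS or G1 code: incremental update hooks the write *after* it happens and records the black source; SATB hooks it *before* and records the old white target.

```java
import java.util.ArrayList;
import java.util.List;

class ConcurrentMarkBarriers {
    static final List<Object> incrementalUpdateLog = new ArrayList<>(); // sources to re-scan
    static final List<Object> satbLog = new ArrayList<>();              // old targets to re-scan

    static class Obj {
        boolean marked;   // true models black/gray; false models white
        Obj field;
    }

    // Incremental update: breaks condition 1 (black -> white insertion).
    static void storeIncUpdate(Obj holder, Obj value) {
        holder.field = value;                              // the assignment
        if (holder.marked && value != null && !value.marked) {
            incrementalUpdateLog.add(holder);              // post-write barrier: re-gray holder
        }
    }

    // Snapshot-at-the-beginning: breaks condition 2 (deleting the path to white).
    static void storeSatb(Obj holder, Obj value) {
        Obj old = holder.field;
        if (old != null && !old.marked) {
            satbLog.add(old);                              // pre-write barrier: keep snapshot edge
        }
        holder.field = value;                              // the assignment
    }
}
```

After the concurrent scan, an incremental-update collector re-scans the logged sources, while an SATB collector re-scans from the logged old targets, so in neither scheme can a live object vanish.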

Reference documentation

  • Understanding the Java Virtual Machine, 3rd Edition, by Zhou Zhiming
  • www.codercto.com/a/71188.htm…
  • Blog.csdn.net/qq_32165517…