This is the 23rd day of my participation in the November Gwen Challenge. Check out the event details: The last Gwen Challenge 2021

Garbage collection algorithm

Common garbage collection algorithms include:

  • Mark-clear algorithm
  • Replication algorithm
  • Mark-collation algorithm
  • Generational collection algorithm

JVM garbage collection algorithms use generational collection algorithms, copy algorithms, and mark-collation algorithms. All three algorithms are used. The memory in the JVM is divided into new generation and old generation using the generation collection algorithm. The new generation is collected using the copy algorithm, while the old generation uses the mark-collation algorithm.

1. Mark-clear algorithm

The mark-clear algorithm is divided into two stages: “mark” and “clear”. First, through the reachability analysis, all the objects that need to be recycled are marked, and then all the marked objects are uniformly recycled.

The mark-clearing algorithm has two defects: one is efficiency, the process of both marking and clearing is not efficient; the other is that a large amount of debris space will be created after the clearing. It is possible to make GC again when applying for large chunks of memory because there is not enough contiguous space.

2 Replication Algorithm

In order to solve the problem of the debris space, the “replication algorithm” emerged. The principle of the replication algorithm is that the memory is divided into two parts. Each time the memory is allocated, one part of the memory is used. When the memory is insufficient, all the surviving parts of the memory are copied to the other part. Then will then clean up the entire memory that has been used.

The replication algorithm solves the problem of space debris. But it also brings new problems. Because each time you apply for memory, you can only use half of the memory space. The memory usage is severely insufficient.

The generation and generation of JVMS use GC based on replication algorithms. Some optimizations are made to solve the problem of insufficient memory utilization.

A special study by IBM showed that 98% of Cenozoic objects are “ephemeral”, meaning that only about 2% of Cenozoic objects survive a single GC session.

So there is no need to divide two memory Spaces in a 1:1 ratio. Instead, the memory is divided into three parts, a large Eden region and two smaller Survivor regions. Eden occupies 80% of the memory, and Survivor occupies 10% of the memory. When creating a new object, only the Eden region and one Survivor region are used. When GC is performed, all the surviving objects in the Eden region and Survivor region are copied to the other Survivor region. Then clean up the Eden area and the Survivor area you just used.

This partition of memory solves the memory utilization problem, with 90%(80% + 10%) of current memory available each time an object is created.

3. Mark-collation algorithm

The replication algorithm is more efficient when there are fewer live objects after GC, but becomes less efficient when there are more live objects and more copies are performed. However, the survival rate of old objects after GC is relatively high, so someone proposed “mark-collation algorithm”.

The “marking” process of the mark-tidy algorithm is the same as that of the “mark-clean algorithm”, but there is no direct cleaning after the marking. Instead, all living objects are moved to one end of memory. Clean up the rest immediately after the move.

4. Generational collection algorithm

Generational collection divides memory into new generation and old generation. Allocation is based on the lifetime of the object, or the number of GC cycles it has experienced. When objects are created, memory is allocated in the new generation and the age of the object is +1 if the pair is still alive after a GC. When the age exceeds a certain value (the default is 15, which can be set with -xx :MaxTenuringThreshold), the object will enter the old age if it is still alive.