Garbage collection (GC)

Several kinds of garbage collection take place in the heap: Minor GC (also called Young GC) for the young generation, Major GC (also called Old GC) for the old generation, and Full GC for the whole heap. But how do you know whether an object is garbage? Does it need to be reclaimed, and how is it reclaimed? These are the questions we need to explore in detail.

Java manages memory and collects garbage automatically, but if you do not understand how collection works, problems become hard to troubleshoot and solve once they appear. Automatic garbage collection examines the objects in the Java heap, separates the objects that are still in use from those that are no longer used, and removes the unused objects from the heap.

How can I tell if an object is garbage?

  1. Reference counting: if the application still holds a reference to an object, the object is not garbage. If no pointer references the object, it is garbage.

    Disadvantage: if two objects reference each other in a cycle, they form an island that nothing else points to, and this garbage can never be collected.

    Solution: use reachability analysis

  2. Reachability analysis

    Starting from the GC Root objects, follow the references downward to see whether an object is reachable. Objects that cannot be reached from any GC Root are garbage.

Objects that can act as GC Roots: class loaders, threads, local variables in the Java virtual machine stack (the variables of methods that are currently running), local variables in the native method stack, etc.
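To make the difference between the two approaches concrete, here is a minimal sketch that models objects as plain graph nodes. `Node`, `link`, `reachableFrom` and `buildScenario` are hypothetical names invented for this illustration; nothing here touches the real JVM collector. Two nodes reference each other as an island, so their reference counts never drop to zero, yet a walk from the root never reaches them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReachabilityDemo {
    // Toy stand-in for a heap object: a name plus its outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount = 0; // what a reference-counting collector would track

        Node(String name) {
            this.name = name;
        }
    }

    // Records that 'from' references 'to' and bumps the target's count.
    static void link(Node from, Node to) {
        from.refs.add(to);
        to.refCount++;
    }

    // Reachability analysis: walk the reference graph starting at the GC Roots.
    static Set<String> reachableFrom(List<Node> roots) {
        Set<String> seen = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.poll();
            if (seen.add(current.name)) {
                pending.addAll(current.refs);
            }
        }
        return seen;
    }

    // Scenario: root -> a, plus an island b <-> c that nothing else points to.
    static Node[] buildScenario() {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        link(root, a);
        link(b, c);
        link(c, b);
        return new Node[] {root, a, b, c};
    }

    public static void main(String[] args) {
        Node[] nodes = buildScenario();
        Set<String> live = reachableFrom(List.of(nodes[0]));
        for (Node n : nodes) {
            System.out.println(n.name + ": refCount=" + n.refCount
                    + ", reachable=" + live.contains(n.name));
        }
    }
}
```

In this toy model b and c both keep a reference count of 1, so pure reference counting would never free them, while reachability analysis reports them as unreachable and therefore as garbage.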

When will the garbage be collected?

  1. System.gc() sends a signal to the JVM that garbage collection is needed. It is up to the JVM to decide when to do it. It is not recommended to call this method manually because garbage collection consumes CPU resources.
  2. The Eden or Survivor (S) area runs out of space
  3. The old generation runs out of space
  4. The Metaspace (method area) runs out of space (Metaspace GC)

Full GC = Metaspace GC + Young GC + Old GC

Full GC: this is usually triggered by an Old GC, when the old generation holds too many objects. Because marking, sweeping, and compacting the old generation usually takes a long time, a Full GC makes the application stall. This is why many applications are tuned to avoid or reduce Full GCs.
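Before tuning anything, it helps to see how often collections actually happen. The sketch below reads the standard `java.lang.management` beans; the class name `GcStats` and its two helper methods are made up for this example, and the collector names that get printed depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    // Collections so far across all collectors; -1 means "undefined" and counts as 0.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount());
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total: " + totalCollections() + " collections, "
                + totalCollectionMillis() + " ms");
    }
}
```

Sampling these numbers periodically gives a rough picture of whether old-generation collections are frequent or slow enough to be worth tuning.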

How to recycle?

Garbage is reclaimed with an appropriate collection algorithm, generally one of the following:

1. Mark-sweep

  • Mark: Find the objects in memory that need to be reclaimed and mark them

At this stage all objects in the heap are scanned to determine which ones must be reclaimed, which is time-consuming

  • Sweep: clears the objects marked for reclamation, freeing the corresponding memory space

Disadvantages:

  1. Both marking and sweeping are time-consuming and inefficient
  2. Sweeping leaves a large number of discontiguous memory fragments. With too much fragmentation, a later allocation of a large object may fail to find enough contiguous memory and trigger another garbage collection early.
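The fragmentation problem can be shown with a toy heap. In this sketch the heap is an `int[]` in which each slot holds an object id and 0 means free; the set of marked (live) ids is passed in as if the mark phase had already run. `MarkSweepDemo` and its methods are hypothetical names for illustration only.

```java
import java.util.Arrays;
import java.util.Set;

class MarkSweepDemo {
    static final int FREE = 0;

    // Sweep phase: free, in place, every slot whose object was not marked as live.
    static int[] sweep(int[] heap, Set<Integer> marked) {
        int[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (after[i] != FREE && !marked.contains(after[i])) {
                after[i] = FREE;
            }
        }
        return after;
    }

    // Total number of free slots.
    static int freeSlots(int[] heap) {
        int count = 0;
        for (int slot : heap) {
            if (slot == FREE) {
                count++;
            }
        }
        return count;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(int[] heap) {
        int best = 0;
        int run = 0;
        for (int slot : heap) {
            run = (slot == FREE) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 0, 0};
        int[] after = sweep(heap, Set.of(1, 3, 5)); // objects 2, 4, 6 are garbage
        System.out.println(Arrays.toString(after));
        System.out.println("free=" + freeSlots(after)
                + ", largest contiguous=" + largestFreeRun(after));
    }
}
```

After the sweep five slots are free, but the longest contiguous run is only three slots, so an object that needs four slots cannot be placed even though enough total memory is available.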

2. Mark-copy

Divide memory into two equal areas and use only one of them at a time.

When one block of memory is used up, the surviving objects are copied onto the other block, and the used block is then cleared in one pass.

The mark-copy algorithm is suitable for the young generation because it is fast when few objects survive

Disadvantage: lower space utilization, since only half of the memory can be used at any time.
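A minimal sketch of the two-area scheme, assuming the mark phase has already produced the set of live object ids. The class `MarkCopyDemo`, its fields `from` and `to`, and the use of 0 for a free slot are conventions made up for this example.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCopyDemo {
    static final int FREE = 0;
    int[] from; // the half currently used for allocation
    int[] to;   // the idle half, kept empty between collections

    MarkCopyDemo(int[] initialObjects) {
        from = initialObjects.clone();
        to = new int[initialObjects.length];
    }

    // Copies the marked objects into the idle half, wipes the used half in one
    // pass, then swaps the roles of the two halves. Returns the survivor count.
    int collect(Set<Integer> marked) {
        int next = 0;
        for (int id : from) {
            if (id != FREE && marked.contains(id)) {
                to[next++] = id;
            }
        }
        Arrays.fill(from, FREE);
        int[] tmp = from;
        from = to;
        to = tmp;
        return next;
    }

    public static void main(String[] args) {
        MarkCopyDemo heap = new MarkCopyDemo(new int[] {1, 2, 3, 4});
        int survivors = heap.collect(Set.of(2, 4)); // objects 1 and 3 are garbage
        System.out.println(survivors + " survivors, active half: "
                + Arrays.toString(heap.from));
    }
}
```

The work done is proportional to the number of survivors, which is why the scheme is fast when most objects die young, and the survivors end up packed together with no fragmentation. The price is the idle half.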

3. Mark-compact

The mark-copy algorithm becomes inefficient when the object survival rate is high. More importantly, if you do not want to waste 50% of the space, you need extra space as an allocation guarantee to cope with the extreme case in which 100% of the objects in the used memory survive. For this reason the algorithm cannot be used directly in the old generation.

The marking phase is the same as in the “mark-sweep” algorithm, but instead of sweeping the reclaimable objects directly, the next step moves all surviving objects toward one end and then clears the memory beyond the end boundary.
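The sliding step can be sketched on a toy heap as follows. The heap is an `int[]` of object ids with 0 meaning free, and the marked set is assumed to come from an earlier mark phase; `MarkCompactDemo` and `compact` are hypothetical names, and a real collector would also have to update every reference to a moved object, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.Set;

class MarkCompactDemo {
    static final int FREE = 0;

    // Slides every marked object toward index 0, keeping their order, then
    // clears everything beyond the boundary. Returns the boundary index.
    static int compact(int[] heap, Set<Integer> marked) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            int id = heap[i];
            if (id != FREE && marked.contains(id)) {
                heap[boundary++] = id; // boundary <= i, so nothing unread is overwritten
            }
        }
        Arrays.fill(heap, boundary, heap.length, FREE);
        return boundary;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 4, 5, 6, 0, 0};
        int boundary = compact(heap, Set.of(1, 3, 5)); // objects 2, 4, 6 are garbage
        System.out.println(Arrays.toString(heap) + ", free space starts at " + boundary);
    }
}
```

All free memory ends up as one contiguous block after the boundary, at the cost of moving objects around.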

Conclusion:

  1. Mark-sweep. Advantages: simple algorithm. Disadvantages: time-consuming because the entire memory space is scanned, and it produces space fragmentation.
  2. Mark-compact. Advantages: no space fragmentation, so large contiguous blocks of free memory remain available. Disadvantages: the algorithm is more complex and takes longer to execute.
  3. Mark-copy. Advantages: trades space for time, so it is fast. Disadvantages: because it trades space for time, space utilization is low.
  4. The young generation suits the mark-copy algorithm (objects there usually have a short life cycle after allocation, so copying in the young generation is efficient)
  5. The old generation suits mark-sweep or mark-compact. (Objects there live a long time, so there is no need to copy them again and again. It is better to mark them and then clean up.)

Garbage collector

If the collection algorithm is the methodology of memory collection, then the garbage collector is the concrete implementation of memory collection.

1. Serial

The Serial collector is the most basic and oldest collector and was (prior to JDK 1.3.1) the only choice for collecting the virtual machine's young generation.

It is a single-threaded collector, which not only means that it uses only one CPU or one collection thread to do garbage collection, but more importantly, it pauses other threads while garbage collection is done.

  • Advantages: Simple and efficient, with high single-thread collection efficiency
  • Disadvantages: The collection process requires all threads to be paused
  • Algorithm: Copy algorithm
  • Scope of application: the young generation
  • Application: default young-generation collector in Client mode

2. Serial Old

The Serial Old collector is the old-generation version of the Serial collector. It is also a single-threaded collector, but it uses the “mark-compact” algorithm and otherwise runs the same way as the Serial collector.

3. ParNew

ParNew is the multithreaded version of Serial. Like Parallel Scavenge, it is a young-generation garbage collector

  • Advantages: Higher efficiency than Serial when multiple CPUs are used.
  • Disadvantages: The collection process suspends all application threads and is less efficient than Serial on a single CPU.
  • Algorithm: Copy algorithm
  • Scope of application: the young generation
  • Application: preferred young-generation collector in virtual machines running in Server mode

4. Parallel Scavenge

The Parallel Scavenge collector is a young-generation collector that uses the copy algorithm and is a parallel, multithreaded collector. It looks the same as ParNew, but the Parallel Scavenge collector focuses more on system throughput.

Throughput = time running user code / (time running user code + garbage collection time)

Higher throughput means a smaller share of time spent on garbage collection, so user code can make full use of CPU resources and finish the program's computation as quickly as possible.
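As a quick worked example of the formula above, the sketch below plugs in made-up numbers: 99 ms of user code and 1 ms of garbage collection. The class and method names are hypothetical.

```java
class ThroughputDemo {
    // Throughput = user time / (user time + GC time), as a fraction between 0 and 1.
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    public static void main(String[] args) {
        // 99 ms of user code plus 1 ms of GC: 99% of the time does useful work.
        System.out.println(throughput(99, 1));
        // Doubling the GC time to 2 ms lowers throughput to about 98%.
        System.out.println(throughput(98, 2));
    }
}
```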

5. Parallel Old

The Parallel Old collector is the old-generation version of the Parallel Scavenge collector. It uses multiple threads and the mark-compact algorithm for garbage collection and likewise focuses on system throughput.

5. CMS

Concurrent Mark Sweep: a garbage collector for the old generation

Official documentation: docs.oracle.com/javase/8/do…

The collectors described so far all stop the world (STW). Can the pauses be made shorter? Can the application threads and the garbage collection threads run at the same time?

The goal of CMS is to minimize STW time: it aims for low-pause garbage collection and pays the most attention to pause times.

But the two kinds of threads cannot run together all the way through: garbage cannot be produced and reclaimed at the same moment in every step, so some phases still pause the application.

CMS uses the mark-sweep algorithm, and the whole process is divided into 4 steps:

  1. Initial mark (CMS initial mark): marks only the objects directly referenced by GC Roots, without tracing any further
  2. Concurrent mark (CMS concurrent mark): traces the object graph from the GC Roots
  3. Remark (CMS remark): corrects the marks that changed because the user program kept running during concurrent marking
  4. Concurrent sweep (CMS concurrent sweep): sweeps the unreachable objects and reclaims their space. New garbage created during this phase is called floating garbage and is left for the next collection

The collector threads work alongside the user threads during concurrent marking and concurrent sweeping, and only the initial mark and remark phases pause the user threads. In general, then, the CMS collector's memory reclamation can be regarded as running concurrently with the user threads.
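Floating garbage is easier to see in a small simulation. The sketch below is single-threaded and only imitates the ordering of events: marking finishes, the "user thread" then drops a reference, and the sweep still works from the earlier marks. `FloatingGarbageDemo` and its members are hypothetical names; real CMS is far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FloatingGarbageDemo {
    final Set<String> heap = new TreeSet<>();               // ids of allocated objects
    final Map<String, List<String>> refs = new HashMap<>(); // id -> ids it references
    final List<String> roots = new ArrayList<>();           // GC Roots

    void add(String id, String... targets) {
        heap.add(id);
        refs.put(id, new ArrayList<>(Arrays.asList(targets)));
    }

    // Marking: everything reachable from the roots at this moment.
    Set<String> mark() {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String id = pending.poll();
            if (marked.add(id)) {
                pending.addAll(refs.getOrDefault(id, List.of()));
            }
        }
        return marked;
    }

    // Sweeping: reclaim only what the earlier marking left unmarked.
    void sweep(Set<String> marked) {
        heap.retainAll(marked);
        refs.keySet().retainAll(marked);
    }

    public static void main(String[] args) {
        FloatingGarbageDemo d = new FloatingGarbageDemo();
        d.add("a", "b");
        d.add("b");
        d.add("c");                    // never referenced: ordinary garbage
        d.roots.add("a");

        Set<String> marked = d.mark(); // marks a and b
        d.refs.get("a").clear();       // the "user thread" drops a -> b after marking
        d.sweep(marked);
        System.out.println("after cycle 1: " + d.heap); // b survives as floating garbage

        d.sweep(d.mark());             // the next cycle no longer marks b
        System.out.println("after cycle 2: " + d.heap);
    }
}
```

Object c is reclaimed in the first cycle, while b, which became garbage only after marking, survives until the second cycle.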

CMS drawbacks:

1. CMS is sensitive to CPU resources.

The CMS collector depends heavily on a multi-processor environment. By default it starts (number of CPUs + 3) / 4 collector threads. When there are fewer than 4 CPUs, CMS has a significant impact on the user program, because a large share of the computing power (half of it with 2 CPUs) goes to the collector thread.

2. CMS cannot clear floating garbage.

While CMS is sweeping concurrently, the user threads keep running and keep producing new garbage. This garbage appears after marking has finished, so the current collection cannot reclaim it. It is called “floating garbage” and can only be removed in the next GC.

3. CMS produces a large amount of space fragmentation.

CMS uses the mark-sweep algorithm, so a large amount of space fragmentation is produced during garbage collection.
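Returning to the first drawback, the thread-count formula quoted there can be tabulated with a few lines of code. This sketch assumes integer division and uses hypothetical names (`CmsThreadsDemo`, `collectorThreads`, `cpuSharePercent`); it only evaluates the formula from the text and does not query a real JVM.

```java
class CmsThreadsDemo {
    // (number of CPUs + 3) / 4, with integer division as assumed here.
    static int collectorThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Share of the CPUs taken by collector threads, in percent.
    static int cpuSharePercent(int cpus) {
        return 100 * collectorThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + collectorThreads(cpus)
                    + " collector thread(s), " + cpuSharePercent(cpus) + "% of the CPUs");
        }
    }
}
```

With 4 or more CPUs the collector takes roughly a quarter of the processing power, while with 2 CPUs a single collector thread already takes half.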

6. G1 (Garbage-First)

Official documentation: docs.oracle.com/javase/8/do…

Features:

  1. Generational collection (still preserving the concept of generational collection)
  2. Space compaction (overall it behaves like a “mark-compact” algorithm, which does not lead to space fragmentation)
  3. Predictable pauses (a step beyond CMS: users can specify that no more than N milliseconds should be spent on garbage collection within any time slice of M milliseconds)

Region

G1's memory structure is a bit different from traditional memory partitioning. G1 divides the heap into multiple Regions of equal size (between 1 MB and 32 MB, always a power of two; the size is derived from the heap size by default and can be set with -XX:G1HeapRegionSize). The Regions that make up one generation belong together logically but do not have to be contiguous in memory. Each Region is marked with one of E, S, O, or H, indicating Eden, Survivor, Old, and Humongous respectively. E and S belong to the young generation, O and H to the old generation.

H stands for Humongous, which literally means a very large object (hereafter an H object). An allocated object is treated as humongous if its size is at least half the size of a Region. H objects are allocated directly in the old generation by default, which avoids copying large objects during GC.
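The size rule for H objects amounts to a single comparison. The sketch below follows the "at least half a Region" wording used above; the 1 MB Region size in the example and the names `HumongousDemo` and `isHumongous` are assumptions chosen for illustration.

```java
class HumongousDemo {
    // An object counts as humongous when it is at least half a Region in size.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = 1024 * 1024; // hypothetical 1 MB Region
        System.out.println(isHumongous(400 * 1024, region));      // below half a Region
        System.out.println(isHumongous(512 * 1024, region));      // exactly half a Region
        System.out.println(isHumongous(3 * 1024 * 1024, region)); // larger than a Region
    }
}
```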

All garbage collection works Region by Region. The JVM tracks internally which Regions hold the fewest live objects (that is, which Regions are emptiest), and these Regions are always collected first, because with few live objects to process the collection is fast. Hence the name Garbage-First, where G stands for Garbage and 1 stands for First.

The whole process is divided into four steps:

  1. Initial mark and concurrent mark work the same way as in CMS

  2. Final mark is the same as the CMS remark phase: it corrects the marking records for the objects whose marks changed because the user program kept running during concurrent marking. The pause in this phase is generally slightly longer than in the initial mark phase but much shorter than the concurrent mark phase, and it requires Stop The World.

  3. Screening and collection first ranks the Regions by collection value and cost, then makes a collection plan based on the GC pause time the user expects. This phase could be executed concurrently with the user program, but because only some of the Regions are reclaimed, its duration is under the user's control. As a result, a single collection may reclaim only part of the garbage and leave the rest for the next collection, so the pause-time target should not be set too strictly.
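The idea of ranking Regions by value and cost under a pause budget can be sketched with a simple greedy selection. This is not G1's actual heuristic: the `Region` fields, the value-per-millisecond ranking, and the sample numbers are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CollectionSetDemo {
    // Hypothetical per-Region estimates: reclaimable bytes and cost in milliseconds.
    static final class Region {
        final String name;
        final long reclaimableBytes;
        final double costMillis;

        Region(String name, long reclaimableBytes, double costMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.costMillis = costMillis;
        }
    }

    // Greedy plan: best "bytes reclaimed per millisecond" first, while the budget allows.
    static List<String> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> ranked = new ArrayList<>(regions);
        ranked.sort(Comparator.comparingDouble(
                (Region r) -> r.reclaimableBytes / r.costMillis).reversed());
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : ranked) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r.name);
                spent += r.costMillis;
            }
        }
        return chosen;
    }

    static List<Region> sampleRegions() {
        return List.of(
                new Region("R1", 900, 3),
                new Region("R2", 800, 8),
                new Region("R3", 200, 1),
                new Region("R4", 100, 5));
    }

    public static void main(String[] args) {
        System.out.println(choose(sampleRegions(), 5));  // tight budget
        System.out.println(choose(sampleRegions(), 20)); // generous budget
    }
}
```

With a 5 ms budget only R1 and R3 fit, so the garbage in R2 and R4 waits for a later collection. With a 20 ms budget every Region is collected. This is the trade-off behind not setting the pause target too strictly.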

G1 was available on a trial basis in JDK 7, is fully usable in JDK 8, and became the default collector in JDK 9. JDK 11 introduces ZGC, which targets pauses of less than 10 ms, but it has not yet seen much commercial use.

7. Conclusion:

Serial collector: Serial/Serial Old

  • Only one garbage collection thread can execute, and the user thread is paused
  • It is suitable for embedded devices with small memory

Parallel collector [throughput first]: Parallel Scavenge / Parallel Old

  • Multiple garbage collection threads work in parallel, but the user thread is still in a waiting state.

Concurrent collector [pause time first]: CMS / G1

  • The user threads and the garbage collection threads execute at the same time (not necessarily in parallel, since they may still take turns on the CPU). These collectors focus on pause times and are better suited to interactive scenarios such as the Web

Throughput and pause time are the two indicators used to judge how well a garbage collector performs.

Related JVM parameters:

Serial:

  • -XX:+UseSerialGC
  • Serial Old is selected together with Serial by -XX:+UseSerialGC; HotSpot has no separate -XX:+UseSerialOldGC option

Parallel (throughput first):

  • -XX:+UseParallelGC
  • -XX:+UseParallelOldGC

Concurrent collector (response time first):

  • -XX:+UseConcMarkSweepGC
  • -XX:+UseG1GC

Note: Starting with JDK 8, Metaspace replaces the Perm area (permanent generation), so the corresponding JVM parameters become -XX:MetaspaceSize and -XX:MaxMetaspaceSize.
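To check which of these parameters a running JVM was actually started with, the standard management beans can be queried. The following is a small sketch; the class name `JvmFlagsDemo` and the helper `xxArguments` are invented for this example, and the memory pool names it prints differ from collector to collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class JvmFlagsDemo {
    // The -XX options this JVM was started with (empty if none were given).
    static List<String> xxArguments() {
        List<String> result = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("-XX options: " + xxArguments());
        // Memory pools reveal how the chosen collector lays out the heap and Metaspace.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + " (" + pool.getType() + "): "
                    + usedKb + " KB used");
        }
    }
}
```

On JDK 11 or later the file can be launched directly, for example with `java -XX:+UseSerialGC -XX:MetaspaceSize=64m JvmFlagsDemo.java`, and the two options should then appear in the first line of output.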

How do I choose the right garbage collector?

In general there is no need to choose: let the JVM pick the collector. If the default does not meet your requirements, first adjust the heap size. If that is still not enough, follow these guidelines:

  • If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.
  • If the application will be run on a single processor and there are no pause time requirements, then let the VM select the collector, or select the serial collector with the option -XX:+UseSerialGC.
  • If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.
  • If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.
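These guidelines can be condensed into a small decision helper. The sketch below is a simplification: it reduces the rules to three inputs, always suggests G1 rather than CMS for the low-pause case, and assumes that peak performance is the priority whenever no pause requirement is given. The names `CollectorChooser` and `recommend` are hypothetical.

```java
class CollectorChooser {
    // Returns a suggested JVM option based on a simplified reading of the guidelines.
    static String recommend(long dataSetMb, int processors, boolean needsPausesUnderOneSecond) {
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";   // small data set
        }
        if (needsPausesUnderOneSecond) {
            return "-XX:+UseG1GC";       // response time matters more than throughput
        }
        if (processors == 1) {
            return "-XX:+UseSerialGC";   // single processor, no pause requirement
        }
        return "-XX:+UseParallelGC";     // throughput first
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(recommend(4096, cpus, false));
        System.out.println(recommend(4096, cpus, true));
    }
}
```

Treat the result as a starting point for measurement rather than a final answer, since the guidelines themselves recommend letting the VM choose first.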

