Preface

This article collects some GC-related knowledge. It took me about two or three weeks to sort out and memorize these points, which was genuinely difficult for me. The writing is inevitably immature and rough, and the notes are summarized in language I can understand myself. Additions are welcome. If an expert happens to read this, I would really appreciate it if you pointed out my shortcomings and suggested directions in which I should study and think more deeply.

GC knowledge collection

GC reclaims objects in the heap and in the method area.

  • Why GC exists

Safety: automatic memory management reduces memory leaks and lightens the programmer's workload.

  • When objects become eligible for GC
  1. The object no longer has any references
  2. An uncaught exception occurs in the scope
  3. The program finishes normal execution of the scope
  4. The program calls System.exit()
  5. The program terminates unexpectedly

Method area

What needs to be reclaimed in the method area is mainly obsolete constants and useless classes.

  1. Collection of obsolete constants. A constant that is no longer referenced by any object can be safely reclaimed.
  2. Collection of useless classes.

A class is considered useless when all of the following hold: a. All instances of the class have been reclaimed, that is, there are no instances of this class in the Java heap; b. The ClassLoader that loaded the class has been reclaimed; c. The java.lang.Class object corresponding to the class is not referenced anywhere, so the methods of the class cannot be accessed anywhere through reflection.

The heap

For objects in the heap, reachability analysis is mainly used to determine whether an object is still referenced; an object that is no longer reachable through any reference should be reclaimed. Depending on how strongly we need to hold on to an object, there are four types of references (strong, soft, weak, and phantom), and the collector treats each of them differently.
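The four reference types map onto classes in java.lang.ref. The following is a minimal sketch, not a full demonstration of collector behavior: the class name ReferenceKindsDemo and the method describe() are made up for illustration, and only the deterministic part is shown (what each reference returns while the object is still strongly reachable).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Hypothetical demo class; only java.lang.ref types are real JDK API.
class ReferenceKindsDemo {
    static String describe() {
        Object target = new Object();                              // strong reference
        SoftReference<Object> soft = new SoftReference<>(target);  // cleared only under memory pressure
        WeakReference<Object> weak = new WeakReference<>(target);  // cleared once no strong/soft reference remains
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue); // get() always returns null

        // While 'target' is strongly reachable, soft and weak references still see it.
        return "soft=" + (soft.get() == target)
             + " weak=" + (weak.get() == target)
             + " phantomIsNull=" + (phantom.get() == null);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

What happens after the strong reference is dropped depends on when the collector runs, so the sketch deliberately does not assert on that.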

When does the JVM GC run

The young generation

This is where new Java objects are born (a newly created object that takes up a lot of memory is allocated directly in the old generation). Most newly created objects are allocated here, and most of them soon become unreachable and are removed by a Minor GC. A Minor GC is triggered when Eden does not have enough free space for a new object; at that point usually only a small fraction of the objects in Eden (less than 10%) are still alive.

The process
  1. Most newly created objects are stored in the Eden area
  2. After the first Minor GC in Eden, the surviving objects are moved to one of the survivor spaces.
  3. On each later Minor GC, the survivors from Eden and from the occupied survivor space are copied into the other, empty survivor space
  4. The space they came from is then cleared, so one survivor space is always empty and the two spaces swap roles on every collection
  5. After these steps have been repeated n times, an object that is still alive is moved to the old generation once its age reaches MaxTenuringThreshold (15 by default)
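The aging steps above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not HotSpot's implementation: YoungGenSim, Obj, and all method names are hypothetical, liveness is supplied by the caller as a predicate, space sizes are ignored, and an object is promoted on the collection in which its age reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of Eden plus two survivor spaces; all names are illustrative.
class YoungGenSim {
    static final class Obj {
        final String name;
        int age;                           // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final int threshold;                   // stands in for MaxTenuringThreshold
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();    // survivor space currently holding objects
    List<Obj> to = new ArrayList<>();      // survivor space currently empty
    final List<Obj> old = new ArrayList<>();

    YoungGenSim(int threshold) { this.threshold = threshold; }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);                       // new objects start in Eden
        return o;
    }

    void minorGc(Predicate<Obj> isLive) {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!isLive.test(o)) continue; // dead objects are simply not copied
            o.age++;
            if (o.age >= threshold) old.add(o); else to.add(o);
        }
        eden.clear();
        from.clear();
        List<Obj> tmp = from; from = to; to = tmp;   // the survivor spaces swap roles
    }

    // Counts how many Minor GCs an always-live object needs before it is promoted.
    static int gcsUntilPromotion(int threshold) {
        YoungGenSim sim = new YoungGenSim(threshold);
        sim.allocate("a");
        int gcs = 0;
        while (sim.old.isEmpty()) { sim.minorGc(o -> true); gcs++; }
        return gcs;
    }

    public static void main(String[] args) {
        System.out.println(gcsUntilPromotion(15));   // prints 15
    }
}
```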

The old generation

It mainly stores long-lived objects. The process in which objects are removed from the old generation is called Full GC. Because the old generation is much larger than the young generation, Full GC happens much less often than Minor GC. A Full GC is triggered when the objects about to be promoted are larger than the free space of the old generation, or, when they are smaller, when the HandlePromotionFailure parameter forces one.

  • Why Full GC might happen frequently

Common causes: misuse of the static keyword, which keeps objects in memory permanently; heavy multithreading with badly backed-up thread-pool queues; unbounded collections that stay in memory; and caches with no expiry that accumulate over a long period of time.

The permanent generation

The conditions for reclaiming a class in this region are particularly strict: all instances of the class have been reclaimed, the ClassLoader that loaded the class has been reclaimed, and its Class object cannot be accessed in any way.

GC algorithm

Reference counting algorithm (deprecated)

Each object is given a counter that is incremented when a reference to it is created and decremented when a reference is released; the object is considered garbage when the counter reaches 0. The approach is simple to implement and reclaims objects promptly, but it cannot handle circular references, and the constant counter updates add overhead. It was dropped in JDK 1.1.
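The circular-reference problem is easy to reproduce with a hand-rolled counter. This is a toy sketch for illustration only: RefCountDemo, Node, addRef, and release are hypothetical names, and the "external" reference is modeled by setting the initial count to 1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference counting; not how any real JVM manages memory.
class RefCountDemo {
    static final class Node {
        int refCount;
        final List<Node> fields = new ArrayList<>();
    }

    // 'from' starts referencing 'to'.
    static void addRef(Node from, Node to) {
        from.fields.add(to);
        to.refCount++;
    }

    // One reference to n goes away; if it was the last, n is "freed" and releases its fields.
    static void release(Node n) {
        if (--n.refCount == 0) {
            for (Node child : n.fields) release(child);
            n.fields.clear();
        }
    }

    // a -> b chain: dropping the external reference to a frees both objects.
    static int chainCountAfterRelease() {
        Node a = new Node(), b = new Node();
        a.refCount = 1;                 // external reference to a
        addRef(a, b);
        release(a);
        return b.refCount;              // 0: b was reclaimed along with a
    }

    // a <-> b cycle: the counters never reach 0 even though nothing outside refers to them.
    static int[] cycleCountsAfterRelease() {
        Node a = new Node(), b = new Node();
        a.refCount = 1;                 // external reference to a
        b.refCount = 1;                 // external reference to b
        addRef(a, b);
        addRef(b, a);
        release(a);                     // drop both external references
        release(b);
        return new int[] { a.refCount, b.refCount };   // {1, 1}: leaked
    }

    public static void main(String[] args) {
        System.out.println(chainCountAfterRelease());
        int[] c = cycleCountsAfterRelease();
        System.out.println(c[0] + ", " + c[1]);
    }
}
```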

Root search algorithm (reachability analysis)

Starting from the GC roots, follow references to the objects they point to, then keep following references from those objects until every reachable object has been found. The remaining objects, which cannot be reached from any root, are useless and can be reclaimed.

Objects that can serve as GC roots:

  1. Objects referenced from the virtual machine stack
  2. Objects referenced from the native method stack
  3. Objects referenced by static fields in the method area
  4. Objects referenced by constants in the method area
Mark-sweep (mark-clear) algorithm

The scan starts at the root set and marks the surviving objects; then the entire space is scanned and unmarked objects are reclaimed.

The mark-sweep algorithm does not move objects; it only reclaims the dead ones in place. Problems:

  1. Because dead objects are reclaimed without compacting the living ones, memory becomes fragmented.
  2. Marking and sweeping are both not very efficient. After sweeping, a large amount of discontinuous free space is left, so an allocation for a large object may fail to find enough contiguous space.

Techniques such as the tri-color abstraction and bitmap marking are used to improve the efficiency of the algorithm. It is comparatively efficient when many objects survive.
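The fragmentation problem can be seen in a toy heap of equal-sized slots. This is only an illustrative sketch: MarkSweepDemo and its methods are hypothetical, and the mark bits are given as input rather than computed from roots.

```java
// Toy sweep phase over a heap of 8 equal-sized slots.
class MarkSweepDemo {
    // A slot stays occupied only if it was allocated and marked; nothing is moved.
    static boolean[] sweep(boolean[] allocated, boolean[] marked) {
        boolean[] after = new boolean[allocated.length];
        for (int i = 0; i < allocated.length; i++) after[i] = allocated[i] && marked[i];
        return after;
    }

    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean o : occupied) if (!o) free++;
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] occupied) {
        int best = 0, run = 0;
        for (boolean o : occupied) {
            run = o ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    static int[] example() {
        boolean[] allocated = { true, true,  true, true,  true,  true, true,  true };
        boolean[] marked    = { true, false, true, false, false, true, false, true };
        boolean[] after = sweep(allocated, marked);
        return new int[] { totalFree(after), largestFreeRun(after) };
    }

    public static void main(String[] args) {
        int[] r = example();
        System.out.println("free=" + r[0] + " largestRun=" + r[1]);   // free=4 largestRun=2
    }
}
```

Four slots are free after the sweep, yet an object needing three adjacent slots cannot be placed.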

Copying algorithm

Starting from the root set, the live objects are copied into the free region and laid out contiguously in address order, while the GC thread updates the references to the live objects so that they point to the new addresses. After the scan, everything left in the active region is reclaimed and the free region becomes the active region. The next GC repeats this operation. With a copying algorithm, memory can be allocated quickly by pointer bumping.

Implementations include a recursive algorithm, an iterative algorithm, and an approximately depth-first search algorithm that addresses the recursion-stack and cache-line problems of the first two.

This works well when few objects survive. Its drawback is that up to 50% of the memory is wasted. The copying algorithm makes up for the chaotic memory layout left by the mark-sweep algorithm. It is mainly used for the young generation; because most young objects die quickly, the two spaces do not need to be equal in size, and the ratio of Eden to each survivor space is about 8:1.
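The copy step and pointer bumping can be sketched together. This is a toy model, not a real collector: CopyingDemo and evacuate are hypothetical names, live objects are given as a map from old address to size in discovery order, and object contents and reference fix-up are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy evacuation into a free region using a bump pointer.
class CopyingDemo {
    // Returns the forwarding table: old address -> new address in the free region.
    static Map<Integer, Integer> evacuate(Map<Integer, Integer> liveSizeByAddress, int toSpaceBase) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int bump = toSpaceBase;                    // next free address in the target region
        for (Map.Entry<Integer, Integer> e : liveSizeByAddress.entrySet()) {
            forwarding.put(e.getKey(), bump);      // the object now lives at 'bump'
            bump += e.getValue();                  // pointer bumping: advance by the object's size
        }
        return forwarding;
    }

    static Map<Integer, Integer> example() {
        Map<Integer, Integer> live = new LinkedHashMap<>();
        live.put(0, 16);     // live object at address 0, 16 bytes
        live.put(48, 8);     // live object at address 48, 8 bytes
        live.put(80, 24);    // live object at address 80, 24 bytes
        return evacuate(live, 128);
    }

    public static void main(String[] args) {
        System.out.println(example());   // prints {0=128, 48=144, 80=152}
    }
}
```

After evacuation the survivors sit back to back starting at address 128, so the next allocation is a single addition to the bump pointer.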

Mark-compact algorithm

Used for old-generation collection. The scan starts from the root set and marks the surviving objects, then removes the unmarked objects and moves the surviving objects into the freed slots so that they end up packed together. This solves the memory fragmentation problem, but it is less efficient than the mark-sweep algorithm because the references to the moved objects must then be updated to their new addresses.

Implementations include the two-finger algorithm, the Lisp2 algorithm, and the threaded compaction algorithm.

Both compaction and copying move objects, but depending on the algorithm, compaction may first have to compute the destination address of every object, then fix up the pointers, and only then move the objects. Copying can do all of this in one pass, so it can be faster. Also, the cost of GC should be measured not only in the collector's time but also in the allocator's. If memory is guaranteed not to be fragmented, allocation can use pointer bumping, which is very fast. If memory is fragmented, it has to be managed with a free list, which is usually slower.
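The three steps named above (compute destinations, fix up pointers, move) can be sketched in the style of the Lisp2 algorithm. This is a toy version under simplifying assumptions: CompactDemo is a hypothetical name, every object occupies one slot and holds at most one reference, heap[i] stores the slot index that object i references (or -1 for none), and mark bits are given as input.

```java
import java.util.Arrays;

// Toy sliding compaction with three passes over a slot-based heap.
class CompactDemo {
    // Returns the compacted live prefix of the heap with references rewritten.
    static int[] compact(int[] heap, boolean[] marked) {
        int[] h = heap.clone();
        int n = h.length;

        // Pass 1: compute each live object's destination slot.
        int[] forward = new int[n];
        int free = 0;
        for (int i = 0; i < n; i++) forward[i] = marked[i] ? free++ : -1;

        // Pass 2: rewrite references to point at the destinations.
        for (int i = 0; i < n; i++) {
            if (marked[i] && h[i] >= 0) h[i] = forward[h[i]];
        }

        // Pass 3: slide objects down; forward[i] <= i, so ascending order never overwrites unmoved data.
        for (int i = 0; i < n; i++) {
            if (marked[i]) h[forward[i]] = h[i];
        }
        return Arrays.copyOf(h, free);
    }

    static int[] example() {
        int[] heap       = { 2, -1, 4, -1, -1 };             // slot 0 -> slot 2 -> slot 4
        boolean[] marked = { true, false, true, false, true };
        return compact(heap, marked);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(example()));       // prints [1, 2, -1]
    }
}
```

The extra passes are the overhead the paragraph refers to; a copying collector does the equivalent work while it traverses the live objects.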

Garbage collectors

Serial (-XX:+UseSerialGC)

A serial young-generation collector that uses the copying algorithm. The Serial collector is the most basic and oldest collector in the Java virtual machine; before JDK 1.3 it was the only choice for young-generation collection. It is still the default collector under the Client VM, and under the Server VM on machines with fewer than 4 cores and less than 4 GB of memory. "Serial" means more than using a single CPU for collection: when the JVM needs to collect garbage, the Serial collector suspends all user threads until the collection is complete. The JVM (Java Virtual Machine) works like a virtual computer in which each thread can be regarded as a processor, so CPU0 and CPU1 in the figure are actually user threads, not physical CPUs. Although it is the oldest, the Serial collector is the most efficient collector in a single-CPU environment, because it has no thread-interaction overhead and can concentrate entirely on garbage collection.
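To check which collectors a running JVM actually selected (for example after starting it with -XX:+UseSerialGC), the standard java.lang.management API can list them. A minimal sketch follows; GcInfoDemo and collectorNames are hypothetical names, and the reported collector names are implementation-specific (on HotSpot with the Serial collector they are typically "Copy" and "MarkSweepCompact").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors of the current JVM via the platform MXBeans.
class GcInfoDemo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time in milliseconds so far.
            System.out.println(gc.getName()
                + ": collections=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```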

SerialOld (-XX:+UseSerialGC)

An old-generation collector that uses the mark-compact algorithm, mainly used in Client mode. In Server mode it has two uses: 1. It is paired with other garbage collectors. 2. It serves as the fallback when the CMS collector is in use.

ParNew

A young-generation collector that uses the copying algorithm. ParNew is essentially a multithreaded version of the Serial collector. Apart from the Serial collector, it is the only young-generation collector that works with the CMS collector. ParNew is the preferred young-generation collector for many JVMs running in Server mode. However, it is far less efficient than the Serial collector on a single CPU, so pay attention to the usage scenario.

ParallelScavenge (-XX:+UseParallelGC)

A young-generation, throughput-first collector that uses the copying algorithm. Its adaptive tuning strategy is also an important difference between the ParallelScavenge collector and the ParNew collector. The ParallelScavenge collector is designed to reach a controllable throughput. Throughput is the ratio of the time the CPU spends running user code to the total CPU time consumed, i.e., throughput = time running user code / (time running user code + garbage collection time). If the virtual machine runs for a total of 100 minutes and garbage collection takes 1 minute, the throughput is 99%.
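The throughput formula is small enough to check directly. A minimal sketch; ThroughputDemo and throughput are illustrative names only.

```java
// Throughput = user time / (user time + GC time), using any consistent time unit.
class ThroughputDemo {
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    public static void main(String[] args) {
        // 100 minutes in total, of which 1 minute is garbage collection.
        System.out.println(throughput(99, 1));
    }
}
```

With 99 minutes of user code and 1 minute of collection the result is 0.99, matching the 99% in the example above.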

ParallelOld

This collector only became available in JDK 1.6. Before that, ParallelScavenge could only be paired with SerialOld, which severely dragged down its overall speed. With the advent of ParallelOld, the "throughput first" collector finally lived up to its name. The ParallelScavenge + ParallelOld combination is preferred when throughput matters and there is more than one CPU.

CMS (mark-sweep, old generation)

It is a collector where response time is more important than throughput and the main goal is to obtain the minimum garbage collection pause time

The CMS execution process is as follows:

Initial mark (STW)

In this phase the virtual machine has to pause the running application threads, which is officially known as Stop The World. The phase scans from the root objects and marks only the objects directly associated with them, so it completes quickly.

Concurrent marking

This phase follows the initial marking phase and continues tracing and marking from the objects marked there. "Concurrent" here means that the user threads run at the same time as the GC thread and do not need to be suspended.

Concurrent Precleaning

This phase is still concurrent. The JVM looks for objects that entered the old generation during the "concurrent marking" phase (objects may have been promoted from the young generation or allocated directly in the old generation in the meantime). This rescanning reduces the work left for the following "remark" phase, which matters because that phase is stop-the-world.

Remarking (STW remark)

This phase suspends the running application threads again. It starts from the root objects once more to find and mark objects that were missed during the concurrent marking phase (because object state changed while marking was in progress), and it processes object associations. It takes longer than "initial marking", and it can be performed by multiple threads in parallel.

Concurrent sweep (cleaning)

This phase is concurrent and the application thread and GC cleanup thread can execute concurrently.

Concurrent reset

This phase is still concurrent, resetting the data structure of the CMS collector and waiting for the next garbage collection.

Disadvantages of CMS:

  1. Memory fragmentation. Fragmentation occurs because CMS uses the mark-sweep algorithm. The CMS collector mitigates this somewhat by linking the unallocated space into a list; when the JVM needs to allocate memory, it searches the list for a suitable space to store the object. However, fragmentation is still an issue. If an object needs three contiguous blocks of space and no such space can be found because of fragmentation, a Full GC results.

  2. More CPU resources are required. Because of the concurrent processing, the GC threads and the application threads often run at the same time, which requires more CPU resources and is also why some throughput is sacrificed.

  3. More heap space is needed. Because the application threads keep running while CMS is marking, they keep allocating heap space. To ensure there is room for new objects before CMS has finished reclaiming space, some space must be kept in reserve. By default, CMS starts a collection when old-generation usage reaches 68%. The threshold can be set with -XX:CMSInitiatingOccupancyFraction=n.
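The free-list behavior described in the first disadvantage can be sketched with a first-fit allocator. This is a toy model and not how CMS is actually implemented: FreeListDemo and its methods are hypothetical, blocks are {start, size} pairs, and free blocks are never merged.

```java
import java.util.ArrayList;
import java.util.List;

// Toy first-fit free list showing an allocation failing despite enough total free space.
class FreeListDemo {
    final List<int[]> free = new ArrayList<>();    // each entry is {start, size}

    void addFree(int start, int size) {
        free.add(new int[] { start, size });
    }

    int totalFree() {
        int total = 0;
        for (int[] block : free) total += block[1];
        return total;
    }

    // Returns the start address of the allocation, or -1 if no single block is large enough.
    int allocate(int size) {
        for (int i = 0; i < free.size(); i++) {
            int[] block = free.get(i);
            if (block[1] >= size) {
                int start = block[0];
                block[0] += size;                   // shrink the block from the front
                block[1] -= size;
                if (block[1] == 0) free.remove(i);  // block fully used
                return start;
            }
        }
        return -1;   // in the scenario above, this is where a Full GC would be needed
    }

    static int[] example() {
        FreeListDemo heap = new FreeListDemo();
        heap.addFree(0, 1);
        heap.addFree(4, 2);
        heap.addFree(9, 2);
        return new int[] { heap.totalFree(), heap.allocate(2), heap.allocate(3) };
    }

    public static void main(String[] args) {
        int[] r = example();
        System.out.println(r[0] + ", " + r[1] + ", " + r[2]);   // prints 5, 4, -1
    }
}
```

Five units are free in total, but the request for three contiguous units fails because no single block is that large.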

G1 (mark-compact)

No memory fragmentation. Pause times can be controlled very precisely, enabling low-pause garbage collection without sacrificing throughput. The G1 collector avoids whole-heap garbage collection by dividing heap memory into separate regions of fixed size and tracking how much garbage has accumulated in each region. In the background it maintains a priority list and, within the allowed collection time, collects the regions containing the most garbage first. Region partitioning and priority-based region collection allow the G1 collector to achieve the highest possible collection efficiency in limited time.
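The "most garbage first, within the allowed time" idea can be sketched as a greedy selection. This is a deliberately simplified model, not G1's actual policy: G1SelectionDemo, Region, and chooseCollectionSet are hypothetical names, and the per-region cost in milliseconds is an assumed input rather than something the sketch predicts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy selection of regions to collect under a pause-time budget.
class G1SelectionDemo {
    static final class Region {
        final int id;
        final long garbageBytes;   // reclaimable space in this region
        final double costMs;       // assumed time needed to collect this region
        Region(int id, long garbageBytes, double costMs) {
            this.id = id;
            this.garbageBytes = garbageBytes;
            this.costMs = costMs;
        }
    }

    // Picks regions in order of most garbage first, skipping any that would exceed the budget.
    static List<Integer> chooseCollectionSet(List<Region> regions, double pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparingLong((Region r) -> r.garbageBytes).reversed());
        List<Integer> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMs <= pauseBudgetMs) {
                chosen.add(r.id);
                spent += r.costMs;
            }
        }
        return chosen;
    }

    static List<Integer> example() {
        List<Region> regions = List.of(
            new Region(0, 900, 6),
            new Region(1, 100, 2),
            new Region(2, 700, 5),
            new Region(3, 400, 3));
        return chooseCollectionSet(regions, 10);
    }

    public static void main(String[] args) {
        System.out.println(example());   // prints [0, 3]
    }
}
```

With a 10 ms budget the sketch takes region 0 (6 ms), skips region 2 because it would overrun the budget, and still fits region 3 (3 ms).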