Illustrates the Java garbage collection mechanism

Abstract:

The automatic memory management advocated in The Java technology architecture ultimately boils down to the automatic solution of two problems: allocating memory to objects and reclaiming memory allocated to objects, and these two problems are targeted at the memory region in the Java memory model heap. On the subject of object allocation, I’ve already explained how to divide up the available space and its implications for thread safety in my blog “Overview of the JVM Memory Model.” In this article, I’ll give further rules for allocating memory in conjunction with garbage collection strategies. The introduction of garbage collection mechanism can effectively prevent memory leaks, ensure the efficient use of memory, but also greatly free the Hands of Java programmers, so that they do not need to consider memory management when writing programs. This paper focuses on two classical algorithms to determine whether an object can be collected or not, and details the basic ideas of four typical garbage collection algorithms and their direct application — garbage collector, and finally introduces the memory allocation rules combined with the memory collection strategy.

Copyright Notice:

In this paper, the original author: nerd Rico author’s blog: blog.csdn.net/justloveyou…

All the legends involved in this paper are collected by the author, some of which are drawn by the author himself, and some are modified based on the Internet. If copyright is involved, please leave a comment or contact the author directly. See the left column for contact information.

Friendly tips:

In order to better understand Java’s garbage collection mechanism, I recommend that you have a general understanding of the JVM memory model. Since I’ve already covered the JVM memory model in depth in my post “Overview of the JVM Memory Model,” I won’t go into details here.

This article is based on JDK 1.6, and there may be some differences between different virtual machines, but it does not affect our overall understanding of the JVM garbage collection mechanism.

First, the significance of garbage collection mechanism

As I mentioned in my last post, Overview of the JVM Memory Model, the JVM memory model consists of three parts: The heap (the Java heap reachable by Java code and the method area used by the JVM itself), the stack (the virtual machine stack that serves Java methods and the local method stack that serves Native methods), and the program counter that ensures continuous execution in a multithreaded environment. In particular, we mentioned at the time that the Java heap is the main area for garbage collection, so it is also referred to as the GC heap; And the method area also has a less precise expression, is permanent generation. In general, heaps (including Java heaps and method areas) are the primary object of garbage collection, and Java heaps in particular.

In fact, automatic memory management, as advocated in the Java technology architecture, ultimately boils down to the automated solution of two problems: allocating memory to objects and reclaiming memory allocated to objects, and these two problems target the heap area in the Java memory model. On the subject of object allocation, I’ve already explained how to divide up the available space and its implications for thread safety in my blog “Overview of the JVM Memory Model.” In this article, I’ll give further rules for allocating memory in conjunction with garbage collection strategies. In addition, we know that the garbage collection mechanism is a significant feature of the Java language, which can effectively prevent memory leaks and ensure efficient use of memory, so that Java programmers do not need to worry about memory management when writing programs. The Java garbage collection mechanism has a complex set of issues to consider, and this article addresses three core issues, including:

Which memory needs to be reclaimed? (Two classical algorithms for whether an object can be reclaimed: reference counting and reachable analysis algorithms)
When is it recycled? (Garbage collection timing of new generation, old age and permanent generation of heap, MinorGC and FullGC)
How to recycle? (Three classical garbage collection algorithms (mark removal algorithm, copy algorithm, mark collation algorithm) and generational collection algorithm and seven garbage collectors)

Before we explore the Java garbage collection mechanism, we should first remember one word: stop-the-world. Stop-the-world means that the JVM stops the execution of the application due to GC, and this can happen with any GC algorithm. When stop-the-world occurs, all threads except those required for GC are in a waiting state until the GC task completes. In fact, GC optimization often means reducing the time it takes for stop-the-world to occur so that the system has high throughput and low pauses.

Ps: Memory leak refers to that the memory space is not reclaimed after it is used. In the general case that does not involve complex data structure, Java memory leak is shown as the life cycle of a memory object exceeds the length of time that the program needs it.

How do I determine whether an object can be reclaimed?

1. Reference counting algorithm: judge the number of references to the object

The reference counting algorithm determines whether an object can be reclaimed by determining the number of references to the object.

Reference counting algorithms were an early strategy in the garbage collector. In this approach, each object instance in the heap has a reference count. When an object is created and the object instance is assigned to a reference variable, the reference count of the object instance is set to 1. When any other variable is assigned a reference to this object, the reference count of the object instance increases by 1 (a = b, then the counter of the object instance referenced by B increases by 1), but when a reference of an object instance expires or is set to a new value, the reference count of the object instance decreases by 1. In particular, when an object instance is garbage collected, the reference counter of any object instance it references is reduced by one. Any instance of an object with a zero reference count can be garbage collected.

Reference counting collectors are quick to execute and interwoven in a program run, which is good for real-time environments where programs need not be interrupted for long periods of time, but it is difficult to solve the problem of circular references between objects. As shown in the following program and schematic, the reference count between objects objA and objB can never be zero, and then both objects can never be reclaimed.

Public ReferenceCountingGC {public Object instance = null; public static void testGC(){ ReferenceCountingGC objA = new ReferenceCountingGC (); ReferenceCountingGC objB = new ReferenceCountingGC (); // The reference count between objects can never be 0 objb. instance = objA; objA.instance = objB; objA = null; objB = null; System.gc(); }}Copy the code

The last two lines of the above code assign objA and objB to null, which means that objA and objB point to objects that are no longer accessible, but because they refer to each other and their reference counters are not zero, the garbage collector will never reclaim them.

2. Reachability analysis algorithm: judge whether the reference chain of the object is reachability

Reachability analysis algorithm determines whether an object can be reclaimed by judging whether its reference chain is reachability.

The reachable analysis algorithm is introduced from the graph theory in discrete mathematics. The program regards all Reference relations as a graph, and searches downward from these nodes through a series of objects named “GC Roots” as the starting point. The search path is called the Reference Chain. When an object is not connected to GC Roots by any reference chain (in graph theory terms, it is unreachable from GC Roots to this object), the object is proved to be unavailable, as shown in the figure below. In Java, objects that can be used as GC Root include:

Objects referenced in the virtual machine stack (local variable table in the stack frame);
The object referenced by the class static attribute in the method area;
The object referenced by the constant in the method area;
Objects referenced by Native methods in the Native method stack;

Garbage collection algorithm

1. Mark clearing algorithm

The mark-clear algorithm is divided into two stages: marking and clearing. The algorithm first scans from the root set and marks the surviving objects. After marking, unmarked objects in the whole space are scanned and recycled, as shown in the figure below.

There are two major shortcomings of the mark-clear algorithm:

Efficiency problem: the efficiency of both marking and clearing is not high;
Space problem: tag – clear algorithm does not need to be the movement of the object, and to not only survive for processing object, so the tag removal after will produce a large number of discrete memory fragments, space debris after too much may cause in the process of program run needs to allocate large objects, unable to find enough contiguous memory and had to trigger another garbage collection action in advance.

2. Replication algorithm

The replication algorithm divides the available memory into two equally sized pieces by capacity and uses only one piece at a time. When this area of memory is used up, the surviving objects are copied to the other area, and the used memory space is cleaned up again. This algorithm is suitable for scenarios with low object survival rate, such as the new generation. In this way, the memory is reclaimed for the whole half region every time, and there is no need to consider the complicated situation of memory fragmentation when allocating memory. As long as the heap top pointer is moved, the memory can be allocated in order, which is simple to implement and efficient to run. The algorithm is shown as follows:

In fact, today’s commercial virtual machines use this algorithm to recycle the new generation. Because studies have found that only about 10% of objects in the new generation survive each time they are recycled, there are few objects to copy, which is not bad. As explained in the blog post overview of THE JVM Memory Model, the practice is to divide the new generation of memory into one large Eden space and two smaller Survivor Spaces (as shown below), using Eden and one Survivor at a time. When recycling is done, the surviving objects in Eden and Survivor are copied once to another Survivor space, and Eden and the Survivor space that was just used are cleaned up. By default, the HotSpot VIRTUAL machine has an 8:1 ratio of Eden to Survivor, which means that each new generation has 90% (80%+10%) of its available memory, and only 10% of its memory is “wasted”.

3. Tag sorting algorithm

The copy-collection algorithm has to carry out more replication operations when the object survival rate is high, and the efficiency will be low. More importantly, if you do not want to waste 50% of the space, you need to have extra space for allocation guarantee, in the extreme case that all objects in the memory used are 100% alive, so in the old days, this algorithm generally cannot be used directly. Sorting algorithm of the marking process similar to tag removal algorithm, but the subsequent steps are not directly to recyclable objects to clean up, but for all the live objects are moving to the end, and then directly to remove the borders of memory, similar to the defragmentation process, the garbage collection algorithm applies to the object high survival rate of scenario (s), its action principle as shown in the figure below.

The most significant difference between the tag sorting algorithm and the tag clearing algorithm is that the tag clearing algorithm does not move objects and only deals with non-viable objects. The mark-up algorithm moves all the live objects to one end and processes the non-live objects, so there is no memory fragmentation. The function of the tag collation algorithm is shown as follows:

4. Generational collection algorithm

For a large system, when a large number of objects and method variables are created, there will be a large number of objects in the heap memory, and it is bound to be inefficient to analyze whether objects should be recycled one by one. The generational collection algorithm is based on the fact that different objects have different life cycles (survivability) and that objects with different life cycles are located in different areas of the heap, so using different strategies for collecting different areas of heap memory can improve the performance of the JVM. Contemporary commercial virtual machines all use generational collection algorithms: if the survival rate of new generation objects is low, the replication algorithm is adopted; In the old age, the survival rate is high, so we use the tag clearing algorithm or the tag sorting algorithm. Java heap memory can generally be divided into three modules: New generation, old generation, and permanent generation, as shown in the following figure:

1) Young Generation

The goal of the Cenozoic is to collect objects with a short life span as quickly as possible, and in general, all new objects are placed in the Cenozoic first. The new-generation memory is divided into one Eden zone and two survivor zones (survivor0, survivor1) in a ratio of 8:1:1. Most objects are generated in Eden zone. For garbage collection, copy Eden survivable objects to survivable 0 and empty Eden survivable objects. When survivable 0 is also full, copy Eden and survivable objects to Survivable 1 and empty Eden and survivable 0. At this point survivor0 is empty, then swap roles of Survivor0 and Survivor1 (i.e., Eden and Survivor1 will be scanned for next garbage collection), leaving Survivor0 empty, and so on. In particular, survivable objects are stored directly to the old age when survivable objects in survivor1 and Survivor0 are also insufficient. If the old age is also full, a FullGC will be triggered, that is, both the new generation and the old generation will be recycled. Note that THE Cenozoic GC is also called MinorGC. MinorGC occurs with a high frequency and may not be triggered until Eden is full.

2) the Old Generation

The old age stores objects with a long life cycle. As described above, objects that survive N garbage collections in the new generation will be put into the old age. In addition, the memory of the old age is much larger than that of the new generation (approximately 1:2). When the old age is Full, Major GC will be triggered. Objects of the old age live for a long time, so the frequency of FullGC is relatively low.

3). Permanent Generation

The persistent generation is mainly used to store static files, such as Java classes and methods. Persistent generation has no significant impact on garbage collection, but some applications may dynamically generate or call some classes, such as bytecode frameworks such as reflection, dynamic proxy, and CGLib. In such cases, a large permanent generation space should be set up to store the classes that are added during operation.

5, summary

Because objects are processed in generations, garbage collection regions and times are different. There are two types of garbage collection, Minor and Full.

Minor GC: Collects for the new generation without affecting the old generation. Because most new generation Java objects die frequently, Minor GC is very frequent, and typically uses fast, efficient algorithms to make garbage collection complete as quickly as possible.
Full GC: Also called Major GC, recycles the entire heap, both Cenozoic and older generations. Full GC is slower than Minor GC because it needs to recycle the entire heap, so you should minimize the number of Full GC’s due to old ages being written Full, Perm being written Full, and System.gc() being explicitly called.

Garbage collector

If the garbage collection algorithm is the methodology of memory collection, then the garbage collector is the concrete implementation of memory collection. The figure below shows 7 collectors that operate on different generations, including the Serial, PraNew, and Parallel Avenge collectors, and the Serial Old, Parallel Old, AND CMS collectors. There is also a G1 collector for recycling the entire Java heap. The wiring between different collectors indicates that they can be used together.

Serial collector (copy algorithm): the new generation of single-threaded collector, marking and cleaning are single-threaded, the advantage is simple and efficient;
Serial Old collector (mark-collation algorithm): Old single-threaded collector, older version of Serial collector;
ParNew collector (copy algorithm): A new generation of delegate parallel collector, which is actually a multi-threaded version of Serial collector and performs better than Serial on multi-core CPUS;
The Parallel Collector. A new generation of Parallel collectors, driven by the pursuit of high throughput and efficient CPU utilization. Throughput = user thread time /(user thread time +GC thread time), high throughput can efficiently use THE CPU time, as soon as possible to complete the calculation tasks of the program, suitable for background applications such as the corresponding interaction requirements are not high scenes;
The Parallel Old Collector (mark-collation algorithm) : the Parallel Collector, throughput first, older version of the Parallel Avenge collector;
CMS(Concurrent Mark Sweep) collector (mark-sweep algorithm) : The old parallel collector aims at obtaining the shortest collector pause time. It has the characteristics of high concurrency and low pause, and pursues the shortest COLLECTOR pause time.
G1(Garbage First) : the Java heap parallel collector. The G1 collector is a new collector in JDK1.7. The G1 collector is based on the “mark-collation” algorithm, that is, does not generate memory fragmentation. In addition, the G1 collector differs from its predecessors in that it collects the entire Java heap (including the new generation and the old generation), whereas the first six collectors only collect the new generation or the old generation.

5. Memory allocation and reclamation strategy

The automatic memory management advocated in the Java technology architecture ultimately boils down to solving two problems automatically: allocating memory to objects and reclaiming memory allocated to objects. In general, objects are allocated on the Eden region of the new generation. If the local thread allocation cache (TLAB) is enabled, allocation will be made on the TLAB first by thread. In rare cases, it may be directly assigned in the old age. In general, the memory allocation rules are not a one-layer invariant, and the details depend on which combination of garbage collectors is being used and the Settings of memory-related parameters in the virtual machine.

1) Objects are allocated in Eden first. When Eden has insufficient space for allocation, the VIRTUAL machine will initiate a MinorGC. Nowadays, commercial virtual machines generally use the replication algorithm to recover the new generation. The memory is divided into a large Eden space and two small Survivor Spaces, and Eden and one Survivor are used each time. When garbage collection is performed, the surviving objects in Eden and Survivor are copied to another Survivor space once, and Eden and the previous Survivor space are disposed of. (The default HotSpot VIRTUAL machine size ratio of Eden to Survivor is 8:1.) When Survivor space is insufficient, the HotSpot virtual machine needs to rely on the older generation to allocate guarantees.

2) Big object directly into the old age. Large objects are Java objects that require a large amount of contiguous memory. The most typical large objects are long strings and arrays.

3) Long-lived objects will enter the old age. When an object has had a certain number of Minor GC’s in the new generation (default: 15), it is promoted to the old generation.

4) Age determination of dynamic objects. In order to better adapt to the memory conditions of different programs, the virtual machine does not always require that the object age must reach MaxTenuringThreshold to advance to the old age. If the sum of all object sizes of the same age in the Survivor space is greater than half of the Survivor space, Objects older than or equal to this age can go directly to the old age without waiting until the age specified in MaxTenuringThreshold.

It is important to note that Java’s garbage collection mechanism is the ability provided by the Java Virtual machine to dynamically reclaim memory occupied by unreferenced objects at idle time in an unscheduled manner. That is, the garbage collector reclaims the memory space occupied by objects without any references, not the objects themselves.

Memory leaks in Java

Although Java has a garbage collection mechanism, memory leaks can also occur, as mentioned below:

(1) Static use of collection classes such as HashMap and Vector is the most prone to memory leaks because the lifetime of these static variables is the same as that of the application. All objects cannot be freed because they will always be used by Vector.

private static Vector v = new Vector(); public void test(Vector v){ for (int i = 1; i<100; i++) { Object o = new Object(); v.add(o); o = null; }}Copy the code

In this case, the vm stack holds a reference to V for the Vector and O for the Object. In the for loop, we keep generating new objects, adding them to the Vector, and then emptying the O reference. The problem is that even though we left the O reference empty, the Object we created cannot be collected when garbage collection occurs. Because garbage collection finds v references when it traces references in the code stack, further down it finds references to Object objects in the memory space that the V references point to. That is, although the O reference has been null, there are still other references to the Object that can be accessed, so the GC cannot release them. If the Object Object has no use for the program after this loop, the Java program is considered to have a memory leak.

(2). All resource connections including database connections, network connections, IO connections, etc., are not explicitly called close to avoid memory leakage caused by GC collection.

(3). The use of listeners may also lead to memory leaks when the listener is not deleted at the same time as the object is released.

Seven. Knowledge supplement

1, references,

1). Cite overview

Whether the number of references is determined by reference counting algorithm or whether the reference chain of an object is reachable by reachability analysis algorithm, determining whether an object is alive or not is related to “reference”. Prior to JDK 1.2, references were traditionally defined in Java: if the value stored in a reference type represented the starting address of another chunk of memory, that chunk represented a reference. This definition is very pure, but too narrow, an object under this definition can only be referred to or not referred to two states, for how to describe some “tasteless” objects, it is a pity to abandon it. We want to describe objects that can remain in memory when there is enough space; If the memory is still stressed after garbage collection, these objects can be discarded. Many system cache functions fit this application scenario.

To this end, Java expanded the concept of references after JDK 1.2, References are divided into four types: Strong Reference, Soft Reference, Weak Reference and Phantom Reference. The strength of these four kinds of references gradually decreases.

2). Types and Definitions of references Strong references are common in program code, such as “Object obj = new Object()”. As long as strong references exist, the garbage collector will never reclaim the referenced object.

Soft references are used to describe objects that are useful but not necessary. Objects associated with soft references are listed in the recycle scope and recycled a second time before the system is about to run out of memory. An out-of-memory exception is thrown if there is still not enough memory left for this collection. After JDK 1.2, a SoftReference class was provided to implement soft references.

Weak references are also used to describe non-essential objects, but they are weaker than soft references, and objects associated with weak references only survive until the next garbage collection occurs. When the garbage collector works, objects associated only with weak references are reclaimed regardless of whether there is currently enough memory. After JDK 1.2, WeakReference classes were provided to implement weak references.

Virtual references are the weakest type of reference relationship. The existence of a virtual reference does not affect the lifetime of an object, nor can an object instance be obtained through a virtual reference. The sole purpose of setting a virtual reference association for an object is to receive a system notification when the object is reclaimed by the collector. After JDK 1.2, the PhantomReference class was provided to implement virtual references.

2. Method area recycling

The purpose of the method area’s memory reclamation is to recycle constant pools and unload types. Recycling deprecated constants is very similar to recycling objects in the Java heap. For example, if a String “ABC” is already in the constant pool, but there is no String named “ABC” in the system. In other words, no String refers to the “ABC” constant in the constant pool, and there is no other reference to the literal. If memory reclamation occurs at this point, and if necessary, the “ABC” constant is “removed” from the constant pool by the system. Symbolic references to other classes (interfaces), methods, and fields in the constant pool are similar.

It’s easier to determine whether a constant is “obsolete,” whereas the criteria for determining whether a class is “useless” are much tougher. A class must meet all three of the following criteria to be considered “useless” :

All instances of the class have been reclaimed, that is, there are no instances of the class in the Java heap;
The ClassLoader that loaded the class has been reclaimed;
The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.

The virtual machine can recycle (unload) useless classes that meet the above three conditions. This is only “yes”, rather than recycling objects when they are not used. In particular, scenarios where bytecode frameworks such as reflection, dynamic proxy, and CGLib are used extensively, as well as scenarios where dynamic JSP generation and OSGi are frequently customized classloaders, require the virtual machine to have the ability to unload classes to ensure that permanent generations do not overflow.

Eight more.

For more information about the structure of the JVM memory model, the creation of Java objects in virtual machines, the localization process, and memory exception analysis, please refer to my blog “Overview of THE JVM Memory model”.

For more on Java SE progression, check out my column “The Way to Java SE Progression.” This column mainly studies the JVM foundation, Java source code and design patterns and other Java advanced knowledge, from the primary to the senior constantly summarize, analyze the inherent logic of each knowledge point, throughout, covering the whole Java knowledge, step by step to improve, improve their own at the same time, the Java learning and thinking to share with you. Great oaks from little acorns grow, and your limit is determined by your foundation. Let’s join hands to reach the top of Java.

Reference:

In Depth understanding the Java Virtual Machine (2nd edition) Zhiming Chow has an in-depth understanding of Java garbage collection

Illustrates the Java garbage collection mechanism

First, the significance of garbage collection mechanism

Garbage collection algorithm

5. Memory allocation and reclamation strategy

Seven. Knowledge supplement

Reference:

Related Posts