There are three things that need to be done for garbage collection:

  1. What memory needs to be reclaimed?
  2. When is it recycled?
  3. How to recycle?

Which memory needs to be reclaimed

What is a reference

Prior to JDK1.2, the definition of a reference in Java was that if the data stored in the reference type data represented the starting address of another block of memory, that block of memory represented a reference. Weakness: Too narrow a definition to describe something that is “tasteless to eat and regrettable to throw away.”

After JDK1.2, Java classifies references as strong, soft, weak, and virtual.

  • Strong References Strong references are common in program code, such as “Object obj = new Object()”. As long as a strong reference exists, the garbage collector will never reclaim the referenced Object.
  • Soft References Soft references are used to describe objects that are still useful but not necessary. Objects associated with soft references are listed in the collection scope for a second collection before the system is about to experience an out-of-memory exception.
  • Weak references Weak references are also used to describe non-essential objects, but they are weaker than soft references, and the object associated with a weak reference will only survive until the next garbage collection occurs.
  • Phantom reference

    Phantom references, also known as ghost or phantom references, are the weakest type of reference relationship. The existence of a dummy reference to an object has no effect on its lifetime, and it is not possible to obtain an instance of an object by a dummy reference. The only purpose of setting a virtual reference association for an object is to receive a system notification when the object is collected by the collector.

Reference counting

Reference counting is the process of adding a reference counter to an object. Each time a reference is made to it, the counter is incremented by 1. When the reference fails, the value of the counter is decrement by 1. Advantages: simple implementation, high judgment efficiency, in most cases when a good algorithm. Cons: It’s hard to solve the problem of objects having circular references to each other. Therefore, reference counting is not used to manage memory in mainstream Java virtual machines.

Accessibility analysis algorithm

The basic idea of this algorithm is to start with a series of objects called “GC Roots” and search down from these points. The search path is called reference chain. When an object is not connected to any reference chain from GC Roots, it is proved that the object is not available. The diagram below:



Objects 5, 6, and 7 are related to each other, but they are not reachable to GC Roots, so they will be determined to be recyclable objects.

Objects that can be used as GC Roots in Java include the following:

  • Objects referenced in the virtual stack (the local variable scale in the stack frame);
  • Object referenced by a class static property in the method area;
  • Object referenced by a constant in the method area;
  • Object referenced by JNI in the local method stack.

Recovery method area

Recycle the method area, or the permanent generation in the HotSpot virtual machine. Perpetant-generation garbage collection mainly collects two parts: deprecated constants and useless classes. Recovering deprecated constants is very similar to recycling objects in the Java heap. For example, if a String “ABC” has entered the constant pool, but the current system does not have a String called “ABC”, that is, there are no strings referring to the “ABC” constant in the constant pool, and there are no references to the literal anywhere else. If a memory collection occurs and it is necessary, The “ABC” constant is cleared out of the constant pool by the system. To determine whether a class is a “useless class”, three conditions must be met:

  1. All instances of the class have been collected, meaning there are no instances of the class in the Java heap;
  2. The ClassLoader that loaded the class has been reclaimed;
  3. The java.lang.Class object to which this Class corresponds is not referenced anywhere, and the methods of this Class cannot be accessed by reflection anywhere.

How to recycle

Mark-clean algorithm

The algorithm is divided into two stages: “mark” and “clear”. All the objects that need to be recycled are marked first, and all the tagged objects are collected uniformly after the tagging is completed. The schematic diagram is as follows:



This algorithm has two main shortcomings:

  1. The efficiency problem is that the efficiency of both marking and cleaning processes is not high.
  2. Space-related issues. A large number of discrete memory fragments will be generated after the tag is cleared. Too many spatial fragments may cause another garbage collection to be triggered prematurely when a large object needs to be allocated during the program’s run and not enough contigous memory will be found.

Replication algorithm

Replication algorithms divide the available memory into two equal pieces by volume and use only one of them at a time. When you run out of memory, you copy the surviving objects onto the other memory block, and then clean up the used memory space once. The schematic diagram is as follows:



The replication algorithm retrieves the memory of the entire half region every time. When allocating memory, it does not need to consider the complicated situation of memory fragmentation. It only needs to move the top pointer of the heap to allocate memory in order.The implementation is simple and the operation is efficient. However, when the survival rate of the object is high, more copying operations are needed, and the efficiency will be lower. In addition, this algorithm isReduce memory to half its original sizeThat’s a little too high a price to pay.

Mark-collate algorithm

The marking process of this algorithm is still the same as that of the mark-and-sweep algorithm, but instead of cleaning up the recyclable objects directly, the next step is to move all the surviving objects to one end and then directly clean up the memory beyond the end boundary. The schematic diagram is as follows:

Generational collection algorithm

Generational collection algorithm is generally the Java heap is divided into the new generation and the old age, and then according to the characteristics of each age to adopt the most appropriate collection algorithm. In the new generation, a large number of objects are found dead and only a small number are found alive in each garbage collection, so the replication algorithm is selected, and the collection can be completed only by paying the replication cost of a small number of surviving objects. In the old days, because the survival rate of the object was high, there was no extra space to allocate it for guarantee, so it had to be recycled using “mark-clean” or “mark-tidy” algorithms.