“This is the 18th day of my participation in the First Challenge 2022. For details: First Challenge 2022.”


When it comes to garbage collection in Java, I believe many of you are like me. Your first reaction is that you will have to ask this question in an interview. If you have not recited some knowledge of GC algorithm and collector, you will not dare to say that you have recited eight essay before going out. It is really a bit embarrassing to say that the actual use of this knowledge in the work of the scene is really not much, and this thing to learn is also very boring, but the interviewer just love to ask, what can we do?

Hydra has sacrificed his weekend to create some giFs to help you understand the garbage collection algorithm. Without further ado, let’s start with the basics of how to determine if an object should be reclaimed.

Determine object survival

The basic purpose of garbage collection is to use some algorithms for memory management, so as to effectively use memory space. Before garbage collection, it is necessary to determine the survival of objects. There are two algorithms for determining the survival of objects in the JVM, which are described below.

1. Reference counting algorithm

Adds a reference counter to the object that increments every time a reference is made to it and decays every time a reference is invalidated. When the counter is 0, the current object can be reclaimed.

The principle of this method is simple and it is efficient to judge, but there are two problems:

  • Every time an object in the heap is referenced or cleared, the counter must be added and subtracted, resulting in performance loss
  • When two objects refer to each other, the counter is never zero. That is, even if the two objects are no longer used by the program, there is still no way to recycle them.
public void reference(a){
  A a = new A();
  B b = new B();
  a.instance = b;
  b.instance = a;    
}
Copy the code

The process of reference counting is shown in the following figure:

As you can see, in the method, after the completion of execution stack in the reference is released, but left the two objects in the heap memory circular reference, finally led to the two instance of the reference count is zero, finally the two objects is memory will always can not get release, it is also for this defect, make reference counting algorithm is not the actual application in the process of gc.

2. Accessibility analysis algorithm

The reachability analysis algorithm is the garbage finding algorithm used by the JVM by default. It is important to note that while garbage finding is said, the reachability analysis algorithm actually looks for objects that are still alive. The reason for this design is that it is more complex and time-consuming to implement if you directly look for garbage objects that are not referenced, and in turn, it is less time-consuming to mark surviving objects.

The basic idea of the accessibility analysis algorithm is to take a series of objects called GC Roots as the starting point and search downward from these nodes. The search path is called reference chain. When an object is not connected to GC Roots by any reference chain, it is proved that the object is no longer alive and can be recycled as garbage.

In Java, there are several objects that can be used as GC Roots:

  • The object referenced in the virtual machine stack (the local variation table of stack frames)
  • The object referenced by the static property in the method area
  • The object referenced by the constant in the method area
  • In the local method stack JNI (nativeMethod)
  • References within the JVM, such as Class objects corresponding to basic data types, some resident exception objects, and system Class loaders
  • Be synchronized locksynchronizedHolds a reference to an object
  • That reflects the internal conditions of the JVMJMXBean,JVMTI Registered callback local code cache, etc
  • In addition, there are some temporary GC Roots, because garbage collection is mostly generational collection and local collection. When considering objects referenced across generations or regions, this part of the associated objects need to be added to GC Roots to ensure accuracy

Among them more important, at the same time mentioned more or the front 4 kinds, other simple understanding can be. Now that you know how the JVM looks for garbage objects, let’s take a look at how different garbage collection algorithms perform.

Garbage collection algorithm

1. Mark-clear algorithm

The tag cleanup algorithm is a very basic garbage collection algorithm that triggers STW (Stop the World) when the effective memory space in the heap runs out, and then garbage collection is performed in two phases: tag and cleanup:

  • Mark: Scan from the nodes of GC Roots, mark all surviving objects and record them as reachable objects
  • Cleanup: The entire heap memory space is scanned and if an object is not marked as reachable, it is reclaimed

A quick look at the two-phase process is shown below:

But there are several problems with this algorithm:

  • STW is generated during GC and stops the entire application, resulting in a poor user experience
  • Both the mark phase, which requires scanning from the root collection, and the sweep phase, which requires traversing all objects in the heap, are inefficient
  • Only non-living objects are processed, and a large number of discrete memory fragments are generated after removal. Later, when the program needs to allocate large objects at runtime, it cannot find enough contiguous memory, triggering a new garbage collection action

In addition, the JVM does not actually traverse the garbage object and delete all of its internal data. Instead, the JVM saves the first and last addresses of the garbage object and allocates them directly to the address list when memory is allocated again, which improves the efficiency of some token cleaning algorithms.

2. Replication algorithm

Replication algorithms are mainly used in the new generation, which divide memory into two pieces of the same size and use only one piece at a time. At any point in time, all dynamically allocated objects can only be allocated in one memory space, and the other memory space is free. The replication algorithm can be divided into two steps:

  • When one block of memory runs out of available memory, the JVM stops running the application, starts the GC thread of the replication algorithm, and copies the surviving objects to another free block of memory. The copied objects are arranged strictly by memory address, and the GC thread updates the memory reference addresses of the living objects to the new memory address
  • After the replication is completed, the used space is cleaned up at a time, so that the used memory space and free memory space are swapped, so that each memory reclamation is half of the memory space reclamation

The following figure shows the execution process of the replication algorithm:

The advantage of the copy algorithm is that it makes up for the memory fragmentation in the mark clearing algorithm, but it also has some problems:

  • Only half of the memory is used, so memory utilization is low, resulting in waste
  • If the object survival rate is high, many objects need to be copied and their application addresses updated, which can take a long time

As can be seen from the above shortcomings, if the replication algorithm is needed, there is a premise that the survival rate of the object should be relatively low. Therefore, the replication algorithm is more used in the new generation when more objects are born and die.

3. Mark-tidy algorithm

Tag sorting algorithm and tag clearing algorithm are very similar, mainly used in the old era. It can be divided into the following two steps:

  • Marking: Like the mark clearing algorithm, the object is marked first, and the surviving object is marked by scanning the GC Roots node
  • Collation: Move all living objects to one end of the free space, sort them by memory address, update the corresponding reference Pointers, and then clean up all memory space except the end memory address

The execution process of the tag collation algorithm is shown in the figure below:

It can be seen that the tag collation algorithm improves the previous two algorithms and makes up for their shortcomings to some extent:

  • Compared with the mark clearing algorithm, the algorithm makes up for the shortcomings of memory space fragmentation
  • Compared with the copy algorithm, it makes up for the disadvantage of wasting half of the memory space

However, the tag collation algorithm also has its disadvantages. On the one hand, it needs to mark all living objects, and on the other hand, it also adds the operation of moving objects and updating reference addresses, so the tag collation algorithm has higher cost.

4. Generational collection algorithm

In fact, the garbage collector in Java is not the only garbage collection algorithm that is used; most of the current generation collection algorithms are used. The JVM typically divides the heap into chunks based on the lifetime of the object. Typically, the heap is divided into new generation and old generation, and the best garbage collection algorithm is selected based on the characteristics of each generation. The main ideas are as follows:

  • In the new generation, a large number of objects die in each collection, so you can choose a replication algorithm that only copies a few objects and changes references to accomplish garbage collection
  • In the old age, the object survival rate is relatively high, the use of replication algorithm can not improve the performance and efficiency. In addition, there is no additional space for it to be allocated as a guarantee, so a tag sweep or tag collation algorithm is selected for garbage collection

Take a brief look at the main application areas of various algorithms through the figure:

As for why to choose a certain algorithm in a certain area, it is closely related to the characteristics of the three algorithms, and then compare from three dimensions:

  • Execution efficiency: In terms of the time complexity of the algorithm, the replication algorithm is the best, followed by the tag clearing, and the tag finishing is the lowest
  • Memory utilization: the mark collation algorithm and the mark clearing algorithm are higher, while the copy algorithm is the worst
  • Memory tidiness: the copy algorithm and the tag collation algorithm are neat, and the tag clearing algorithm is the worst

Despite the differences, in addition to the need to mark each other, one thing in common is the need for the STW to suspend all worker threads when the GC thread starts working.

conclusion

In this article, we started with the basic question of garbage collection: what objects can be collected as garbage? The JVM solves this key problem through the reachability analysis algorithm, and on the basis of which a variety of commonly used garbage collection algorithms have been derived, different algorithms have their own advantages and disadvantages, according to their characteristics have been applied to various generations.

Although this article chatter so much, but these are the basis of knowledge, if you want a thorough grasp of the JVM garbage collection, follow-up and the garbage collector, memory allocation, and so on a lot of knowledge to be understood, but we will be introduced here today, hope that through this article illustrations, can help you better understand the garbage collection algorithms.

The last

If you feel helpful, you can click a “like” ah, thank you very much ~

Nongcanshang, an interesting, in-depth and direct public account that loves sharing, will talk to you about technology. Welcome to Hydra as a “like” friend.