GC

In Java, multiple implementation classes of an interface may require different amounts of memory, and multiple branches of a method may require different amounts of memory. We only know what objects are created while the program is running, and this memory allocation is dynamic. The program counter, virtual machine stack, and local method stack are all allocated and reclaimed, so the garbage collector focuses on the Java heap.

Determines whether the object is alive

Reference counting method

Add a reference counter to the object that increments every time a reference is made to it; When a reference is invalid, the counter is decayed by 1; An object whose counter is 0 at any point in time cannot be used again.

Reference counting method has the advantages of simple implementation and high judgment efficiency. COM(Component Object Model) and Python use it for memory management. The disadvantage is that it is difficult to solve the problem of cyclic references between objects

Cross reference

Suppose objects A and B each have A member variable object

	A a=new A();
	B b=new B();
	a.object=B;
	b.object=A;
	a=null;
	b=null;
Copy the code

Neither object can actually be accessed again, but because they reference each other, the reference count is not zero and cannot be reclaimed. To solve this problem, Java uses the following reachability analysis algorithm

Accessibility analysis algorithm

Reachability Analysis, the basic idea (essentially graph theory) is to search down from a series of objects called “GC Roots” as the starting point. The path searched is called the reference chain. When GC Roots reaches an object that is unreachable, Prove that this object is not available. Objects available as GC Roots include

  1. The object referenced in the virtual machine stack (the local variable table in the stack frame)
  2. The object referenced by the class static property in the method area
  3. The object referenced by the constant in the method area
  4. Objects referenced by JNI(Java Native Interface) in the local method stack

reference

If the value stored in a reference data type represents the starting address of another chunk of memory, the chunk is said to represent a reference

This definition is too simple to be of practical value. The concept of references has been expanded since JDK1.2

  • Strong references: References such as “Object obj = new Object()” are common in program code. As long as strong references exist, the garbage collector will never reclaim the referenced Object.
  • Soft references: Describe objects that are useful but not necessary. Clean up only when the system is out of memory. The corresponding class is SoftReference
  • Weak reference: Describes non-essential objects. Whether memory is sufficient or not is cleared. Corresponding class WeakReference
  • Virtual references: The weakest reference relationship. The existence or absence of a virtual reference does not affect the lifetime of an object and cannot be used to obtain an object instance. The only purpose of setting a virtual reference for an object is to receive a system notification when the object is reclaimed by the collector. The corresponding class is PhantomReference

Strength of citation: Strong citation soft citation Weak citation virtual citation

Finalize – A way for unreachable objects to escape collection

An object must be marked at least twice to be truly reclaimed

  1. Mark and filter the objects that cannot be reached through the accessibility analysis for the first time. If the object overwrites the Finalize method and the Finalize method has not been called by the virtual machine, it is necessary to execute the Finalize method; otherwise, it is not necessary to execute the Finalize method and clean it directly
  2. If the object is determined to be necessary to execute the Finalize method, the object will be placed ina queue called f-Query, which will be executed later by the lower-priority Finalizer created by the virtual machine (for security, Will trigger the Finalize method of this object but does not promise to wait for it to run.) It, if an object in Finalize method re-connects with any object in the reference chain, the collection of “to be collected” will be removed in the second tag, otherwise it will be collected.

Recovery method area

The recovery efficiency of the method area is low and the conditions are harsh. So there is less recycling in general. The reclaim method area (or permanent generation) requires the virtual machine to have the ability to unload classes to ensure that the permanent generation does not overflow in scenarios where bytecode frameworks such as reflection, dynamic proxy, CGLib, dynamic JSP generation, and OSGi are frequently customized classloaders. Recycling conditions

  • All instances of the class have been reclaimed (there are no instances of the class in the Java heap)
  • The ClassLoader that loaded the class has been reclaimed
  • The java.lang.Class object corresponding to this Class is not referenced anywhere, and the methods of this Class cannot be accessed anywhere through reflection.

Garbage collection algorithm

There are mainly

  • Mark-clear algorithm
  • Replication algorithm
  • Mark-collation algorithm

### Mark – Clear the algorithm to mark the need to recycle the object, mark after the completion of the unified clean. Is less than

  1. The efficiency is not high
  2. Memory fragments

The copy algorithm divides the memory into two pieces of the same size. Each time one piece is used, the surviving objects are copied to the other piece when cleaning up, and the used memory space is cleared. Fixed a defect in the mark-clear algorithm, but it was too expensive to reduce memory by half. According to the study, the memory of the new generation is divided into one large Eden and two small Survivor instead of 1:1. Eden: Survivor= 8:1, and the memory utilization rate is 90%. Each time Eden and one Servivor block are used, the surviving objects in Eden and Survivor are transferred to the other Survivor block during cleanup.

Mark-collation algorithm

After marking, the surviving object is moved to one end, which can directly clean up the memory outside the end boundary and solve the memory fragmentation problem.

The generation collection algorithm divides the memory into several blocks according to the different life cycles of objects, generally divided into new generation and old age. For the new generation of features, the copy algorithm is adopted, and for the old age, the mark-cleaning or mark-sorting algorithm is adopted.

The realization of the HotSpot

Stop The World

Reachability analysis must be done in a snapshot that ensures consistency, that is, at least when enumerating GC Roots, all Java execution threads must be stopped, hence The term “Stop The World”

The implementation details of the tag

OopMap

The virtual machine knows directly from OopMap where object references are stored

safer

OopMap is only generated at the safe point, and the program execution cannot GC until the safe point is reached. Safety point selection standard is a program “whether it has the characteristics of the program for a long time” for the standard selection. The most obvious feature is instruction sequence reuse, such as method call, loop jump, exception jump and so on. There are two options for allowing all threads (excluding those making JNI calls) to “run” to the nearest safe point and then pause when GC occurs

  1. Preemptive interrupt
  2. Active interrupt

The current mainstream is active interrupts

The safety area

Within a code snippet, the reference relationship does not change, and it is safe to start GC anywhere in this region. Can be seen as a safe place to expand

Garbage collector

There is no single best collector, let alone a one-size-fits-all collector, and we have chosen the one that is most appropriate for our specific application. At the top of the diagram are collectors working in the new generation and at the bottom are collectors working in the old generation, and the line indicates that the two can work together. G1 doesn’t differentiate much between the new generation and the old generation, and can work independently.

Cenozoic collector

Serial

The replication algorithm single thread collector, The oldest, is still The default generation collector for virtual machines running in Client mode. It stops The World and pauses for a long time. Advantages are simple and efficient (no overhead of thread interaction, focus on garbage collection)

ParNew

The multithreaded version of Serial, which is basically the same as Serial but only works with CMS collection, is the slave and preferred next-generation collector running in Server mode

Parallel Scavenge

A new generation of collectors for replication algorithms, focused on achieving a manageable throughput (throughput = run user code time /(run user code time +GC time), are primarily suitable for tasks that perform operations in the background without much interaction

Old age collector

Serial Old

The mark-collation algorithm Serial Old is an older version of the Serial collector. Single-thread: This mode is used by VMS in Client mode using the mark-tidy algorithm. The two main uses are

  1. JDK1.5 and earlier versions are used in conjunction with the Parallel insane
  2. As a backup to the CMS collector

Parallel Old

The Mark-collation algorithm Parallel Old (JDK1.6) is an older version of the Parallel Insane. Use multithreading and mark-collation algorithms, previously if the new generation was using the Parallel Insane, the older generation could only use Serial Old. Parallel Old does not work with CMS

CMS

CMS(Concurrent Mark Sweep), Concurrent collection, low pause, a collector whose goal is to obtain the shortest collection pause time. Suitable for Java Web applications. CMS is based on ** “mark-clear” ** algorithm, and the specific steps are as follows

  1. Initial tag (STW, tag objects that GC Roots can associate with, fast)
  2. Concurrent markup (concurrent with user processes, GC Roots Tracing)
  3. Relabeling (GC thread parallelism, STW, correction of object records for concurrent marking phase changes)
  4. Concurrent cleanup (concurrent with user processes, cleanup) time consuming: initial mark “re-mark” concurrent cleanup

Disadvantages:

  1. The number of threads started by default is **(number of cpus +3)/4**.An incremental concurrency collector (I-CMS) has been developed for this purpose, but the effect is deprecated
  2. Unable to handle floating garbage (garbage generated by user threads during concurrent cleanup, only saved for the next cleanup), CMS is activated when 68% of user threads are used in the old days, and in JDK1.6 the startup threshold is raised to 92%. If a Concurrent Failure occurs due to insufficient memory during CMS operation, the VM starts the backup plan – temporarily enabling the Serial Old.
  3. Space debris, because CMS is based on “mark-clean”. The [^Full GC] is triggered once when large objects are allocated and can be set to turn on the space collation option

[^Full GC]: Old GC, also known as Major GC. The corresponding Cenozoic GC is called Minor GC

G1

The G1(Garbage First) collector technology is one of the most advanced achievements. The characteristics of

  • Parallelism and concurrency
  • Generational collection
  • Space to sort out
  • Predictable pauses that allow the user to specify that no more than N ms should be spent on GC in a time segment of M ms length. This is almost characteristic of the real-time Java(RTSJ) collector. ##### Memory partitioning The G1 collector divides the entire Java heap into independent regions of equal size. Although there are concepts of new generation and old generation, they are no longer physically isolated; they are all part regions (not contiguous).
algorithm

The G1 collector is based on the ** mark-collation ** algorithm as a whole, and the local (two regions) is based on the copy algorithm.

##### Implementation of predictable pauses G1 systematically avoids area-wide garbage collection across the entire Java heap. G1 tracks the value of garbage accumulation in each Region and maintains a priority list in the background. The Region with the highest value is collected first based on the allowed time. ##### How to avoid a full heap scan (applicable to various collectors) Virtual machines use Remember Set to avoid a full heap scan. Each Region in G1 has a corresponding Remember Set. When the VM discovery program writes data of the Reference type, a Write Varrier operation is generated to check whether the objects referenced by Reference are in different regions. If yes, relevant reference information is recorded in Remember Set of the Region to which the referenced object belongs through CardTable. When collecting memory, add Remember Set to the enumeration range of the GC root node to ensure that no full heap scan will be missed.

Operation steps

  1. Initial Flag (STW)
  2. Concurrent markup (executed concurrently with the user thread)
  3. The final tag (GC thread executes in parallel. Merge concurrent marking phase records in thread Remembered Set Logs into Remembered Set.
  4. Filter tags (which are executed in parallel by the GC thread, or (in the future) concurrently with the user program. Make a recovery plan based on the expected downtime)

# # # GC logs

33.125: [GC [DefNew: 3324K -> 152K](3712k,0.0025925 secs] 3323K -> 152K (11904k),0.0031680 secs]Copy the code

Model for

GC occurs at :[Pause type of garbage collection :[region where GC occurs: Used capacity of the memory region before GC -> Used capacity of the memory region after GC (total capacity of the memory region), time for GC of the memory region] Used capacity of the Java heap before GC -> Used capacity of the Java heap after GC (total capacity of the Java heap), time for GC of the Java heap]Copy the code

Type of pause for garbage collection

  • GC
  • Full GC, STW occurs
  • [Full GC(System)], STW occurs

The area where the GC occurs is closely related to the collector

  • DefNew: Default New Generation. Serial collector New generation
  • ParNew: Parallel New Generation. ParNew collector
  • PSYoungGen: Parallel Scavenge. Parallel Scavenge collector……

Memory allocation and reclamation policies

  1. Objects are allocated in the Eden area of the new generation first, and a Minor GC is initiated when space is insufficient.
  2. Big object goes straight to the old age
  3. The age of a long-lived object is increased by one each time the object undergoes a Minor GC.
  4. If the total size of all objects of the same age in the Survivor space is greater than half of the size in the Survivor space, the object with an age greater than or equal to this age can enter the old age directly
  5. Space allocation guarantee. Before Minor GC occurs, the virtual machine checks whether the maximum available contiguous space of the old generation is greater than the total space of all objects of the new generation. If this is true, Minor GC ensures that it is safe. If yes, check whether the maximum available continuous space in the old age is greater than the average size of the objects in the promoted old age. If Minor GC is attempted, otherwise Full GC is performed.