1. A brief review of garbage collection

There are basically several garbage collection algorithms: mark-clean, mark-copy, and mark-clean. On this basis, we can add generations (new generation/old generation), and adopt different recovery algorithm for each generation to improve the overall allocation and recovery efficiency.

No matter which algorithm is used, marking is always a necessary step. Of course, how can you recycle the garbage if you don’t find it first?

The workflow of the garbage collector is as follows:

  1. Mark which objects are alive and which are garbage (recyclable);
  2. Reclaim (clean/copy/collate) and update references if any objects have been moved (copy/collate).

2. Three-color labeling

2.1 Basic Algorithm

To find out the surviving object, according to the reachability analysis, the traversal visit starts from GC Roots, and the reachable object is the surviving object (the final result: A/D/E/F/G reachability) :

The objects we encounter during the process of traversing the object graph are marked with the following three colors according to the condition “visited or not” :

  • White: not yet visited.
  • This object has been accessed, and all other objects referenced by this object have been accessed.
  • This object has been accessed, but other objects referenced by this object have not been accessed. After all access, it will be converted to black.

Suppose there are three sets of white, gray, and black (representing the color of the current object), and the traversal is as follows:

  1. Initially, all objects are in the white set;
  2. Move objects directly referenced by GC Roots to grey collections
  3. 3.1. Move all other objects referenced by this object to [grey collection]; 3.2. Move this object into the black set.
  4. Repeat Step 3 until the grey set is empty.
  5. After the end, the objects still in the [white collection] are unreachable by GC Roots and can be recycled.

Note: If the object is still white after the end of the tag, it means that the object is “not found” and is unlikely to be re-referenced.

When Stop The World (STW) is used, references between objects do not change and can be easily tagged.

When you need to support concurrent markup, that is, when the application thread continues to run during markup, references between objects may change, and multiple and missing markup situations can occur.

2.2 Multi-standard – floating garbage

Suppose the application executes objD.fieldE = null (D > E is broken) :

After this point, the object E/F/G is “supposed” to be reclaimed. However, since E is greyed, it will continue to be traversed as if it were a living object. The end result is that this part of the object is still marked as alive, meaning that this round of GC does not reclaim this part of memory.

The part of memory that should have been reclaimed but was not, is called “floating garbage.” Floating garbage does not affect the correctness of the application, it just has to wait until the next round of garbage collection to be cleared.

In addition, for the new objects after the start of concurrent marking, the usual practice is to directly treat all as black, and the epicycle will not be cleared. This part of the object may become garbage during the period, which is also considered part of floating garbage.

2.3 Missed Marker – Read/write barrier

Suppose the GC thread has already traversed E (gray), and the application thread executes first:

var G = objE.fieldG; objE.fieldG = null; // objd. fieldG = G; // Black D refers to white GCopy the code

At this point, the GC thread is switched back to running, because E has no reference to G, so G is not put into the grey set; Although D rereferences G, it will not redo the traversal process because D is already black.

The final result is that G will remain in the white set and will be removed as garbage. This directly affects the correctness of the application and is unacceptable.

It is not difficult to analyze that the missing mark can occur only when the following two conditions are met:

  1. The gray object breaks the reference (direct or indirect) to the white object; That is, the reference to the original member variable of the gray object has changed.
  2. The black object rereferences the white object; The black object member variable has a new reference.

From a code perspective:

var G = objE.fieldG; Obje. fieldG = null; // 1. // 2. Write objd. fieldG = G; / / 3. WriteCopy the code
  1. Read the reference to the fieldG member variable of object E, that is, object G;
  2. Object E writes null to its member variable fieldG.
  3. Object D writes object G to its member variable fieldG;

All we have to do is fiddle with any of the above three steps, record the object G, and then iterate over it as a gray object. For example, put it into a particular collection and wait until the original GC Roots have traversed (concurrent marking) and then the objects of that collection have traversed (re-marking).

Resignations require STW, because if the application keeps running, the collection might keep adding new objects and never run out. Of course, concurrent tagging also allows most of the collection to run first, thus shortening the time to re-tag STW, which is an optimization problem.

Write barriers are used to intercept the second and third steps; The read barrier is the first step in interception. The purpose of their interception is simple: to record the object G before and after the read and write.

3. Write barriers

When you assign a value to a member variable of an object, the underlying code looks something like this:

/** * @param field @param field @param field @param field @param field @param field @param field @param field Null */ void oop_field_store(oop* field, oop new_value) {*field = new_value; // Assignment operation}Copy the code

A write barrier is a process that is added before and after an assignment (see AOP), and a read barrier is similar.

void oop_field_store(oop* field, oop new_value) { pre_write_barrier(field); // Write barrier - pre-write operation *field = new_value; post_write_barrier(field, value); // Write barrier - after write operation}Copy the code

3.1 Write Barrier + SATB

When references to member variables of object E change (obje.fieldg = null;) We can use the write barrier to record the reference object G to E’s original member variable:

void pre_write_barrier(oop* field) { oop old_value = *field; The remark_set.add(old_value); // Record the original reference object}Copy the code

When the reference to the original member variable changes, record the original reference object.

The idea is to try to keep The object graph At The Beginning, The Snapshot At The Beginning (SATB), so that when The GC Roots are determined At a certain point, The object graph At that point is already determined. For example, if D refers to G at that time, then subsequent tokens should follow the object graph at that time (D refers to G). If there is a change in the period, you can record it to ensure that the markup remains in the original view.

SATB breaks condition 1: [the gray object disconnects the white object], thus ensuring that no tags are missed.

3.1 Write Barrier + incremental update

When the reference to the member variable of object D changes (objd. fieldG = G;) We can use the write barrier to record D’s new member variable reference object G:

void post_write_barrier(oop* field, oop new_value) { if($gc_phase == GC_CONCURRENT_MARK && ! isMarkd(field)) { remark_set.add(new_value); // Record the newly referenced object}}Copy the code

When a new reference is inserted, the new reference object is recorded.

The idea is that instead of requiring the original snapshot to be retained, new references are logged for traversal, known as Incremental updates.

The incremental update breaks condition two: [the black object re-references the white object], thus ensuring that no markup is missed.

4. Read barrier

oop oop_field_load(oop* field) { pre_load_barrier(field); // Return *field; }Copy the code

The read barrier is directed at the first step: var G = obje.fieldg; When reading member variables, record them uniformly:

void pre_load_barrier(oop* field, oop old_value) { if($gc_phase == GC_CONCURRENT_MARK && ! isMarkd(field)) { oop old_value = *field; remark_set.add(old_value); // Record the object read}}Copy the code

It’s conservative, but it’s safe. In condition 2 [the black object rereferences the white object], the rereference is made only if the white object is obtained, and the read barrier is in effect.

5. Three-color labeling and modern garbage collectors

Accessibility of modern track type (analysis) of the garbage collector almost all learning algorithm thought from three color marker, although different implementation: such as white/black collection generally won’t appear, but there are other reflected color), gray collection can be through the stack/queue/cache in the form of a log and implementation, traversal method can be a breadth traversal/depth and so on.

For the read and write barrier, taking Java HotSpot VM as an example, the processing scheme for the missing mark when concurrent marking is as follows:

  • CMS: write barrier + incremental update
  • G1: Write barrier + SATB
  • ZGC: Read barrier