Three-color labeling

The main purpose of GC garbage collector is to realize memory collection. The main two steps in this process are: memory marking and memory collection.

Introduction to trichromatic notation

Three-color notation, mainly for efficient marking of memory blocks that can be reclaimed.

Tri-color Marking is used as a tool to assist derivation. Objects encountered in the process of traversing the object graph are marked with the following three colors according to the condition of “whether or not they have been accessed” :

  • White: indicates that the object has not been accessed by the garbage collector. Obviously, at the beginning of the reachability analysis, all the objects are white. If the objects are still white at the end of the analysis, it means unreachable.
  • Black: indicates that the object has been accessed by the garbage collector and that all references to the object have been scanned. The black object delegate has been scanned, it is safe to live, if there is another object reference to the black object, it does not need to be scanned again. It is not possible for a black pair to point directly (without passing the gray object) to a white object.
  • Gray: Indicates that the object has been accessed by the garbage collector, but there is at least one reference on the object that has not been scanned.

Three color marking process

Marking process:

  1. At the start of GC concurrency, all objects are white;
  2. Mark all objects directly applied with GC Roots as grey sets;
  3. If the object in the gray set does not have a child reference, it is put in the black set; if there is a child reference object, all its child reference objects are stored in the gray set, and the current object is put in the gray set
  4. Follow this step 3, and so on, until all the objects in the gray set turn black, the epicycle mark is complete, and the objects in the white set are called unreachable objects, namely garbage objects.
  5. After the tag is finished, the white object is unreachable by GC Roots and can be garbage collected.

mislabeled

What is a false marker? When the following two conditions are met at the same time, a false marker is generated:

  1. The assigner inserts one or more references to the black object to the white object
  2. The assigner removes all direct or indirect references from the gray object to the white object

The solution of false mark

You only need to break one of these two conditions to solve The problem. Incremental Update and Snapshot At The Beginning (STAB) are two different solutions.

Incremental updating

The incremental update breaks the first condition. When the black object inserts a new reference to the white object, the newly inserted reference is recorded. After the concurrent scan is complete, the black object in the recorded reference is re-scanned. This can be simplified to mean that the black object becomes gray once a new reference to the white object is inserted.

Original Snapshot (STAB)

The second condition is broken by the original snapshot. When the gray object is about to delete the reference to the white object, the reference to be deleted is recorded. After the concurrent scan, the gray object in the recorded reference is taken as the root and rescan again. This can also be simplified to mean that a snapshot of the object graph is searched at the moment the scan started, regardless of whether the reference relationship is deleted.

Missing bids and multiple bids

In fact, there will be two cases for the subdivision of the wrong mark, which are: missing mark and multiple mark

Multi-label – floating garbage

If the tag executes at this point E executes object.E = null

At this point, E/F/G could theoretically be recycled. But since E has gone gray, it’s just going to keep going. The end result is that they are not marked as garbage objects and live in the epicycle tag.

The garbage that should be collected in this round is not collected, and this part is called “floating garbage”. Floating garbage does not affect the correctness of the program, and this “garbage” is only cleaned up the next time garbage collection is triggered.

Also, new objects generated during the marking process are marked black by default, but may become “garbage” during the marking process. This is part of floating garbage.

Missed marker – read/write barrier

Store Barrier

When you assign a value to a member variable of an object, the underlying code looks something like this:

/** * @param new_value = @param new_value = @param new_value = @param new_value Null */ void oop_field_store(oop* field, oop new_value) {*fieild = new_value // Assignment operation}Copy the code

The write barrier is the addition of processing logic (aOP-like) before and after assignment.

void oop_field_store(oop* field, oop new_value) { pre_write_barrier(field); *fieild = new_value // Pre_write_barrier (field); // Write barrier - write back barrier}Copy the code

Write barrier + SATB

When references to member variables of object E change (obje.fieldg = null;) We can use the write barrier to record the reference object G to E’s original member variable:

void pre_write_barrier(oop* field) { oop old_value = *field; The remark_set.add(old_value); // Record the original reference object}Copy the code

When the reference to the original member variable changes, record the original reference object. Try to keep The object graph At The Beginning, The Snapshot At The Beginning (SATB), when The GC Roots At a certain point are determined, The object graph At that point will be determined. For example, if D refers to G at that time, then subsequent tokens should follow the object graph at that time (D refers to G). If there is a change in the period, you can record it to ensure that the markup remains in the original view.

It is worth mentioning that the operation of scanning all GC Roots (i.e., the initial marker) usually requires STW, otherwise it may never be done, as new GC Roots may be added during concurrency.

SATB breaks condition 1: [the gray object disconnects the white object], thus ensuring that no tags are missed.

A small optimization: If you are not in the concurrent markup phase of garbage collection, or have already been marked, there is no need to record any more, so add a simple judgment:

Void pre_write_barrier(oop* field) {if($gc_phase == GC_CONCURRENT_MARK &&! isMarkd(field)) { oop old_value = *field; The remark_set.add(old_value); // Record the original reference object}}Copy the code

Write barrier + incremental update

When the reference to the member variable of object D changes (objd. fieldG = G;) We can use the write barrier to record D’s new member variable reference object G:

void post_write_barrier(oop* field, oop new_value) { if($gc_phase == GC_CONCURRENT_MARK && ! isMarkd(field)) { remark_set.add(new_value); // Record the newly referenced object}}Copy the code

The idea is that instead of keeping the original snapshot, new references are recorded for traversal, or Incremental updates.

The incremental update breaks condition two: [the black object re-references the white object], thus ensuring that no markup is missed.

Load Barrier

oop oop_field_load(oop* field) { pre_load_barrier(field); // Return *field; }Copy the code

Var objF = object.fieldg; .

void pre_load_barrier(oop* field, oop old_value) { if($gc_phase == GC_CONCURRENT_MARK && ! isMarkd(field)) { oop old_value = *field; remark_set.add(old_value); // Record the object read}}Copy the code

It’s conservative, but it’s safe. In condition 2 [the black object rereferences the white object], the rereference is made only if the white object is obtained, and the read barrier is in effect.

Three-color labeling and garbage collector

Incremental update: CMS original snapshot (STAB) : G1, Shenandoah

Reference documentation

  • www.jianshu.com/p/12544c0ad…
  • Hllvm-group.iteye.com/group/topic…
  • www.oracle.com/webfolder/t…
  • Tech.meituan.com/2016/09/23/…
  • Deeper understanding of the JVM Virtual Machine – 3rd edition. By Zhiming Zhou