Recently, I have been reading “Deep Understanding of the Java Virtual Machine” (3rd edition) by Zhou Zhiming. I read version 2 many years ago, when I was just starting out at work, and I read it every day at home whenever I had time. It was also an initiation of knowledge about the JVM. Time flies, but this book is still a classic, worth reading, worth reading several times. Today, I read the concurrent accessibility analysis of 3.4.6. It was very interesting. I will make some reading notes.

As you learned earlier, some of the ways to determine whether an object is recyclable, such as reference counting, are reachability analysis in the JVM. Starting with GC Roots and going all the way down, objects that can be guided are marked as non-recyclable and the rest are recyclable. Here actually has a suppressed premise, is in the process of traversing the downward from GC Roots, the object reference relationship can’t change, because if changed, may be a originally diagnosed as recyclable objects, and reference by GC Roots here, but in the end will be recycled, that lead to program errors are unacceptable. Since the process of continuously traversing down can be understood as a search, it cannot be done instantaneously, and the process increases almost linearly as the heap grows, triggering a garbage collection would make the part of the reachably time-consuming. How to optimize?

In fact, this problem comes to a key point, how to do the user thread and the tag thread in parallel, accurate marking? A tool called “three-color labeling” is cited here to help us analyze this problem.

What is “three-color marking”

White: indicates that the object has not been accessed by the garbage collector

Black: indicates that the object has been accessed by the garbage collector and that all references to the object have been scanned

Gray: Indicates that the object has been accessed by the garbage collector, but there is at least one reference on the object that has not been scanned

The labeling

1, initial state, 0 is GC root node, 1 refers to 2,2 refers to 3 and 4



2, node 0 finds its reference 1, because 1 and reference 2 are not traversed, 1 becomes gray



3, node 1 finds its reference 2, 2 is gray because 3 and 4 are not traversed; Now all references to 1 have been traversed and become black



4, point 2 finds its reference 3, because reference 4 has not been traversed, point 2 continues to be gray, and node 3 has no reference and becomes black



5, point 2 continues to find its reference 4, at this point 4 has no other reference, becomes black, point 2 also has no untraversed reference, becomes black

At this point, the entire traversal is complete.

Abnormal situation

So, what might happen if this markup process is concurrent with the user thread?

For example, after step 4 above, the reference of 2 to 3 is removed. After step 5, 3 should be reclaimed, but since it is marked black in step 4, it will not be reclaimed



Is that a problem? Of course, 3 should be recycled, but it is not recycled in time; However, from another point of view, this problem can be tolerated, because the traversal time is not too long on the whole, and the probability of such a situation is not large. Node 3 escaped this collection, and there will be another waiting for it, and there will be no overflow due to a small amount of overdue collection.

2. If, after step 4 above, 1 adds a reference to 4, 2 removes the reference to 4, as shown in the figure



This will cause 4 to stay white, and 1, because it’s already black, will not be traversed, so the reference of 1 to 4 will not be perceived, so 4 will always be white and will be recycled. This is the problem of “object messages”, which are referred to by GC Root, but are eventually collected, which is absolutely unacceptable. How to solve this problem?

In 1994, Wilson theoretically proved that the problem of “object disappearance” arises if and only if the following two conditions are met simultaneously, that is, an object that should have been black is mislabeled as white:

1. The assigner inserts one or more new references from the black object to the white object
2. The assigner removes all direct or indirect references from the gray object to the white object

Therefore, our solution is to cross out any of the above conditions, resulting in two solutions: incremental update and original snapshot.

Incremental updating

When the black-marked object references the white-marked object, the new reference is recorded. After the concurrent scan is over, the black object in the recorded reference relationship is scanned down again. This can be simplified to mean that the black object becomes gray once a new reference to the white object is inserted, which is essentially a re-labeling process.

The original snapshot

When the grey object wants to delete the reference to the white object, the reference to be deleted is recorded. After the concurrent scan is over, the grey object in the recorded reference is taken as the root and rescan again. This can also be simplified to mean that a snapshot of the object graph is searched at the moment the scan started, regardless of whether the reference relationship is deleted.

In fact, there will be a lot of questions about the understanding of the original snapshot at the beginning, many of which can be proved by disproofs. I have also thought of some, but due to space constraints, I will talk about them next time, so I will record them here today.