Moment For Technology

Javascript garbage collection (part 2)-V8 garbage collection

Posted on Aug. 6, 2022, 12:04 p.m. by Timothy Roberts-Hayes
Category: The front end Tag: javascript

The introduction begins with the story of Xiuzhen

Now that I've covered the mechanics of garbage collection and the core idea of tag removal, I'm ready to dive into the v8 garbage collection algorithm. Since it is the introduction of the algorithm class, it is naturally more boring, if you want to fully understand, you can collect down, read several times (! , _,! .

To ease the drudgery, I thought I'd start with an interesting topic. I believe that we have seen some xiuzhen fantasy novels, the transit and flying is common inside the bridge, now to tell you a story:

Initial continent there are a lot of common fix true in cultivate immortality, over time, the number is increasing, eventually reaching the limit, the mainland under the heaven must to balance, from selection of leaving it to the good man can pass the test, remove the rest of the build for poor people, to make space; The way of selection is as follows:

  • Move these people to the doomsday space, and then start a small day, until the end of the small day, and then send those who survived back to the continental space, did not pass through the people will be cleared, the body dead way elimination;
  • On and on, every time the number of people reached the upper limit of continental space, there would be a mini apocalypse,

Then it will have to spend several days of outstanding person, heaven will reward him rising to the higher and more broad in ancient China, on a higher level of practice, but we know that the road to cultivate immortality is the road to go, the more advanced nature also has a more advanced doom, ancient China, though more vast, is far better than the original, but every time, It will also trigger a greater catastrophe, clearing the continent of the practitioners.

Most of the practitioners have a short life and can't survive one or two minor disasters. Only a few practitioners can stand out and soar to the ancient continent.

That's all for now, and here's the business.

Division of heap structure

Before we talk about garbage collection, it's important to understand the v8 heap structure:

  1. New-space: This is where most objects are allocated. The new space is small and designed to be very fast for garbage collection, independent of any other space. In fact, this new space corresponds to the original continent mentioned above
  2. Old-pointer-space: Contains most objects that may have Pointers to other objects. After living in the new space for a while, most objects move there. (Special cases can also be ignored.)
  3. Old-data-space: Contains objects that contain only raw data (no Pointers to other objects). After a period of time in the new space, strings, boxed numbers, and unboxed double-precision arrays move here. The old pointer space and the old data space together are called the old space, corresponding to the ancient continent mentioned above.
  4. Large object space: This space contains objects larger than the size limit of other Spaces. Large objects are never moved by the garbage collector. (You can leave it at that)
  5. Code space: Code objects containing JIT instructions are allocated here. This is the only space with executable memory (although it is possible to allocate code in large object space, and the code is also executable. (You can leave it at that)

Here, I believe that some students can already correspond to part of the content, and then look down (mainly remember the first three Spaces, will always be used behind) :

Generational Collection

In most fictional Settings, the lives of ordinary practitioners are always short, and no one can stand out. In most programs, the life of object data is also short, and only a small number of data objects live long term. So the V8 engine was designed to recycle in different generations -- namely, the previous one: the big one and the small one. The small one happened frequently, and the cleaning of the new and ordinary ones only happened on the original continent. The great Apocalypse takes longer to clean up the reality of the distant continent, which takes place in different Spaces and performs garbage collection tasks together.

The overall coordination mechanism is as follows:

  1. Allocate new objects in the new space until the space is full, triggering a small collection mechanism;
  2. Objects that survive the next two reclaims are moved to the old space (allocated to the old pointer space or the old data space depending on the data characteristics).
  3. Major garbage collection is triggered when the old data space reaches a certain threshold (the specific parameters of this threshold are not concerned for now).

(Go back and read the previous story again to see if you got it all right!)

Let's take a look at each of these mechanisms.

A small recycling mechanism scavenge

The small-scale recycling mechanism, officially known as the Scavenge, occurs frequently and requires speed. The basic idea is derived from the famous Cheney algorithm, which follows:

  1. The new space(new-space)Are divided into two parts, named from space and to space; (The two Spaces will not be used at the same time)
  2. New objects are allocated in the to space until the to space is filled.
  3. This exchangefromSpace andtoSpace, that is, move all objects in the to space tofromSpace, and once I've done that,toSpace becomes empty,fromIs full;
  4. infromFrom space,rootStart looking for all the accessible objects (which I forgot to review in the last article) and move them totoSpace oroldSpace (something that has already been touched twice should soar), this step is actuallyv8The engine also does a bit of compacting, which means that the location of surviving objects is slightly concentrated to increase the locality of the cache and keep allocation fast and simple.
  5. emptyfromSpace (the rest of the filter is recyclable);

Readers who are new to this algorithm may be a little bit confused and wonder why they don't just clean up the garbage when the to space is full and keep the live objects instead of moving them around. The benefit of this design is that the to space is always used as the actual memory allocation space, and the FROM space is only used as a temporary container, that is, the disaster space, the two do not need to be used at the same time, so it is very clear. Officials also posted a pseudo-code:

def scavenge():
  swap(fromSpace, toSpace)
  allocationPtr = toSpace.bottom
  scanPtr = toSpace.bottom
  for i = 0..len(roots):
    root = roots[i]
    if inFromSpace(root):
      rootCopy = copyObject(allocationPtr, root)
      setForwardingAddress(root, rootCopy)
      roots[i] = rootCopy
  while scanPtr  allocationPtr:
    obj = object at scanPtr
    scanPtr += size(obj)
    n = sizeInWords(obj)
    for i = 0..n:
      if isPointer(obj[i]) and not inOldSpace(obj[i]):
        fromNeighbor = obj[i]
        if hasForwardingAddress(fromNeighbor):
          toNeighbor = getForwardingAddress(fromNeighbor)
        else:
          toNeighbor = copyObject(allocationPtr, fromNeighbor)
          setForwardingAddress(fromNeighbor, toNeighbor)
        obj[i] = toNeighbor
def copyObject(*allocationPtr, object):
  copy = *allocationPtr
  *allocationPtr += size(object)
  memcpy(copy, object, size(object))
  return copy
Copy the code

The code naturally looks boring, suitable for interested readers behind slowly study, the first reading can be skipped, because the idea has been finished, the lack of only some concrete implementation.

One small detail here is that the starting point for recycling is the root object, which is the global object and all objects it can access (including closures, etc.). So what if an object is only referenced by a data object that has been lifted into old space? The way we clean up, if we don't scan the old space to check for special cases like this, we're going to clean up the object by mistake; If we did that, it would be expensive, because we said that small clean-ups happen very often, so it's impossible to scan old space every time.

So, to solve this problem, the V8 engine maintains a buffer in memory whenever new(new-space)Space objects are old(old-space)Space object references the old space objectkeyWill be recorded, for example:

var user1 = {name: 'leo'};
/ /... Some code is omitted here, assuming that some time passes and user1 is moved into old space
use1.friend = {name: 'john'};
Copy the code

In this example, we assume that USE1 enters old space after some time, and then {name: 'John '} is allocated to the new space, and only user1 retains a reference to it. In this case, the memory location of the key, friend, is recorded in the buffer, which will be checked later to prevent accidental shooting. Although this requires some extra cost, it is a cost that must be paid in order to achieve the recovery effect, and the actual frequency of this scenario is not as high as imagined.

Large recovery mechanism

Scavenge is better for small areas of the insane because it requires swapping out memory and incurs a lot of overhead. It is fine to use the Scavenge avenge on older space, which is much larger.

The large collection mechanism refers to the mark-sweep method mentioned earlier, which is actually divided into mark-sweep and mark-compaction (the concept of compaction was also mentioned earlier). They all run in two phases:

  • tagPhase: Essentially a depth-first search: there are three color markers (white - initial state, black - checked state; Gray - to be checked);
    1. First, set all objects to white, and then, starting with the root object, mark all accessible objects to gray. Cache it in an array;
    2. It then iterates through the array, each time coloring the object to be iterated black and moving it out, and coloring its neighboring nodes gray and queuing it until the queue is empty
    3. Continue to check for gray objects, and if so continue to queue and loop until all accessible objects are black.

This section looks a little bit convoluted, but it's essentially a deep traversal of a digraph, which is pretty basic, so I won't draw a flow chart. Once marked, all objects are black and white, with the white being the cleanable garbage object.

  • ** Clearing (or compaction) ** stage: the clearing algorithm is relatively simple. According to the search results of the previous step, the memory address of the corresponding white-marked object is converted into free space; Compaction algorithm is relatively complicated, the core idea is to remove objects from scattered memory address, collective migration to a contiguous block of memory addresses, and other general selection is also a continuous memory block, and then make a copy of the object in the past, and leave a forwarding address in the source object, during the migration process, record the related the pointer to the location, upon completion of the entire migration, The update pointer points to a new location. If a certain memory address cannot be migrated because too many objects need to be migrated, the migration will continue until the next large reclamation period.

Good to here, the core content is basically introduced, you can take a break.

V8 engine optimization mechanism - incremental marking and delayed clearance

In the case of large amounts of real-time data processing, tag clearing (or compaction) can be time-consuming, so Google has proposed two improvements: incremental tagging and deferred clearing.

Incremental tag:

This is quite understandable, because above about tag removal algorithm could finish take a long time, this is * * * * need to be suspended during the program, so the v8 allows setting a threshold, such as each tag object of a certain number (e.g., 100), first back to execute a program, and then come back, This is to intersperse garbage collection during application execution to reduce the maximum pause time.

But there is a problem with this method: if I mark some objects the first time, but return to garbage collection, some objects have been modified!

For example, an object marked black adds a pointer to an already marked white object when it returns to execution. This will cause the white object to be killed (because it will actually become accessible later).

Very simple. Remember during the little recycle phase, the V8 engine maintained a buffer in memory to solve the problem of objects in the new space being referenced by objects in the old space? Similarly, he would record the pointer from black objects to white objects, and then turn the black objects back to gray, check again, and the problem would be solved.

Delay to clear:

This is also easy. After marking, the engine knows which objects can be removed, but it does not mean that it needs to remove all the garbage at the same time, so the engine chooses to clean up on demand, starting with the pages that need to be removed, and then completing the entire garbage collection cycle.

conclusion

Based on the previous section, this article analyzes the garbage collection mechanism of v8 engine in depth.

  • From the big aspect, it is divided into small recycling cycle and large recycling cycle
  • Small recycling cycle occurs in new space, with high frequency, short time and fast speed. Chenny algorithm is used
  • Large recycling cycles occur in old space, with low frequency and slow speed, using depth-first traversal and three-color labeling (black, white and gray)
  • The optimization methods are mainly incremental marking and delayed clearing, and the core idea is to fragment the marking stage and prioritize clearing space on demand

Well, about the content of the recycling, probably is here, in this paper, some relative to the previous article a little boring, and there is no drawing to illustrate the process (q is too lazy to draw -_ -), but the re-read it is well understood, and had removed about memory bitmap convenient things more at the bottom of the understanding of the core ideas, For those of you who want to delve deeper, see the detailed references at the back.

Convention: If there are mistakes in the content, welcome to point out them (feel that it is not comfortable to look at them and make fun of them); If it is helpful, welcome to like and collect, please get permission to reprint the source, if there is a problem also welcome private communication, the home page has email address

By the way, RingCentral has also set up an office in Hangzhou, and can apply for long-term telecommuting to bid farewell to 996 and balance work and life. If you are interested, you can send me a private letter or email and help me to promote for free

Refer to the article

Jayconrod.com/posts/55/a-...

Search
About
mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.