Read a lot of GC algorithm theory, is it still feel almost what? Then follow the ideas in this article, write your own “young generation recycling algorithm” – copy algorithm to try it

Copying GC is an algorithm studied by Marvin L. Minsky in 1963. To put it simply, you copy live objects in one space to other Spaces, and recycle all objects in the original space. This is a pretty bold algorithm. Here, we call the original space where we copy live objects From space and the new space where we paste live objects To space.

This article mainly refers to “Garbage collection algorithm and Implementation”, using C language to achieve; At the end of the article with the source code address, complete can run

Noun explanation

object

Objects, in the GC world, represent collections of data and are the basic unit of garbage collection.

Pointer to the

It can be interpreted as a pointer (or handle) in C, and GC searches for objects based on a pointer.

mutator

This word is translated in some places as an assignment, but it’s still a bit weird, so it’s better not to be translated…

Mutator is a word coined by Edsger Dijkstra, meaning to change something. When it comes to changing something, it’s the reference relationships between GC objects. But just saying that may not be understood, in fact, in a word, its entity is “application”.

Mutator works in two ways:

  • To generate the object
  • Update the pointer

Mutator does all this while doing some processing (numerical calculations, web browsing, editing articles, and so on) for the user of the application. As these processes progress, the reference relationships between objects “change.” Garbage is generated along with these changes, and the mechanism responsible for collecting this garbage is the GC.

GC ROOTS

GC ROOTS is the starting point for references, such as stacks, global variables

Heap (Heap)

A heap is a chunk of dynamic memory in a process. In the GC world, a large chunk of heap memory is allocated and then a mutatar is allocated

Active and inactive objects

Live objects are objects that can be referenced by mutatar (GC ROOTS) and non-live objects that cannot be accessed.

The preparatory work

In the replication algorithm, sequential allocation is used. The sequential allocation flow is shown in the following figure

Maintains a free pointer that moves after each memory allocation. Limit-free is the size of the memory available in the heap

Data structure design

The first is the structure of the object type:

To access the properties of an “object” dynamically, the property offset is used here to record the position of the property, and then the property is obtained through the evaluation of the pointer

typedef struct class_descriptor {
    char *name;/ / class name
    int size;// Class size, i.e. Sizeof (struct)
    int num_fields;// Number of attributes
    int *field_offsets;// Attribute offset in a class, i.e. the offset of all attributes in the struct
} class_descriptor;
Copy the code

Then there is the structure of the object, which, although there is no concept of inheritance in C, can be implemented through structs with common attributes:

typedef struct _object {
    class_descriptor *class;// The type of the object
    byte forwarded;// The mark of the object that has been moved to prevent duplication
    object *forwarding;// Target position
} object;

/ / inheritance
// The "inherited object" must have the same basic attributes as the parent object. After the basic attributes, you can define other attributes
typedef struct emp {
    class_descriptor *class;// The type of the object
    byte forwarded;// The token that the object has moved
    object *forwarding;// Target position
    int id;
    dept *dept;
} emp;
Copy the code

With the basic data structure in place, it’s time to implement the algorithm

Algorithm implementation

The replication algorithm uses the From space for allocation. When the From space is completely occupied and cannot be allocated, the GC copies all live objects into the To space. When the copy is complete, the From/To space is swapped in preparation for the next GC. In this algorithm, in order To ensure that the To space can hold all the active objects in the From space, the From and To space need To have the same capacity.

The flow of the replication algorithm is shown in the figure below:

Initialize the heap

In the replication algorithm, you need to split the heap in half, with half as from and half as to

void gc_init(int size) {
    heap_size = resolve_heap_size(size);
    heap_half_size = heap_size / 2;
    heap = (void *) malloc(heap_size);
    from = heap;
    to = (void *) (heap_half_size + from);
    _rp = 0;
}
Copy the code

Create object & memory allocation

To allocate memory for a newly created object, simply move the Free pointer

Next_free_offset is the free pointer in the figure

object *gc_alloc(class_descriptor *class) {

    // Check if it can be allocated
    if (next_free_offset + class->size > heap_half_size) {
        printf("Allocation Failed. execute gc... \n");
        gc();
        if (next_free_offset + class->size > heap_half_size) {
            printf("Allocation Failed! OutOfMemory... \n");
            abort();
        }
    }

    int old_offset = next_free_offset;

    // After allocation, free moves to the next assignable location
    next_free_offset = next_free_offset + class->size;

    / / distribution
    object *new_obj = (object *) (old_offset + heap);

    / / initialization
    new_obj->class = class;
    new_obj->forwarded = FALSE;
    new_obj->forwarding = NULL;

    for (int i = 0; i < new_obj->class->num_fields; ++i) {
        //*(data **) is a dereference operation that gets a field pointer
        //(void *)o is strongly converted to void* pointer. Void * does not add addresses by type
        *(object **) ((void *) new_obj + new_obj->class->field_offsets[i]) = NULL;
    }

    return new_obj;
}
Copy the code

copy

When copying, it is necessary to traverse the object graph from GC ROOTS and copy each surviving object. The address of the object is changed after copying, and the address referenced by GC ROOTS needs to be updated.

void copying(a) {
    next_forwarding_offset = 0;
    // Walk through GC ROOTS
    for (int i = 0; i < _rp; ++i) {
        object *forwarded = copy(_roots[i]);

        // First update the object referenced by GC ROOTS to the new object in the to space
        _roots[i] = forwarded;
    }

    // Update the reference
    adjust_ref();

    // Empty from and swap from/to
    swap(&from,&to);
}
Copy the code

The replication algorithm flow is as follows:

The copy method:

object *copy(object *obj) {

    if(! obj) {return NULL; }

    // The same object may be referenced by multiple objects, so it is judged here to avoid duplicate copying
    if(! obj->forwarded) {// Calculate the copied pointer
        object *forwarding = (object *) (next_forwarding_offset + to);

        / / assignment
        memcpy(forwarding, obj, obj->class->size);

        obj->forwarded = TRUE;

        // Write the copied pointer to the forwarding pointer of the original object in preparation for the final update of the reference
        obj->forwarding = forwarding;

        // After copying, move to forwarding offset
        next_forwarding_offset += obj->class->size;

        // Copy reference objects recursively. Recursion is depth-first
        for (int i = 0; i < obj->class->num_fields; i++) {
            copy(*(object **) ((void *) obj + obj->class->field_offsets[i]));
        }
        return forwarding;
    }

    return obj->forwarding;
}

Copy the code

Forwarding pointer

Personally, I think “Forwarding Pointer” is an important concept in replication algorithms

Forward pointer, refers to when the copy, in the original object to retain the pointer to the new object. Why keep this pointer?

Because it is not only the object that needs to be copied, but also the reference relationship of the object. For example, the object ACD needs to be copied, and only object A is copied, but the CD referenced by the copied object A’ (A prime) is not copied

Adjust the reference

After all active objects are copied, you need to adjust the referenced address to the copied object address. Simply traverse one to space and find the Forwarding pointer update for the referenced object

void adjust_ref(a) {
    int p = 0;
    // Iterate over to, which is the target space for replication
    while (p < next_forwarding_offset) {
        object *obj = (object *) (p + to);
        // Update a reference that also points to from to a forwarding pointer
        for (int i = 0; i < obj->class->num_fields; i++) {
            object **field = (object **) ((void *) obj + obj->class->field_offsets[i]);
            if((*field) && (*field)->forwarding) { *field = (*field)->forwarding; }}// Access the next object sequentiallyp = p + obj->class->size; }}Copy the code

So that’s the replication algorithm

advantages

  • High throughput, no need to traverse the whole heap, only need to process live objects
  • The allocation speed is fast. Compared with free-list allocation, sequential allocation does not need to search free-list, but only need to move free pointer
  • Fragmentation is not a problem because each copy copies the living object from from to one end of to

disadvantages

Heap utilization is low because only half of the memory is used to store objects under the replication algorithm.

Write in the back

The replication algorithm introduced in this paper is the most basic replication algorithm. Things like the replication algorithm in the JVM are a later and improved version, which will be covered in more detail when we introduce generational collection algorithms

The complete code

Github.com/kongwu-/gc_…

reference

  • Algorithms and Implementation of Garbage Recycling by Narayo Nakamura, Mitsu Aikawa, Iyuo Takeuchi
  • Handbook of Garbage Collection Algorithms the Art of Automatic Memory Management. By Richard Jones. Translated by Yaguang Wang

Original is not easy to reprint, please at the beginning of the famous article source and author. If my article helped you, please like/bookmark/follow it ❤❤❤❤❤❤