Preface

Recently I read The Algorithm and Implementation of Garbage Collection, which covers commonly used garbage collection algorithms such as mark-sweep, reference counting, and generational collection. The book covers Python's garbage collection strategy later on, and these are my notes on it.

Four factors that measure GC performance

  1. Throughput

Throughput is the amount of work the GC gets done per unit time, computed as: heap size processed by GC / time spent in GC. For example, if a collector processes a 100 MB heap in 0.5 s, its throughput is 200 MB/s. In general, GC algorithms with higher throughput are preferred.

  2. Maximum pause time

The maximum amount of time the user threads are paused while GC executes. If a pause takes too long, it interferes with the normal execution of the program. High throughput and short maximum pause time are often at odds with each other.

  3. Heap efficiency

GC is an automatic memory management feature, so ideally it should not consume too much heap space itself. Two factors affect heap efficiency: the size of each object's header and how the heap is used. The larger the available heap, the faster GC runs; conversely, the harder you try to squeeze the most out of a limited heap, the longer GC takes.

  4. Locality of access

Objects that reference each other are often accessed one after another. This pattern is common in most programs and is called "locality of access." Taking locality into account, placing objects with reference relationships close to each other in the heap improves memory utilization.

Python uses reference counting as its GC algorithm. Its advantage is a short maximum pause time; its disadvantage is that maintaining the increments and decrements is error-prone (forgetting a decrement leaves objects unfreed).

Where the reference count is stored

Python's built-in types, such as list, set, tuple, dict, str, and int, all begin their underlying C structures with a PyObject header, which maintains the object's reference count:

typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

The ob_refcnt member is responsible for maintaining the reference count, which is why every built-in type keeps a PyObject structure at the beginning of its layout.
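To make the counter visible, here is a minimal embedding sketch of my own (not from the book) that reads ob_refcnt through the Py_REFCNT macro:

#include <Python.h>

int main(void)
{
    Py_Initialize();

    PyObject *list = PyList_New(0);        /* new reference: ob_refcnt starts at 1 */
    printf("after creation:  %zd\n", Py_REFCNT(list));

    Py_INCREF(list);                       /* a second owner appears */
    printf("after Py_INCREF: %zd\n", Py_REFCNT(list));

    Py_DECREF(list);                       /* the second owner releases its reference */
    Py_DECREF(list);                       /* count reaches 0 and the list is freed */

    Py_Finalize();
    return 0;
}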

Memory allocator

Python does not simply call malloc/free to ask the operating system for memory. Instead, for efficiency, its memory allocator is split into several layers so that the number of system calls is minimized.

    Object-specific allocators
    _____   ______   ______       ________
   [ int ] [ dict ] [ list ] ... [ string ]       Python core         |
+3 | <----- Object-specific memory -----> | <-- Non-object memory --> |
    _______________________________       |                           |
   [   Python's object allocator   ]      |                           |
+2 | ####### Object memory ####### | <------ Internal buffers ------> |
    ______________________________________________________________    |
   [          Python's raw memory allocator (PyMem_ API)          ]   |
+1 | <----- Python memory (under PyMem manager's control) ------> |   |
    __________________________________________________________________
   [    Underlying general-purpose allocator (ex: C library malloc)   ]
 0 | <------ Virtual memory allocated for the python process -------> |

   =========================================================================
    _______________________________________________________________________
   [                OS-specific Virtual Memory Manager (VMM)               ]
-1 | <--- Kernel dynamic storage allocation & management (page-based) ---> |
    __________________________________   __________________________________
   [                                  ] [                                  ]
-2 | <-- Physical memory: ROM/RAM --> | | <-- Secondary storage (swap) --> |

Layers 0 through 3 are the allocator levels Python works with. Taking dictionary objects as an example, the call chain looks like this:

PyDict_New()         ---- layer 3
PyObject_GC_New()    ---- layer 2
PyObject_Malloc()    ---- layer 2
new_arena()          ---- layer 1
malloc()             ---- layer 0

Layer 0 simply calls the allocation functions provided by the operating system, such as malloc. Python does not go through layer 0 for every object it creates; instead, it chooses an allocation scheme based on the amount of memory requested:

  • Requests larger than 256 bytes use layer 0 directly
  • Requests smaller than 256 bytes are served by layers 1 and 2

In practice, programs create large numbers of objects that are smaller than 256 bytes and short-lived, for example:

for x in range(100):
    print(x)

Frequently calling malloc and free for such objects is inefficient. Python therefore adds layers 1 and 2: layer 1 reserves space from layer 0 in advance, and that reserved space is used whenever the requested object is smaller than 256 bytes.

The memory Python manages itself is organized into three kinds of units: block, pool, and arena. The relationship between them is as follows:

  • An arena manages pools
  • A pool manages blocks
  • A block is the smallest unit of memory handed out to the requester

To avoid frequent calls to malloc() and free(), the layer 1 allocator reserves memory in units of its largest structure, the arena. A pool is the unit used to manage free blocks efficiently.

The layer 2 allocator manages the blocks within a pool: it hands the starting address of a block to the requester, releases blocks, and so on. It carves a pool into blocks whose sizes are multiples of 8 bytes, producing 8-byte blocks, 16-byte blocks, 24-byte blocks, ..., up to 256-byte blocks. When memory is requested, a block of the appropriate size class is returned.
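As a rough sketch of the rounding rule just described (illustrative only; the helper name is mine, not CPython's), a request is rounded up to the next multiple of 8 bytes:

#include <assert.h>
#include <stddef.h>

#define ALIGNMENT 8   /* block sizes are multiples of 8 bytes, as described above */

/* Round a request up to its size class (illustrative helper, not CPython source). */
static size_t block_size_for(size_t nbytes)
{
    return (nbytes + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

int main(void)
{
    assert(block_size_for(1)   == 8);    /* a tiny request still gets an 8-byte block */
    assert(block_size_for(20)  == 24);   /* 20 bytes rounds up to a 24-byte block */
    assert(block_size_for(256) == 256);  /* the largest size class handled this way */
    return 0;
}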

Layer 3 is the object-specific allocator. Python's built-in types, such as list and dict, each have their own allocator. The dict allocator's free list looks like this:

#ifndef PyDict_MAXFREELIST
#define PyDict_MAXFREELIST 80
#endif
static PyDictObject *free_list[PyDict_MAXFREELIST];
static int numfree = 0;

When a dictionary object is allocated, the free list free_list is checked to see whether a cached object is available; if so, it is reused, and otherwise a new object is requested with PyObject_GC_New.
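The pattern looks roughly like the sketch below; this is my own illustration of the free-list idea, and the real code in Objects/dictobject.c differs in detail:

#include <Python.h>

#ifndef PyDict_MAXFREELIST
#define PyDict_MAXFREELIST 80
#endif

static PyDictObject *free_list[PyDict_MAXFREELIST];
static int numfree = 0;

/* Illustrative allocation: reuse a cached dict if the free list has one. */
static PyDictObject *
dict_alloc(void)
{
    if (numfree > 0)
        return free_list[--numfree];                      /* reuse a cached object
                                                             (the real code also resets
                                                             ob_refcnt here) */
    return PyObject_GC_New(PyDictObject, &PyDict_Type);   /* otherwise ask layer 2 */
}

/* Illustrative release: cache the dict for reuse instead of freeing it. */
static void
dict_release(PyDictObject *mp)
{
    if (numfree < PyDict_MAXFREELIST)
        free_list[numfree++] = mp;    /* keep it around for the next allocation */
    else
        PyObject_GC_Del(mp);          /* free list is full: really release the memory */
}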

The original post includes a figure showing the overall interaction between these layers when a dictionary object is created.

Reference counting method

Reference counting in Python works by tracking changes in the number of references to each object: when the number of references to an object increases, its counter is incremented by 1; when the number of references decreases, the counter is decremented by 1.

The increment operation

The actual counting operation is implemented by the macro Py_INCREF

#define Py_INCREF(op) (                  \
    ((PyObject*)(op))->ob_refcnt++)

ob_refcnt is a signed integer (int-sized in 32-bit environments and long-sized in 64-bit environments). Because of the sign bit, only half of its range can represent non-negative counts; but since pointers are at least 4-byte aligned, the counter cannot overflow even if every possible pointer referenced the same object. The type is deliberately signed to ease debugging: if a reference counter ever goes negative, there must be an extra decrement (or a missing increment) somewhere.

Decrement operation

The actual counting is implemented by the macro Py_DECREF

#define Py_DECREF(op)                           \
    if (--((PyObject*)(op))->ob_refcnt != 0)    \
        _Py_CHECK_REFCNT(op)                    \
    else                                        \
        _Py_Dealloc((PyObject*)(op))


#define _Py_Dealloc(op) (                       \
        (*Py_TYPE(op)->tp_dealloc)((PyObject *)(op)))

If the counter is not 0, the macro _Py_CHECK_REFCNT is called to check whether the reference counter has gone negative. If the counter reaches 0, the macro _Py_Dealloc is called: Py_TYPE obtains the object's type, and the type's tp_dealloc function pointer is invoked to free that particular kind of object. For example, tupledealloc is the deallocation function for tuples.

static void
tupledealloc(register PyTupleObject *op)
{
    register Py_ssize_t i;
    register Py_ssize_t len = Py_SIZE(op);

    PyObject_GC_UnTrack(op);
    Py_TRASHCAN_SAFE_BEGIN(op)
    if (len > 0) {
        i = len;
        /* Decrement each element held by the tuple */
        while (--i >= 0)
            Py_XDECREF(op->ob_item[i]);
    }
    /* Release the tuple object itself */
    Py_TYPE(op)->tp_free((PyObject *)op);

    Py_TRASHCAN_SAFE_END(op)
}

The tp_free member holds a pointer to each type's release handler, which in most cases calls PyObject_GC_Del internally:

void
PyObject_GC_Del(void *op)
{
    PyGC_Head *g = AS_GC(op);
    /* ... */
    PyObject_FREE(g);
}

The decrement operation call diagram of a tuple is as follows:

Py_DECREF
    _Py_Dealloc             ———— decrement operation
        tupledealloc
            PyObject_GC_Del ———— tuple release process
                PyObject_FREE ———— releases the memory

Ownership of references

As shown above, the core operations of reference counting are incrementing and decrementing the count. The question is who is responsible for performing them. For example, if object A obtains object B by calling func1, who is responsible for incrementing B's reference count? This is where "reference ownership" comes in: whoever owns a reference is responsible for decrementing the object's reference counter once that reference is no longer needed.

  1. Passing reference ownership (return value)

That is, the function hands ownership of the reference over to the caller along with the return value, and it is up to the caller to perform the decrement when the reference is no longer needed. Reference ownership is passed to the caller when an object is created, for example when a dictionary object is generated:

PyObject *dict = PyDict_New();
/* Perform some operations, then */
Py_DECREF(dict);
dict = NULL;
  2. Lending reference ownership (return value)

That is, the function gives the return value to the caller, but the reference is only lent: the caller must not decrement its reference count. Functions that fetch elements from containers lend ownership, for example fetching the item at a given index of a tuple:

PyObject *
PyTuple_GetItem(register PyObject *op, register Py_ssize_t i)
{
    /* Checks omitted */
    return ((PyTupleObject *)op)->ob_item[i];
}
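Because the reference is only lent (a "borrowed reference"), a caller that wants to keep the element longer than the container must take its own reference first. A small sketch of my own, using the Python 3 C API, to illustrate this:

#include <Python.h>

int main(void)
{
    Py_Initialize();

    PyObject *tuple = Py_BuildValue("(i)", 42);    /* new reference to a 1-tuple */
    PyObject *item  = PyTuple_GetItem(tuple, 0);   /* borrowed: do NOT Py_DECREF */

    Py_INCREF(item);     /* we want item to outlive the tuple, so take our own reference */
    Py_DECREF(tuple);    /* release the tuple; item stays alive thanks to our reference */

    printf("%ld\n", PyLong_AsLong(item));
    Py_DECREF(item);     /* we own this reference, so we must release it ourselves */

    Py_Finalize();
    return 0;
}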
  3. Taking reference ownership (parameter)

That is, when the caller passes an argument to a function, the function sometimes takes over ownership of the argument's reference, and from then on the function is responsible for managing that reference. Functions that store an element into a container often behave this way; for example, the function that stores an element into a tuple takes reference ownership of its argument:

PyObject *tuple, *dict;

tuple = PyTuple_New(3);
dict = PyDict_New();                /* dict is returned together with reference ownership */
PyTuple_SetItem(tuple, 0, dict);    /* the tuple takes over ownership of the reference */
dict = NULL;
  4. Lending reference ownership (parameter)

That is, the caller lends reference ownership of the argument to the function. When the caller lends a reference this way, it must retain ownership of the reference from the moment the object is handed to the function until the function finishes executing.
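This case has no code example in the original text; here is a sketch of my own using PyDict_SetItem, which does not steal references, so the caller merely lends them and keeps ownership:

#include <Python.h>

int main(void)
{
    Py_Initialize();

    PyObject *dict = PyDict_New();           /* we own this reference */
    PyObject *key  = PyLong_FromLong(1);     /* we own this reference */
    PyObject *val  = PyLong_FromLong(100);   /* we own this reference */

    /* PyDict_SetItem only borrows key and val for the duration of the call;
       the dictionary takes its own internal references, so ownership stays with us. */
    PyDict_SetItem(dict, key, val);

    Py_DECREF(key);     /* these are still our references, */
    Py_DECREF(val);     /* so we are the ones who must release them */
    Py_DECREF(dict);

    Py_Finalize();
    return 0;
}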

Cyclic reference garbage collection

Reference counting has one defect: it cannot handle circular references. When two objects refer to each other, their reference counts never drop to zero, so they can never be collected. To deal with circular references, Python supplements reference counting with a partial mark-and-sweep algorithm.

Algorithm principle: the partial mark-and-sweep algorithm

For example, suppose three objects form a circular reference, which prevents them from ever being reclaimed by ordinary reference counting. One premise: every container object links itself into an object list when it is created (a doubly linked list). The collection then proceeds as follows:

  1. Copy each object's current reference count into a separate field for later use.
  2. Iterate over the object list and, for every object in it, subtract 1 from the copied count of each object it references.
  3. Mark: every object whose copied count is still greater than 0 after the previous step is marked a "reachable object" (some live object outside the cycle still references it), the objects it references are marked reachable as well, and they are linked into a reachable-object list. Objects whose copied count is 0 are marked "unreachable" and linked into an unreachable-object list. The unreachable objects are exactly the ones kept alive only by the cycle.
  4. Sweep: free the memory of every object in the unreachable-object list, and link the objects in the reachable-object list back into the object list.

At this point, garbage collection of the circularly referencing objects is complete.
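From the C side the cycle collector can also be triggered explicitly; a small sketch of my own (the count printed may include other internal garbage as well):

#include <Python.h>

int main(void)
{
    Py_Initialize();

    PyObject *lst = PyList_New(0);
    PyList_Append(lst, lst);    /* the list now references itself: a reference cycle */

    Py_DECREF(lst);             /* drop our reference; ob_refcnt stays at 1 because of
                                   the self-reference, so pure reference counting can
                                   never free this object */

    Py_ssize_t collected = PyGC_Collect();   /* run the cycle detector explicitly */
    printf("collected %zd objects\n", collected);

    Py_Finalize();
    return 0;
}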

Not all objects have circular references

The root cause of circular references is that some objects can hold references to other objects; such objects are called "container objects". Tuples, lists, dictionaries, and so on are container objects, and the circular-reference problem only needs to be solved for them. Each container object is therefore given an extra header structure used for cyclic-reference garbage collection:

typedef union _gc_head {
    struct {
        union _gc_head *gc_next; /* doubly linked object list */
        union _gc_head *gc_prev; /* doubly linked object list */
        Py_ssize_t gc_refs;      /* holds the copied reference count */
    } gc;
    long double dummy;
} PyGC_Head;

As mentioned earlier, a container object is linked into the object list when it is created. Taking a dictionary object as an example, here is the creation code:

PyObject *
PyDict_New(void)
{
    register PyDictObject *mp;
    /* The operation to generate the object */
    _PyObject_GC_TRACK(mp);
    return (PyObject *)mp;

}

_PyObject_GC_TRACK is responsible for linking the object into the object list:

#define _PyObject_GC_TRACK(o) do { \
    PyGC_Head *g = _Py_AS_GC(o); \
    g->gc.gc_refs = _PyGC_REFS_REACHABLE;  \
    g->gc.gc_next = _PyGC_generation0; \
    g->gc.gc_prev = _PyGC_generation0->gc.gc_prev; \
    g->gc.gc_prev->gc.gc_next = g;  \
    _PyGC_generation0->gc.gc_prev = g; \
    } while (0);

Container objects are managed by generation

Python divides the list of container objects mentioned above into “generation 0”, “generation 1”, and “generation 2”, which are managed by the following structure

struct gc_generation {
    PyGC_Head head;   /* head of this generation's object list */
    int threshold;    /* threshold at which collection starts */
    int count;        /* count of allocations, or of collections of younger generations */
};

#define GEN_HEAD(n) (&generations[n].head)

PyGC_Head *_PyGC_generation0 = GEN_HEAD(0); /* the generation-0 object list */

Initially, all container objects are linked into generation 0. When generation 0 goes through a cyclic-reference garbage collection, the surviving objects are promoted to generation 1; objects that survive a collection of generation 1 are promoted once more, to generation 2.
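The thresholds and counters stored in gc_generation are visible from Python code through the gc module. A small sketch of my own (the default thresholds are commonly (700, 10, 10), but this is version-dependent):

#include <Python.h>

int main(void)
{
    Py_Initialize();
    /* Collection of generation 0 starts once the number of tracked container
       allocations minus deallocations exceeds threshold0. */
    PyRun_SimpleString(
        "import gc\n"
        "print(gc.get_threshold())  # e.g. (700, 10, 10)\n"
        "print(gc.get_count())      # current counter of each generation\n");
    Py_Finalize();
    return 0;
}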

Excluding objects with finalizers

During cyclic-reference garbage collection, objects that have finalizers are moved out of the unreachable list:

static void
move_finalizers(PyGC_Head *unreachable, PyGC_Head *finalizers)
{
    PyGC_Head *gc;
    PyGC_Head *next;
    
    for (gc = unreachable->gc.gc_next; gc != unreachable; gc = next) {
        PyObject *op = FROM_GC(gc);
        next = gc->gc.gc_next;
        if (has_finalizer(op)) {
            gc_list_move(gc, finalizers);
            gc->gc.gc_refs = GC_REACHABLE;
        }
    }
}

This is because releasing objects that have finalizers and are part of a reference cycle is tricky: there is no obviously correct order in which to release them. What if we release the first object, its finalizer runs, and the second object's finalizer then tries to use the already-released first object?

Python collects such objects into the global list gc.garbage, so that developers can break the cyclic references among objects with finalizers themselves.

import gc
gc.garbage
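A sketch of my own showing how such objects can be observed. Note that the behaviour described above is the Python 2 one; since Python 3.4 (PEP 442) objects with __del__ in a cycle are collected normally, so gc.DEBUG_SAVEALL is used here just to make gc.garbage observable:

#include <Python.h>

int main(void)
{
    Py_Initialize();
    PyRun_SimpleString(
        "import gc\n"
        "gc.set_debug(gc.DEBUG_SAVEALL)   # keep collected objects in gc.garbage\n"
        "class Node:\n"
        "    def __del__(self):\n"
        "        pass\n"
        "a, b = Node(), Node()\n"
        "a.other, b.other = b, a          # a reference cycle with finalizers\n"
        "del a, b\n"
        "gc.collect()\n"
        "print(gc.garbage)\n");
    Py_Finalize();
    return 0;
}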

Conclusion

Python’s GC is divided into two parts:

  • A reference counting algorithm manages garbage collection for ordinary objects, with an optimized memory allocator structure that keeps GC overhead low
  • A generational partial mark-and-sweep collector manages garbage collection for objects involved in circular references