The Magic of Weak References in Python: Usage and Principles

Declaration: my work involves handling reference relationships, so I looked up this material and am recording it here to share. Original source: https://mp.weixin.qq.com/s/J7A7ehTowHxVlbUll0Iv4A. If there is any infringement, contact me and it will be deleted.

Background

Before we start talking about the weakref module, let's look at what a weak reference is and what it does.

Suppose we have a multithreaded program that processes application data concurrently:

# Takes up a lot of resources; expensive to create and destroy
class Data:
    def __init__(self, key):
        pass

Each piece of application data (a Data instance) is uniquely identified by a key, and the same Data can be accessed by multiple threads at the same time. Because Data requires a lot of system resources, it is expensive to create and destroy. We want to keep only one copy of each Data in the program, even when it is accessed by multiple threads at the same time.

To this end, we try to design a caching middleware, Cacher:

import threading

# Data cache
class Cacher:
    def __init__(self):
        self.pool = {}
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            data = self.pool.get(key)
            # Compare against None so that a falsy Data is still a cache hit
            if data is not None:
                return data
            self.pool[key] = data = Data(key)
            return data

Cacher internally caches the Data copies it has created in a dict and provides a get method for obtaining application data. The get method looks up the cache dictionary first. If the data already exists, it is returned directly; if not, a new copy is created and saved to the dictionary. As a result, a piece of data enters the cache dictionary when it is first created, and subsequent accesses by other threads use the same cached copy.

It feels great! But there is a fly in the ointment: Cacher is at risk of leaking resources!

Once created, a Data object is stored in the cache dictionary and never released! In other words, the program's resource usage, such as memory, will keep growing and may eventually blow up. Therefore, we want a piece of data to be released automatically once no thread is accessing it any more.

We could maintain a reference count for each piece of data in Cacher, with the get method incrementing this count automatically. A new remove method would then be provided to release the data: it first decrements the reference count and removes the data from the cache dictionary when the count drops to zero.

A thread calls the get method to obtain the data, and calls the remove method to release it when it is done. But this means Cacher implements reference counting all over again, which is too much trouble. Doesn't Python have garbage collection built in? Why should the application implement it by itself?
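To make the cost of this approach concrete, here is a minimal sketch of such a hand-rolled reference-counting cache. The class name CountingCacher, the counts dictionary, and the stand-in Data class are assumptions made for illustration; the original article only describes the idea in prose.

```python
import threading


class Data:
    """Stand-in for the expensive application data (hypothetical)."""

    def __init__(self, key):
        self.key = key


class CountingCacher:
    """Cache that tracks usage with its own counter (illustrative name)."""

    def __init__(self):
        self.pool = {}    # key -> Data
        self.counts = {}  # key -> number of outstanding get() calls
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            data = self.pool.get(key)
            if data is None:
                data = self.pool[key] = Data(key)
            self.counts[key] = self.counts.get(key, 0) + 1
            return data

    def remove(self, key):
        with self.lock:
            self.counts[key] -= 1
            if self.counts[key] == 0:
                # The last user is done: drop the cached copy
                del self.counts[key]
                del self.pool[key]


cacher = CountingCacher()
a = cacher.get("k")
b = cacher.get("k")
same_copy = a is b                     # both callers share one copy
cacher.remove("k")
still_cached = "k" in cacher.pool      # one user is still outstanding
cacher.remove("k")
released = "k" not in cacher.pool      # the count reached zero
```

Every caller has to pair each get with exactly one remove; a forgotten remove leaks the data, and an extra one releases it too early.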

The crux of the problem is Cacher's cache dictionary: as middleware, Cacher does not use the data objects itself, so in theory it should not hold references to them. Is there any trick that lets us find a target object without creating a reference to it? After all, assignment always creates a reference!

Typical usage

This is where weakref comes on stage! A weak reference is a special object that can be associated with a target object without increasing its reference count.

# Create data
>>> d = Data('fasionchan.com')
>>> d
<__main__.Data object at 0x1018571f0>

# Create a weak reference to the data
>>> import weakref
>>> r = weakref.ref(d)

# Calling the weak reference returns the object it refers to
>>> r()
<__main__.Data object at 0x1018571f0>
>>> r() is d
True

# Delete the temporary variable d; there are no other references to the Data object, so it is reclaimed
>>> del d
# The target Data object is no longer available (the call returns None)
>>> r()

In this case, we could simply change the Cacher cache dictionary to hold weak references and the problem would be solved!

import threading
import weakref

# Data cache
class Cacher:
    def __init__(self):
        self.pool = {}
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            r = self.pool.get(key)
            if r is not None:
                # Dereference the weak reference; None means the data is gone
                data = r()
                if data is not None:
                    return data
            data = Data(key)
            self.pool[key] = weakref.ref(data)
            return data

Because the cache dictionary only holds weak references to Data objects, Cacher does not affect their reference counts. When all threads have finished using a piece of data, its reference count drops to zero and it is released.
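The behaviour described above can be checked with a short experiment. This is a sketch, not part of the original article: the Data class is a minimal stand-in, the Cacher is restated so that the block runs on its own, and the probe weak reference exists only to observe the object's lifetime. The immediate release relies on CPython's reference counting; the gc.collect() call is there for other interpreters.

```python
import gc
import threading
import weakref


class Data:
    """Minimal stand-in for the expensive data (hypothetical)."""

    def __init__(self, key):
        self.key = key


class Cacher:
    def __init__(self):
        self.pool = {}
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            r = self.pool.get(key)
            if r is not None:
                data = r()
                if data is not None:
                    return data
            data = Data(key)
            self.pool[key] = weakref.ref(data)
            return data


cacher = Cacher()
first = cacher.get("k")
second = cacher.get("k")
shared = first is second       # both callers see the same copy

probe = weakref.ref(first)     # lets us observe the object's lifetime
del first, second              # the last strong references go away
gc.collect()                   # not needed on CPython, harmless elsewhere
released = probe() is None

fresh = cacher.get("k")        # a new copy is created on demand
entry_replaced = cacher.pool["k"]() is fresh
```

Note that after the data is released, the dictionary still holds a dead weak reference under that key until the key is requested again; only the heavyweight Data object itself is freed.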

In fact, caching data objects in a dictionary is very common. For this purpose, the weakref module also provides two dictionary classes that store only weak references:

weakref.WeakKeyDictionary, a mapping class whose keys are held only by weak references (an entry disappears automatically once its key no longer has strong references);
weakref.WeakValueDictionary, a mapping class whose values are held only by weak references (an entry disappears automatically once its value no longer has strong references);
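The following sketch shows both dictionary classes dropping their entries once the target object loses its last strong reference. The Data class and the variable names are assumptions for illustration; the immediate disappearance relies on CPython's reference counting, so a gc.collect() call is included for other interpreters.

```python
import gc
import weakref


class Data:
    """Minimal stand-in for the expensive data (hypothetical)."""

    def __init__(self, key):
        self.key = key


values = weakref.WeakValueDictionary()
keys = weakref.WeakKeyDictionary()

d = Data("k")
values["k"] = d        # the value is held weakly
keys[d] = "metadata"   # the key is held weakly

before = (len(values), len(keys))

del d                  # drop the only strong reference
gc.collect()

after = (len(values), len(keys))
```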

So the data cache dictionary can simply be a weakref.WeakValueDictionary, whose interface is the same as that of an ordinary dictionary. This way we don't need to maintain weak reference objects ourselves, and the code is more concise:

import threading
import weakref

# Data cache
class Cacher:
    def __init__(self):
        self.pool = weakref.WeakValueDictionary()
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            data = self.pool.get(key)
            if data is not None:
                return data
            self.pool[key] = data = Data(key)
            return data

The weakref module also has many other useful utility classes and functions. For details, please refer to the official documentation; they are not described here.

How it works

So what exactly is a weak reference, and why is it so magical? Let's take a closer look!

>>> d = Data('fasionchan.com')

# weakref.ref is a built-in type object
>>> from weakref import ref
>>> ref
<class 'weakref'>

# Calling weakref.ref creates a weak reference instance object
>>> r = ref(d)
>>> r
<weakref at 0x1008d5b80; to 'Data' at 0x100873d60>

After the previous chapters, we are familiar with reading the source code of built-in objects. The relevant source files are as follows:

Include/weakrefobject.h, the header file, which contains the object structure and some macro definitions;
Objects/weakrefobject.c, the source file, which contains the weak reference object and its method definitions;

Let's first look at the field structure of the weak reference object, defined at lines 10-41 of the Include/weakrefobject.h header file:

typedef struct _PyWeakReference PyWeakReference;

/* PyWeakReference is the base struct for the Python ReferenceType, ProxyType,
 * and CallableProxyType.
 */
#ifndef Py_LIMITED_API
struct _PyWeakReference {
    PyObject_HEAD

    /* The object to which this is a weak reference, or Py_None if none.
     * Note that this is a stealth reference:  wr_object's refcount is
     * not incremented to reflect this pointer.
     */
    PyObject *wr_object;

    /* A callable to invoke when wr_object dies, or NULL if none. */
    PyObject *wr_callback;

    /* A cache for wr_object's hash code.  As usual for hashes, this is -1
     * if the hash code isn't known yet.
     */
    Py_hash_t hash;

    /* If wr_object is weakly referenced, wr_object has a doubly-linked NULL-
     * terminated list of weak references to it.  These are the list pointers.
     * If wr_object goes away, wr_object is set to Py_None, and these pointers
     * have no meaning then.
     */
    PyWeakReference *wr_prev;
    PyWeakReference *wr_next;
};
#endif

As we can see, the PyWeakReference structure is the body of the weak reference object. It is a fixed-length object with five fields in addition to the fixed header:

wr_object, an object pointer to the referenced object; a weak reference can find the referenced object through this field, but it does not increase the reference count;
wr_callback, which points to a callable object that will be called when the referenced object is destroyed;
hash, which caches the hash value of the referenced object;
wr_prev and wr_next, the backward and forward pointers used to organize weak reference objects into a doubly linked list.
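Two of these fields can be observed from Python code. The sketch below is an illustration under assumptions: the Data class and the on_destroy function are hypothetical, and the behaviour shown (the read-only __callback__ attribute, and hashing a dead weak reference succeeding only if it was hashed while the target was alive) is what the weakref documentation describes for CPython.

```python
import gc
import weakref


class Data:
    """Minimal stand-in for the expensive data (hypothetical)."""

    def __init__(self, key):
        self.key = key


def on_destroy(ref):
    """Hypothetical callback; receives the dying weak reference."""


d = Data("k")
plain = weakref.ref(d)
with_cb = weakref.ref(d, on_destroy)

# wr_callback is visible as the read-only __callback__ attribute
callbacks = (plain.__callback__, with_cb.__callback__ is on_destroy)

# hash: hashing a live weak reference hashes the target and caches the result
cached = hash(plain)
matches_target = cached == hash(d)

unhashed = weakref.ref(d, lambda ref: None)  # never hashed while d is alive

del d
gc.collect()

hash_survives = hash(plain) == cached  # served from the cached hash field
try:
    hash(unhashed)
    dead_hash_error = False
except TypeError:
    # No cached value and no target left to hash
    dead_hash_error = True
```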

Combined with the comments in the code, we know:

Weak reference objects are associated with the referenced object through the wr_object field, as shown by the dotted arrows in the figure;
An object can be associated with multiple weak reference objects at the same time; in the figure, the Data instance object is associated with two weak reference objects;
All weak references associated with the same object are organized into a doubly linked list, the head of which is stored in the referenced object, as shown by the solid arrows in the figure;
When an object is destroyed, Python iterates through its list of weak references and processes them one by one:
- It sets the wr_object field to None, so that calling the weak reference object afterwards returns None and the caller knows the object has been destroyed;
- It executes the callback function wr_callback (if any).

Thus, weak references work like the Observer pattern from design patterns: when an object is destroyed, all of its weak reference objects are notified and updated accordingly.
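The observer-like behaviour can be seen directly from Python. In this sketch the Data class, the notified list, and the lambda callbacks are assumptions for illustration; weakref.getweakrefcount is the standard library function that reports how many weak references and proxies point at an object.

```python
import gc
import weakref


class Data:
    """Minimal stand-in for the expensive data (hypothetical)."""

    def __init__(self, key):
        self.key = key


notified = []

d = Data("k")
r1 = weakref.ref(d, lambda ref: notified.append("r1"))
r2 = weakref.ref(d, lambda ref: notified.append("r2"))

count_alive = weakref.getweakrefcount(d)   # two weak references observe d
alive = r1() is d and r2() is d

del d            # destroy the target: every observer is notified
gc.collect()     # not needed on CPython, harmless elsewhere

dead = r1() is None and r2() is None
```

After the target is gone, both weak references return None and both callbacks have run, each receiving its own weak reference object as the argument.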

Implementation details

Understanding the fundamentals of weak references is enough to make them useful. If you’re interested in the source code, you can delve into some of the implementation details.

As we mentioned earlier, all weak references to the same object are organized into a doubly linked list, with the head stored in the object. Because the types of objects that can be weakly referenced are so diverse, they are difficult to represent with one fixed structure. Therefore, Python provides a field, tp_weaklistoffset, in the type object that records the offset of the weak reference list head pointer within the instance object.

Therefore, for any object o, we only need to find its type object t through the ob_type field, and then use the tp_weaklistoffset field of t to locate the weak reference list head of object o.
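For instances of ordinary Python classes, CPython exposes this list head as the __weakref__ attribute, which gives a rough way to peek at it without reading C code. The sketch below assumes CPython and uses a hypothetical Data class; it is an observation aid, not something application code should depend on.

```python
import weakref


class Data:
    """Minimal stand-in for the expensive data (hypothetical)."""

    def __init__(self, key):
        self.key = key


d = Data("k")
head_before = d.__weakref__   # no weak references yet: the list is empty

r = weakref.ref(d)
head_after = d.__weakref__    # the first weak reference in the list
```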

Python provides two macro definitions in the Include/objimpl.h header file:

/* Test if a type supports weak references */
#define PyType_SUPPORTS_WEAKREFS(t) ((t)->tp_weaklistoffset > 0)

#define PyObject_GET_WEAKREFS_LISTPTR(o) \
    ((PyObject **) (((char *) (o)) + Py_TYPE(o)->tp_weaklistoffset))

PyType_SUPPORTS_WEAKREFS is used to determine whether a type object supports weak references. Weak references are supported only when tp_weaklistoffset is greater than zero. Built-in objects such as list do not support weak references.
PyObject_GET_WEAKREFS_LISTPTR is used to fetch the weak reference list head of an object. It first finds the type object t through the Py_TYPE macro, then reads the offset from the tp_weaklistoffset field, and finally adds the offset to the address of the object to obtain the address of the list head field.
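Which types support weak references can be probed from Python by simply trying to create one. The helper supports_weakref and the three small classes below are hypothetical names for illustration; weakref.ref raises TypeError when the target's type has no room for a weak reference list head.

```python
import weakref


class Plain:
    """Ordinary class: instances get a __weakref__ slot automatically."""


class Slotted:
    __slots__ = ("value",)   # no __weakref__ slot, so no list head field


class MyList(list):
    """Subclass of a built-in: instances gain a __weakref__ slot."""


def supports_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


results = {
    "list": supports_weakref([]),
    "int": supports_weakref(1),
    "Plain": supports_weakref(Plain()),
    "Slotted": supports_weakref(Slotted()),
    "MyList": supports_weakref(MyList()),
}
```

A common workaround for built-in containers is therefore to subclass them, and for classes with __slots__ to add "__weakref__" to the slot list.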

When we create a weak reference, we call the weak reference type object weakref and pass in the referenced object d as an argument. The weak reference type object weakref is the type of all weak reference instance objects; it is a globally unique type object defined in Objects/weakrefobject.c, namely _PyWeakref_RefType (line 350).

Based on what we have learned about the object model, when Python calls an object, it executes the tp_call function of that object's type. Therefore, when the weak reference type object weakref is called, the tp_call function of weakref's own type, that is, type, is executed. That tp_call function in turn calls weakref's tp_new and tp_init functions, where tp_new allocates memory for the instance object and tp_init initializes it.
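The same two-step construction applies to every class, so it can be illustrated in pure Python. In this sketch the Widget class, the calls list, and the call_type function are hypothetical; call_type is only a simplified approximation of what the tp_call function of type does, written out by hand.

```python
calls = []


class Widget:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")       # tp_new: allocate the instance
        return super().__new__(cls)

    def __init__(self, name):
        calls.append("__init__")      # tp_init: initialize the instance
        self.name = name


def call_type(cls, *args, **kwargs):
    """Rough Python equivalent of what type's tp_call does (simplified)."""
    obj = cls.__new__(cls, *args, **kwargs)
    if isinstance(obj, cls):
        obj.__init__(*args, **kwargs)
    return obj


w1 = Widget("a")              # goes through type.__call__
w2 = call_type(Widget, "b")   # the same two steps, spelled out
```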

Back in the Objects/weakrefobject.c source file, we can see that the tp_new field of _PyWeakref_RefType is initialized to weakref___new__ (line 276). The main processing logic of this function is as follows:

1. Parses the arguments to get the referenced object (line 282);
2. Calls the PyType_SUPPORTS_WEAKREFS macro to check whether the referenced object supports weak references, and raises an exception if it does not (line 286);
3. Calls the GET_WEAKREFS_LISTPTR macro to fetch the weak reference list head field of the object; it returns a pointer to a pointer to make insertion easier (line 294);
4. Calls get_basic_refs to retrieve the basic weak reference, that is, the first one whose callback is null, if any (line 295);
5. If callback is null and the object already has a basic weak reference with a null callback, reuses that instance and returns it directly (line 296);
6. If nothing can be reused, calls the tp_alloc function to allocate memory, initializes the fields, and inserts the new object into the weak reference list (line 309);
- If callback is null, the new object is inserted at the head of the list so that it can be reused later (see point 4).
- If callback is not null, the new object is inserted after the basic weak reference object (if any), so that the basic weak reference stays at the head of the list for easy retrieval.
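The reuse logic in the steps above is visible from Python on CPython. In this sketch the Data class and the on_destroy callback are hypothetical; weakref.getweakrefcount and weakref.getweakrefs are standard library functions. The identity results are a CPython implementation detail rather than a language guarantee.

```python
import weakref


class Data:
    """Minimal stand-in for the expensive data (hypothetical)."""

    def __init__(self, key):
        self.key = key


def on_destroy(ref):
    """Hypothetical callback; receives the dying weak reference."""


d = Data("k")

c1 = weakref.ref(d, on_destroy)   # has a callback: always a new object
r1 = weakref.ref(d)               # basic weak reference (no callback)
r2 = weakref.ref(d)               # reuses the existing basic weak reference
c2 = weakref.ref(d, on_destroy)   # another distinct object

reused = r1 is r2
distinct = c1 is not c2
total = weakref.getweakrefcount(d)                # r1, c1 and c2
basic_is_head = weakref.getweakrefs(d)[0] is r1   # inserted at the head
```

Even though r1 was created after c1, it sits at the head of the list, which is what makes the lookup in step 4 cheap.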

When an object is reclaimed, its tp_dealloc function calls the PyObject_ClearWeakRefs function to clean up its weak references. This function fetches the object's weak reference list and iterates through it one by one, clearing each wr_object field and executing each wr_callback callback function (if any). The details are not expanded here. If you are interested, you can refer to the source code in Objects/weakrefobject.c, at line 880.

OK, after this section we have a thorough understanding of weak references. Weak references let us manage target objects without affecting their reference counts and are commonly used in frameworks and middleware. Weak references look like magic, but the design principle is the very simple Observer pattern: once a weak reference object is created, it is inserted into a linked list maintained by the target object so that it can observe (subscribe to) the object's destruction event.
