This post first appeared on AT7H’s personal blog

In this article, we will discuss some of the concepts and implementation principles behind classes and objects in Python 3.8. We will try to explain the storage of Python class and object attributes, functions and methods, descriptors, optimized support for object memory usage, inheritance and attribute lookup.

Let’s start with a simple example:

class Employee:

    outsource = False

    def __init__(self, department, name):
        self.department = department
        self.name = name

    @property
    def inservice(self):
        return self.department is not None

    def __repr__(self):
        return f"<Employee: {self.department}-{self.name}>"

employee = Employee('IT', 'bobo')

The employee object is an instance of the Employee class with two attributes, department and name, whose values belong to the instance. outsource is a class attribute: it is owned by the class and shared by all instances of that class, as in other object-oriented languages.

Changing a class variable affects all instance objects of the class:

>>> e1 = Employee('IT', 'bobo')
>>> e2 = Employee('HR', 'cici')
>>> e1.outsource, e2.outsource
(False, False)
>>> Employee.outsource = True
>>> e1.outsource, e2.outsource
(True, True)

This only holds when the change is made through the class. Assigning to a class variable through an instance behaves differently:

>>> e1 = Employee('IT', 'bobo')
>>> e2 = Employee('HR', 'cici')
>>> e1.outsource, e2.outsource
(False, False)
>>> e1.outsource = True
>>> e1.outsource, e2.outsource
(True, False)

Yes, when you assign to a class variable through an instance, Python does not change the value on the class. Instead, it creates an instance attribute with the same name, which is the correct and safe behavior. Instance variables take precedence over class variables during attribute lookup, as explained in more detail in the Inheritance and attribute lookup section.
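The shadowing can be observed directly in the two dictionaries. The following is a minimal sketch using a stripped-down Employee class (an assumption made to keep the example self-contained, not the full class from above):

```python
class Employee:
    outsource = False             # class attribute


e1 = Employee()
e1.outsource = True               # creates an entry in e1.__dict__ only
print(e1.__dict__)                # {'outsource': True}
print(Employee.outsource)         # False: the class attribute is untouched

del e1.outsource                  # remove the shadowing instance attribute
print(e1.outsource)               # False: lookup falls back to the class
```

Deleting the instance attribute makes the class attribute visible again, which shows that the class value was never modified.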

Take particular care when a class variable is of a mutable type: mutating it in place through an instance changes the single object shared by the class and all of its instances:

>>> class S:
...     L = [1, 2]
...
>>> s1, s2 = S(), S()
>>> s1.L, s2.L
([1, 2], [1, 2])
>>> s1.L.append(3)
>>> s1.L, s2.L
([1, 2, 3], [1, 2, 3])

Good practice is to avoid such designs as much as possible.
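One common way to avoid the shared-state problem is to create the mutable object per instance in __init__. This is only a sketch; the Team class and its members attribute are hypothetical names used for illustration:

```python
class Team:
    """Hypothetical class: every instance gets its own list."""

    def __init__(self):
        self.members = []         # created per instance, not shared


t1, t2 = Team(), Team()
t1.members.append("bobo")
print(t1.members, t2.members)     # ['bobo'] []
```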

Storage of attributes

This section takes a look at how class attributes, methods, and instance attributes are stored together in Python.

Instance attributes

In Python, all instance attributes are stored in the __dict__ dictionary, a regular dict from which instance attributes are retrieved and modified, and which is completely open to developers.

>>> e = Employee('IT', 'bobo')
>>> e.__dict__
{'department': 'IT', 'name': 'bobo'}
>>> type(e.__dict__)
dict
>>> e.name is e.__dict__['name']
True
>>> e.__dict__['department'] = 'HR'
>>> e.department
'HR'

Since instance attributes are stored in a dictionary, we can easily add fields to an object, or remove them, at any time:

>>> e = Employee('IT', 'bobo')
>>> e.age = 30  # age was never declared on the class
>>> e.age
30
>>> e.__dict__
{'department': 'IT', 'name': 'bobo', 'age': 30}
>>> del e.age
>>> e.__dict__
{'department': 'IT', 'name': 'bobo'}

We can also instantiate an object from a dictionary, or restore the instance by saving its __dict__.

>>> def new_employee_from(d):
...     instance = object.__new__(Employee)
...     instance.__dict__.update(d)
...     return instance
...
>>> e1 = new_employee_from({'department': 'IT', 'name': 'bobo'})
>>> e1
<Employee: IT-bobo>
>>> state = e1.__dict__.copy()
>>> del e1
>>> e2 = new_employee_from(state)
>>> e2
<Employee: IT-bobo>

Because __dict__ is completely open, we can add any hashable key to it, such as a number:

>>> e.__dict__[1] = 1
>>> e.__dict__
{'department': 'IT', 'name': 'bobo', 1: 1}

These non-string keys are not accessible as attributes of the instance. To make sure this does not happen, it is generally best not to write directly to __dict__, or even read from it directly, unless necessary.
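To see why such keys are unreachable, here is a small sketch (again with a simplified Employee class, an assumption for brevity). Attribute names must be strings, so getattr rejects the integer key even though it sits in the dictionary:

```python
class Employee:
    def __init__(self, name):
        self.name = name


e = Employee("bobo")
vars(e)[1] = 1                    # vars(e) returns e.__dict__

try:
    getattr(e, 1)                 # attribute names must be strings
    reachable = True
except TypeError:
    reachable = False

print(reachable)                  # False
print(vars(e)[1])                 # 1: only reachable through the dictionary
```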

So there is a saying that Python is a “consenting adults language”.

This dynamic implementation makes our code very flexible and convenient in many cases, but it also incurs storage and performance costs. So Python also provides another mechanism (__slots__) to dispense with __dict__ to save memory and improve performance, as described in the __slots__ section.

Class attributes

Similarly, class attributes are stored in the class’s __dict__ dictionary:

>>> Employee.__dict__
mappingproxy({'__module__': '__main__',
              'outsource': True,
              '__init__': <function __main__.Employee.__init__(self, department, name)>,
              'inservice': <property at 0x108419ea0>,
              '__repr__': <function __main__.Employee.__repr__(self)>,
              '__dict__': <attribute '__dict__' of 'Employee' objects>,
              '__weakref__': <attribute '__weakref__' of 'Employee' objects>,
              '__doc__': None})

>>> type(Employee.__dict__)
mappingproxy

Unlike the open instance dictionary, the dictionary exposed for class attributes is a MappingProxyType object, a read-only view that does not support item assignment. It is read-only to the developer precisely to guarantee that the keys of the class dictionary are strings, which simplifies and speeds up attribute lookup on new-style classes and the __mro__ search logic.

>>> Employee.__dict__['outsource'] = False
TypeError: 'mappingproxy' object does not support item assignment
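Class attributes can still be changed, just not through the proxy. The sketch below assumes a minimal Employee class and a made-up attribute name (grade); it shows that ordinary attribute assignment on the class works and that the proxy is a live view of the result:

```python
class Employee:
    outsource = False


view = Employee.__dict__          # a mappingproxy over the class namespace

try:
    view["outsource"] = True      # item assignment is rejected
except TypeError as exc:
    print("rejected:", exc)

Employee.outsource = True         # regular attribute assignment works
setattr(Employee, "grade", 1)     # 'grade' is a made-up attribute name

print(view["outsource"], view["grade"])   # True 1
```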

Because all methods belong to a class, they are also stored in the class dictionary, as you can see from the example above for the existing __init__ and __repr__ methods. We can add a few more to verify:

class Employee:
    # ...
    @staticmethod
    def soo():
        pass

    @classmethod
    def coo(cls):
        pass

    def foo(self):
        pass

>>> Employee.__dict__
mappingproxy({'__module__': '__main__',
              'outsource': False,
              '__init__': <function __main__.Employee.__init__(self, department, name)>,
              'inservice': <property at 0x108419ea0>,
              '__repr__': <function __main__.Employee.__repr__(self)>,
              'soo': <staticmethod at 0x1066ce588>,
              'coo': <classmethod at 0x1066ce828>,
              'foo': <function __main__.Employee.foo(self)>,
              '__dict__': <attribute '__dict__' of 'Employee' objects>,
              '__weakref__': <attribute '__weakref__' of 'Employee' objects>,
              '__doc__': None})

Inheritance and attribute lookup

Now that we know that all attributes and methods are stored in two __dict__ dictionaries, let’s look at how Python does attribute look-up.

In Python 3, all classes implicitly inherit from object, so there is always an inheritance relationship. Python also supports multiple inheritance:

>>> class A:
...     pass
...
>>> class B:
...     pass
...
>>> class C(B):
...     pass
...
>>> class D(A, C):
...     pass
...
>>> D.mro()
[<class '__main__.D'>, <class '__main__.A'>, <class '__main__.C'>, <class '__main__.B'>, <class 'object'>]

mro() is a special method that returns the class's method resolution order, that is, the linearized order in which the class and its bases are searched.
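As a small illustration of how that order decides which definition wins, the sketch below reuses the same class layout and adds a hypothetical who() method to A and B:

```python
class A:
    def who(self):
        return "A"


class B:
    def who(self):
        return "B"


class C(B):
    pass


class D(A, C):
    pass


order = [cls.__name__ for cls in D.__mro__]
print(order)                      # ['D', 'A', 'C', 'B', 'object']
print(D().who())                  # A: the first class in the order that defines who()
```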

The default behavior for attribute access is to get, set, or delete the attribute from the object's dictionary. For example, a simplified description of looking up e.f is:

The lookup starts with e.__dict__['f'], then type(e).__dict__['f'], and continues through the base classes of type(e) (in __mro__ order, not including metaclasses). If the value found is an object that defines one of the descriptor methods, Python may override the default behavior and invoke the descriptor method instead. Where in the precedence chain this occurs depends on which descriptor methods are defined and how they are invoked.

So, to understand the order of lookups, you must first understand the descriptor protocol.

To summarize, there are two types of descriptors: data descriptors and non-data descriptors.

If an object defines __set__() or __delete__() in addition to __get__(), it is treated as a data descriptor. Descriptors that define only __get__() are called non-data descriptors (they are commonly used for methods, but can have other uses as well).

Since functions only implement __get__, they are non-data descriptors.

Python finds object attributes in the following order:

  1. Data descriptors in the class and superclass dictionaries
  2. The instance dictionary
  3. Non-data descriptors (and plain class attributes) in the class and superclass dictionaries
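The three-step order can be checked with two toy descriptors. This is a sketch; DataDesc, NonDataDesc and Demo are hypothetical names invented for the example:

```python
class DataDesc:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "non-data descriptor"


class Demo:
    d = DataDesc()
    n = NonDataDesc()


x = Demo()
# Write straight into the instance dictionary, bypassing __set__
x.__dict__["d"] = "instance value"
x.__dict__["n"] = "instance value"

print(x.d)                        # data descriptor: step 1 beats the instance dict
print(x.n)                        # instance value: step 2 beats the non-data descriptor
```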

Remember that no matter how many levels of inheritance your class has, the instance's dictionary always stores all of its instance variables, which is part of the point of super().

Here we try to describe the search order in pseudocode:

def get_attribute(obj, name):
    class_definition = obj.__class__

    # Find the first class in the MRO whose dictionary defines the name
    descriptor = None
    for cls in class_definition.mro():
        if name in cls.__dict__:
            descriptor = cls.__dict__[name]
            break

    # 1. Data descriptors take precedence over the instance dictionary
    if hasattr(descriptor, '__set__') or hasattr(descriptor, '__delete__'):
        return descriptor, 'data descriptor'

    # 2. The instance dictionary
    if name in obj.__dict__:
        return obj.__dict__[name], 'instance attribute'

    # 3. Non-data descriptors and plain class attributes
    if descriptor is not None:
        return descriptor, 'non-data descriptor'
    else:
        raise AttributeError(name)

>>> e = Employee('IT', 'bobo')
>>> get_attribute(e, 'outsource')
(False, 'non-data descriptor')
>>> e.outsource = True
>>> get_attribute(e, 'outsource')
(True, 'instance attribute')
>>> get_attribute(e, 'name')
('bobo', 'instance attribute')
>>> get_attribute(e, 'inservice')
(<property at 0x10c966d10>, 'data descriptor')
>>> get_attribute(e, 'foo')
(<function __main__.Employee.foo(self)>, 'non-data descriptor')

Because of this priority order, instances cannot override class data descriptor attributes, such as the property attribute:

>>> class Manager(Employee):
...     def __init__(self, *arg):
...         self.inservice = True
...         super().__init__(*arg)
...
>>> m = Manager("HR", "cici")
AttributeError: can't set attribute

Invoking descriptors

As mentioned above, when looking up an attribute, Python may override the default behavior and invoke a descriptor method instead if the value found is an object that defines one.

The job of a descriptor is to bind attribute access on an object. Assume that a is an object implementing the descriptor protocol, stored in the dictionary of class E, and that e is an instance of E. A descriptor call is then made in the following situations:

  • Direct call: user code invokes the descriptor method directly, a.__get__(e). This is not commonly used.
  • Instance binding: when accessed through an instance, e.a is converted to the call type(e).__dict__['a'].__get__(e, type(e))
  • Class binding: when accessed through a class, E.a is converted to the call E.__dict__['a'].__get__(None, E)

With inheritance, the same calls are made, with the descriptor located by searching the classes in __mro__ order.
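These conversions can be reproduced by hand. The sketch below assumes a cut-down Employee class with a hypothetical badge property and fetches the raw descriptors from the class dictionary before calling __get__ on them:

```python
class Employee:
    def __init__(self, name):
        self.name = name

    @property
    def badge(self):              # hypothetical property, for illustration
        return f"#{self.name}"

    def foo(self):
        return self.name


e = Employee("bobo")
prop = type(e).__dict__["badge"]  # raw descriptor objects, not yet bound
func = Employee.__dict__["foo"]

# Instance binding: what e.badge does
print(prop.__get__(e, type(e)) == e.badge)      # True
# Class binding: what Employee.foo does (a plain function in Python 3)
print(func.__get__(None, Employee) is func)     # True
# Direct call on the descriptor, then calling the bound method
print(func.__get__(e)())                        # bobo
```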

Functions and Methods

We know that methods are functions that belong to a class. The only difference (if any) is that the first argument of a method is usually reserved for the class or instance object. In Python, we conventionally call it cls or self, but you can call it anything, such as this (it's just best not to).

As we saw in the previous section, functions are objects that implement the __get__() method, so they are non-data descriptors. This is how Python supports method calls: when a function is accessed through an instance, calling its __get__() binds it into a method.

In pure Python, it works like this (the example comes from the descriptor usage guide):

import types

class Function:
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return types.MethodType(self, obj)  # bind the function to obj as a method

In Python 2, there are two kinds of methods: unbound methods and bound methods. In Python 3, only the latter remain.

Bound methods are associated with the class or instance data to which they are bound:

>>> Employee.coo
<bound method Employee.coo of <class '__main__.Employee'>>
>>> Employee.foo
<function __main__.Employee.foo(self)>
>>> e = Employee('IT', 'bobo')
>>> e.foo
<bound method Employee.foo of <Employee: IT-bobo>>

We can access instances and classes from methods:

>>> e.foo.__self__
<Employee: IT-bobo>
>>> e.foo.__self__.__class__
__main__.Employee
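Class methods and static methods take part in the same protocol. As a rough sketch (using a reduced Employee class whose soo and coo return values are made up for the example), the raw staticmethod and classmethod objects stored in the class dictionary can be bound by hand:

```python
class Employee:
    @staticmethod
    def soo():
        return "static"

    @classmethod
    def coo(cls):
        return cls.__name__


raw_static = Employee.__dict__["soo"]   # the staticmethod object
raw_class = Employee.__dict__["coo"]    # the classmethod object

plain = raw_static.__get__(None, Employee)   # yields the underlying function
bound = raw_class.__get__(None, Employee)    # yields a method bound to the class

print(plain())                    # static
print(bound.__self__ is Employee) # True
print(bound())                    # Employee
```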

Using the descriptor protocol, we can manually bind a function defined outside a class to an instance, giving it access to the data of the class or instance. I'll use this example to show how a method is bound when an object accesses (and calls) a function stored in the class dictionary.

Take the following function:

>>> def f1(self):
...     if isinstance(self, type):
...         return self.outsource
...     return self.name
...
>>> bound_f1 = f1.__get__(e, Employee)  # or bound_f1 = f1.__get__(e)
>>> bound_f1
<bound method f1 of <Employee: IT-bobo>>
>>> bound_f1.__self__
<Employee: IT-bobo>
>>> bound_f1()
'bobo'

To summarize: when we call e.foo(), Python first gets foo from Employee.__dict__['foo'], converts it into a method by calling foo's __get__ method, foo.__get__(e), and then calls that method to get the result. This completes the translation e.foo() -> foo(e).
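The steps in this summary can be traced one at a time. The following sketch uses a simplified Employee class (an assumption, with a made-up return value for foo) and performs the lookup, the binding and the call separately:

```python
class Employee:
    def __init__(self, name):
        self.name = name

    def foo(self):
        return f"foo called on {self.name}"


e = Employee("bobo")

func = Employee.__dict__["foo"]       # step 1: the plain function in the class dict
method = func.__get__(e, Employee)    # step 2: bind it to the instance
result = method()                     # step 3: call the bound method

print(result)                         # foo called on bobo
print(method.__func__ is func)        # True: the method wraps the same function
print(method.__self__ is e)           # True: and remembers the instance
print(e.foo() == func(e))             # True: e.foo() amounts to foo(e)
```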

If you are confused by my explanation, I recommend reading the official Descriptor HowTo Guide to learn more about the descriptor protocol; its Functions and Methods and Static Methods and Class Methods sections cover the function-to-method binding process in detail. It is also explained in the Method Objects section of the Python tutorial chapter on classes.

__slots__

By default, an object's attribute values are stored in a dictionary, and memory consumption can become a problem when dealing with thousands of instances or more, because a dictionary's hash table allocates a sizeable amount of memory for each instance. Python addresses this by providing __slots__, a way to stop instances from using __dict__.

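Declaring attributes in __slots__ moves their storage out of the instance's __dict__: each value is kept in a fixed slot inside the instance, and the class's __dict__ gains one member descriptor per name that reads and writes that slot: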

class Test:
    __slots__ = ('a', 'b')

    def __init__(self, a, b):
        self.a = a
        self.b = b

>>> t = Test(1, 2)
>>> t.__dict__
AttributeError: 'Test' object has no attribute '__dict__'
>>> Test.__dict__
mappingproxy({'__module__': '__main__',
              '__slots__': ('a', 'b'),
              '__init__': <function __main__.Test.__init__(self, a, b)>,
              'a': <member 'a' of 'Test' objects>,
              'b': <member 'b' of 'Test' objects>,
              '__doc__': None})

I wrote an article about __slots__ earlier, so if you're interested, read the article Python class attribute __slots__.
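For a rough feel of the saving, the sketch below compares the shallow sizes reported by sys.getsizeof for two hypothetical classes, WithDict and WithSlots. The exact byte counts depend on the Python version and platform, so none are shown here; the measurement ignores the attribute values themselves, which are the same in both cases.

```python
import sys


class WithDict:
    def __init__(self, a, b):
        self.a = a
        self.b = b


class WithSlots:
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b


d = WithDict(1, 2)
s = WithSlots(1, 2)

# A regular instance pays for the object plus its attribute dictionary
dict_total = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
# A slotted instance has no __dict__ at all
slots_total = sys.getsizeof(s)
print(dict_total, slots_total)

try:
    s.c = 3                       # names outside __slots__ are rejected
    rejected = False
except AttributeError:
    rejected = True
print(rejected)                   # True
```

The trade-off is visible in the last lines: without a __dict__, a slotted instance cannot grow new attributes at runtime.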

Supplement

__getattribute__ and __getattr__

How is the function’s __get__ method called?

In Python everything is an object, and all objects have a default __getattribute__(self, name) method.

This method is called automatically whenever we use . to access an attribute of obj. To prevent infinite recursion, an overriding implementation should always fetch attributes through the base class, object.__getattribute__(self, name), which by default performs the lookup described above, including the search of self's __dict__ (implicit special method lookups are the exception and bypass it).

Extras: If __getattr__ is also implemented, it is called only if __getattribute__ calls it explicitly or raises an AttributeError exception. __getattr__ is implemented by the developer and should either return the attribute value or raise an AttributeError exception.

Descriptors are invoked by the __getattribute__() method, which looks like this:

def __getattribute__(self, key):
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(self)
    return v

Note: Overriding __getattribute__() prevents descriptors from being called automatically.

Function attributes

Functions are objects too, so they can also carry arbitrary attributes. This is sometimes useful, for example to implement a simple call-tracking decorator:

from functools import wraps

def calltracker(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # the counter lives on the wrapper function itself
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@calltracker
def f():
    return 'f called'

>>> f.calls
0
>>> f()
'f called'
>>> f.calls
1

References

  • HowFunctionsToMethods
  • Descriptor HowTo Guide
  • Python Data model
  • Understanding internals of Python classes