This article is from the “Why Python” series

Not long ago, Python Cat recommended the book “Fluent Python”. That recommendation was full of praise but rather vague…

However, Fluent Python is well worth rereading. I recently came across a slightly odd point in the book, so I’m going to talk about it: can subclassing built-in types be a problem?!

1. What are the built-in types?

Before we get down to business, let’s start with a quick reminder: What are Python’s built-in types?

According to the official documentation, the built-in types mainly include the following:

Detailed documentation: docs.python.org/3/library/s…

Among them are the well-known numeric types, sequence types, text types, mapping types and so on, and of course the boolean type and the ... (Ellipsis) object that we introduced earlier.

Of all these, this article focuses on the built-in types that are callable objects, that is, those that superficially resemble built-in functions: int, str, list, tuple, range, set, dict…

These types can be loosely understood as classes in other languages, but Python does not use the customary CamelCase naming here, which can be misleading.

Since Python 2.2, these built-in types can be subclassed, which means they can be inherited from (with a few exceptions, such as range and bool, which cannot serve as base classes).
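As a quick check of the paragraphs above, here is a minimal sketch showing that these lowercase names really are classes and that most of them can be inherited from. The names Celsius, to_fahrenheit and MyRange are invented for illustration and do not come from the original article.

```python
# int, str, list and friends are classes: each one is an instance of `type`.
for builtin in (int, str, list, tuple, range, set, dict):
    assert isinstance(builtin, type)


class Celsius(float):
    """A float subclass; the name and method are invented for this sketch."""

    def to_fahrenheit(self):
        return self * 9 / 5 + 32


temp = Celsius(100)
print(isinstance(temp, float))  # True
print(temp.to_fahrenheit())     # 212.0

# Not every built-in type is an acceptable base class, though.
try:
    class MyRange(range):
        pass
except TypeError as exc:
    range_error = exc
    print("cannot subclass range:", exc)
```

The subclass instance still passes isinstance checks against float, so it can be used wherever a plain float is expected.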

2. Subclassing of built-in types

As is well known, Python uses the built-in function len(x) to get the length of an object x, unlike object-oriented languages such as Java, where objects typically have their own x.length() method. (PS: For an analysis of these two design styles, I recommend reading this article.)

Now, suppose we want to define a list class that has its own length() method while keeping all the features of a regular list.

The experimental code looks like this (for demonstration only):

# Define a subclass of list
class MyList(list):
    def length(self):
        return len(self)

We make the custom class MyList inherit from list and define a new length() method. In this way, MyList has append(), pop(), and so on, as well as length().

# Add two elements
ss = MyList()
ss.append("Python")
ss.append("Cat")

print(ss.length())   # Output: 2

Most of the other built-in types mentioned earlier can be subclassed in the same way, which should be easy to understand.

By the way, what are the benefits/usage scenarios for subclassing built-in types?

An intuitive example: suppose a custom class needs to use a list object frequently (adding or removing elements, passing it around as a whole…). If the class inherits from list, we can simply write self.append(), self.pop(), or pass self itself as the list, without having to define an additional list attribute.

Are there other benefits or use cases? Comments are welcome ~~
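The scenario above can be sketched roughly as follows. Playlist, add and total are hypothetical names chosen for this illustration, not anything from the original article.

```python
class Playlist(list):
    """Hypothetical example: a playlist that *is* a list of track names."""

    def __init__(self, name, tracks=()):
        super().__init__(tracks)  # let list store the initial tracks
        self.name = name

    def add(self, track):
        # No separate self.tracks attribute is needed; self is the list.
        self.append(track)

    def total(self):
        return len(self)


playlist = Playlist("Morning", ["Intro"])
playlist.add("Python")
playlist.add("Cat")

print(playlist.total())   # 3
print(sorted(playlist))   # ['Cat', 'Intro', 'Python']
```

Because the object is itself a list, it can be handed directly to functions such as sorted() or len().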

3. “Problems” with subclassing built-in types

Finally, we come to the main topic of this article 🙂

In general, in our textbook knowledge, methods in a subclass override methods of the same name in their parent class, which means that subclass methods have higher lookup priority than parent methods.

For example, take the following two classes, Cat and PythonCat. Cat has a say() method that returns the inner_voice() of the current object, and PythonCat overrides inner_voice():

# Python Cat is a cat
class Cat:
    def say(self):
        return self.inner_voice()
    def inner_voice(self):
        return "Meow"

class PythonCat(Cat):
    def inner_voice(self):
        return "Meow meow"

When we create an instance of the subclass PythonCat, its say() method looks up the subclass’s own inner_voice() first, rather than the inner_voice() of the parent class Cat:

my_cat = PythonCat()
# The results below are as expected
print(my_cat.inner_voice()) # Output: Meow meow
print(my_cat.say())         # Output: Meow meow

This is a convention across programming languages and a basic principle that anyone who has learned the basics of object-oriented programming should know.

However, when Python implements inheritance, it does not seem to follow this rule exactly. There are two cases:

  • Common sense: classes implemented in Python follow the “subclass before parent” rule
  • Against common sense: for classes that are actually implemented in C (that is, str, list, dict, and other built-in types), explicit calls to subclass methods follow the “subclass before parent” rule; however, in the case of implicit calls, they seem to follow a “parent before subclass” rule, so the usual inheritance rule appears to fail

Mapped onto the PythonCat example, this would be like calling my_cat.inner_voice() and getting the expected “Meow meow”, but calling my_cat.say() and getting an unexpected “Meow”.

Here is an example from Fluent Python (section 12.1):

class DoppelDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

dd = DoppelDict(one=1)  # {'one': 1}
dd['two'] = 2           # {'one': 1, 'two': [2, 2]}
dd.update(three=3)      # {'three': 3, 'one': 1, 'two': [2, 2]}

In this example, dd['two'] = 2 calls the subclass’s __setitem__() method directly, so the result is as expected. If the other operations also behaved as expected, the final result would be {'three': [3, 3], 'one': [1, 1], 'two': [2, 2]}.

However, initialization and updating go through the __init__() and update() methods inherited from the parent class. They look as if they call __setitem__() implicitly, yet what runs is the parent class’s logic rather than the subclass’s override, which produces the unexpected result!
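One way to observe this behavior directly is to record every call that reaches the overridden method. This is only a sketch: SpyDict and the calls list are hypothetical names, and the recorded result is what CPython produces (other implementations, such as PyPy, may differ).

```python
calls = []  # keys that actually reached the overridden __setitem__


class SpyDict(dict):
    """Hypothetical helper: logs each key passed to __setitem__."""

    def __setitem__(self, key, value):
        calls.append(key)
        super().__setitem__(key, value)


d = SpyDict(one=1)        # dict.__init__ stores the item in C
d["two"] = 2              # explicit item assignment: override is used
d.update(three=3)         # dict.update stores the item in C
d.setdefault("four", 4)   # dict.setdefault stores the item in C

print(sorted(d))  # ['four', 'one', 'three', 'two']
print(calls)      # on CPython: ['two']
```

All four keys end up in the dictionary, but only the explicit assignment went through the subclass’s method.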

This apparent double standard in the official Python implementation goes somewhat against common sense, and if you are not careful it is easy to fall into the trap.

So why the exception?

4. The true face of methods of built-in types

We now know that built-in types do not implicitly call methods overridden by subclasses, so it is time for Python Cat’s moment of curiosity: why not?

Fluent Python does not pursue the question, but I will venture a guess (which should be verified against the source code): the methods of the built-in types are implemented in C and do not actually call each other, so there is no question of call priority at all.

In other words, the earlier statement that “__init__() and update() implicitly call __setitem__()” is not accurate!

These methods are actually independent of each other! __init__() has its own item-setting implementation; it does not call the parent class’s __setitem__(), and it certainly has nothing to do with the subclass’s __setitem__().

The dictionary’s __init__() method includes the functionality of __setitem__(), so out of habit we assume that the former calls the latter. The actual call relationship is more likely as follows.

The method on the left, at the Python-level interface, opens a door through the language boundary into the C world on the right, where it does all of its work without going back to the original interface to look up the next instruction (that is, the red-line path in the figure does not exist). The reason is simple: calls between C functions are more efficient, the call path is shorter, and the implementation is simpler.

Similarly, dict’s get() method does not call __getitem__(). If a subclass overrides only __getitem__(), then calling get() on the subclass still uses the parent class’s get() logic. (PS: On this point, both Fluent Python and the PyPy documentation are inaccurate, mistakenly assuming that get() calls __getitem__().)
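A small experiment makes the get() case concrete. This sketch uses an illustrative class name, AnswerDict, and the output shown for get() is CPython’s behavior.

```python
class AnswerDict(dict):
    """Overrides only __getitem__; get() is left untouched."""

    def __getitem__(self, key):
        return 42


ad = AnswerDict(a="foo")

print(ad["a"])      # 42: the [] operator uses the override
print(ad.get("a"))  # 'foo' on CPython: dict.get() never consults __getitem__
```

To change what get() returns, get() itself has to be overridden.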

That is, methods of Python built-in types have no call relationships themselves, although they may have common logic or methods that can be reused when implemented in the underlying C language.

This reminds me of “Why Does Python Support Arbitrary Truth-Value Testing?”, an earlier article in the “Why Python” series. When we write if xxx, it looks as if the __bool__() and __len__() magic methods are called implicitly, but for built-in objects the program actually goes straight into pure C logic via the POP_JUMP_IF_FALSE instruction. There are no Python-level calls to these magic methods!

So, once we realize that the special methods implemented in C are independent of each other, we can look back at subclassing the built-in types and find something new:

The parent class’s __init__() magic method does its job by crossing the language interface, but there is no path between it and the subclass’s __setitem__() method; that is, the red-line path in the figure is unreachable.

From this, we reach a different conclusion from the previous section: Python actually follows the “subclass method before parent method” inheritance principle strictly, and does not violate common sense!

Finally, __missing__() is a special case. Fluent Python mentions it only in one brief, vague sentence without expanding on it.

After some preliminary experiments, I found that when a subclass defines this method, get() still returns None as usual for a nonexistent key, whereas both __getitem__() and dd['xxx'] handle a nonexistent key by calling the __missing__() defined in the subclass.
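The experiment described above can be reproduced with a few lines. DefaultingDict is a hypothetical name used only for this sketch. For background, the official documentation of dict states that d[key] calls a subclass’s __missing__() when the key is absent, and that no other operations or methods invoke it.

```python
class DefaultingDict(dict):
    """Hypothetical dict subclass that defines the __missing__ hook."""

    def __missing__(self, key):
        return f"<no {key}>"


md = DefaultingDict(a=1)

print(md["xxx"])              # <no xxx>
print(md.__getitem__("xxx"))  # <no xxx>
print(md.get("xxx"))          # None
print("xxx" in md)            # False: __missing__ here does not store the key
```

So __missing__() is a hook that dict’s own item lookup is designed to call, which is why it behaves differently from the other overrides discussed in this article.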

I have not had time to analyze this in depth. If you know the answer, please leave me a message.

5. Best practices for subclassing built-in types

In summary, there is nothing wrong with subclassing built-in types. The results only look wrong because we misunderstand the special methods (methods implemented in C).

This raises a new question: if we do have to inherit from a built-in type, what is the best practice?

First, if you inherit a built-in type and do not overwrite its special methods, there is no problem with subclassing.

Second, if you want to override special methods after inheriting, remember to override every method whose behavior you want to change. For example, if you want to change get(), override get(); if you want to change __getitem__(), override __getitem__()…
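Applied to the earlier DoppelDict example, the “override every method you want to change” advice might look like the sketch below. It is one possible arrangement, not the book’s code: __init__() and update() are overridden in Python so that they route through the subclass’s __setitem__().

```python
class DoppelDict(dict):
    """dict subclass that doubles values on every covered write path."""

    def __init__(self, *args, **kwargs):
        super().__init__()            # start empty
        self.update(*args, **kwargs)  # then fill through our own update()

    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

    def update(self, *args, **kwargs):
        # dict(*args, **kwargs) normalizes mappings, pair iterables and keywords.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value         # goes through the overridden __setitem__


dd = DoppelDict(one=1)
dd["two"] = 2
dd.update(three=3)
print(dd)  # {'one': [1, 1], 'two': [2, 2], 'three': [3, 3]}
```

Note that other write paths, such as setdefault() or the |= operator, are still not covered here and would each need their own override.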

But what if we only want to rewrite one piece of logic (that is, the part implemented in C) so that every special method using that logic changes with it? For example, rewriting the __setitem__() logic and having initialization and update() change as well?

We know that the special methods do not reuse one another, so defining a new __setitem__() alone is not enough. How, then, can several methods be affected at once?

PyPy, an alternative implementation of Python, noticed this problem. Its approach is to make the special methods of built-in types call one another, establishing a connection path between them.

Official Python is aware of the problem too, of course, but it has not changed the behavior of the built-in types. Instead, it offers another solution: UserString, UserList, UserDict…

Except for their names, they can be considered equivalent to built-in types.

The basic logic of these classes is implemented in Python, which is equivalent to moving some of the logic from the C side of the interface to the Python side. This creates a call chain on the left side and so solves the reuse problem for certain special methods.

With this new way of inheriting, the same example as before now gives the expected result:

from collections import UserDict

class DoppelDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)

dd = DoppelDict(one=1)  # {'one': [1, 1]}
dd['two'] = 2           # {'one': [1, 1], 'two': [2, 2]}
dd.update(three=3)      # {'one': [1, 1], 'two': [2, 2], 'three': [3, 3]}

Clearly, the best practice for inheriting from str/list/dict is to inherit from the corresponding classes provided by the collections module.
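One detail worth keeping in mind when switching to these classes: as far as I know, a UserDict is a wrapper around a plain dict stored in its data attribute, not a subclass of dict. The following sketch illustrates this; the setdefault() call is just an additional write path chosen for the demonstration.

```python
from collections import UserDict
from collections.abc import MutableMapping


class DoppelDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, [value] * 2)


dd = DoppelDict(one=1)
dd.setdefault("two", 2)  # also routed through the overridden __setitem__

print(dd.data)                         # {'one': [1, 1], 'two': [2, 2]}
print(isinstance(dd, dict))            # False: UserDict wraps a dict
print(isinstance(dd, MutableMapping))  # True
```

Code that checks isinstance(obj, dict) will therefore not accept such an object, so it is worth checking against the abstract base classes in collections.abc instead.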

6. Summary

Having written so much, it is time to wrap up ~~

Following Python Cat’s earlier analysis, “Why built-in functions/built-in types are not a panacea”, which looked at lookup order and speed, this article reveals another mysterious and seemingly flawed behavior of built-in types.

Although this article was inspired by the book Fluent Python, we went beyond the surface of the language by asking one more “why” and analyzed the principle behind the phenomenon.

In short, special methods of built-in types are implemented independently by C. They have no call relationship in the Python interface, so when a built-in type is subclassed, the special methods overridden only affect the method itself, not the effects of other special methods.

If we misunderstand the relationship between special methods, we might think that Python breaks the basic inheritance principle that subclass methods take precedence over parent class methods. (Unfortunately, both Fluent Python and the PyPy documentation share this misconception.)

In keeping with popular expectations of built-in types, Python provides extensions such as UserString, UserList, and UserDict in the standard library to make it easy for programmers to inherit these basic data types.

Finally: this article is part of the “Why Python” series, which focuses on the syntax, design, and development of Python. The series tries to show the charm of Python by asking “why” questions. If you have other topics of interest, please fill them in on the “Python’s 100,000 Whys?” survey.

Python Cat publishes several series of articles, including the “Why Python” series, the Cat Philosophy series, the Python Advanced series, book recommendations, technical writing, and recommendations and translations of quality English articles.