Python 3.7 added a standard library decorator called dataclass. It can greatly improve the readability of your code and, most importantly, it saves you time by letting you write much less of it.

Let’s say you’re writing code for a comment system. You create a new class, define a few member variables, and write magic methods such as __init__ and __repr__ for it, like this:

class Comment:
    def __init__(self, id: int, text: str):
        self.id: int = id
        self.text: str = text

    def __repr__(self):
        return "{}(id={}, text={})".format(self.__class__.__name__, self.id, self.text)

To detect duplicate comments, you write __eq__ and __ne__. To support sorting comments, you also write __lt__, __gt__, __le__, and __ge__. To make the objects hashable, you add __hash__.

class Comment:
    def __init__(self, id: int, text: str):
        self.id: int = id
        self.text: str = text

    def __repr__(self):
        return "{}(id={}, text={})".format(self.__class__.__name__, self.id, self.text)

    def __eq__(self, other):
        if other.__class__ is self.__class__:
            return (self.id, self.text) == (other.id, other.text)
        else:
            return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return NotImplemented
        else:
            return not result

    def __hash__(self):
        return hash((self.__class__, self.id, self.text))

    def __lt__(self, other):
        if other.__class__ is self.__class__:
            return (self.id, self.text) < (other.id, other.text)
        else:
            return NotImplemented

    def __le__(self, other):
        if other.__class__ is self.__class__:
            return (self.id, self.text) <= (other.id, other.text)
        else:
            return NotImplemented

    def __gt__(self, other):
        if other.__class__ is self.__class__:
            return (self.id, self.text) > (other.id, other.text)
        else:
            return NotImplemented

    def __ge__(self, other):
        if other.__class__ is self.__class__:
            return (self.id, self.text) >= (other.id, other.text)
        else:
            return NotImplemented


Now you suddenly realize that you need another field, the ID of the comment's author, author_id, and you have to add it by hand to every one of these methods. That is close to rewriting the class, and if you miss one method, the class has a bug.

Fields will be added or removed again later, so the question is: once the class's member variables are declared, can those methods be kept up to date automatically, so that I don't have to touch each one every time the fields change?

Yes, and that is exactly what dataclass does. With dataclass, this is all you need:

from dataclasses import dataclass
@dataclass(frozen=True, order=True)
class Comment:
    id: int
    text: str = ""

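Each field is declared with a type annotation. The annotations improve readability, and they are also how the decorator finds the fields: a class attribute without an annotation is not treated as a field.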
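As a rough sanity check, the sketch below exercises the methods the decorator generates for the two-field class above. The sample comment texts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Comment:
    id: int
    text: str = ""


first = Comment(1, "hello")
second = Comment(2, "world")

# __repr__ is generated from the field names and values
print(first)  # Comment(id=1, text='hello')

# __eq__ compares instances field by field, as tuples
print(first == Comment(1, "hello"))  # True

# order=True adds __lt__ and friends, so instances can be sorted
print(sorted([second, first]) == [first, second])  # True

# eq=True together with frozen=True makes instances hashable
print(len({first, Comment(1, "hello")}))  # 1
```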

If you want to add a field author_id, just add it:

from dataclasses import dataclass
@dataclass(frozen=True, order=True)
class Comment:
    id: int
    author_id: int
    text: str = ""  # fields with default values must come after fields without defaults

To verify this:

import dataclasses
import inspect
from dataclasses import dataclass
from pprint import pprint


@dataclass(frozen=True, order=True)
class Comment:
    id: int
    author_id: int
    text: str = ""  # fields with default values must come after fields without defaults


def main():
    comment = Comment(1, 2, "I just subscribed!")
    print(comment)
    # frozen=True makes the object immutable: fields cannot be reassigned after initialization
    # comment.id = 3  # would raise dataclasses.FrozenInstanceError
    print(dataclasses.astuple(comment))
    print(dataclasses.asdict(comment))
    # If you do need a changed value, build a modified copy with replace()
    updated = dataclasses.replace(comment, id=3)
    print(updated)
    # List every function defined on the class, including the generated ones
    pprint(inspect.getmembers(Comment, inspect.isfunction))


if __name__ == '__main__':
    main()

The running result is as follows:

The last part of the output, the list of functions defined on Comment, shows that dataclass wrote many magic methods for us automatically, saving us the trouble of writing them by hand. Note that frozen=True makes the object immutable: after initialization, its fields cannot be reassigned. This suits fixed value objects and immutable configuration data.
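A minimal sketch of the immutable-configuration use case follows. The class name ServerConfig and its fields are hypothetical, chosen only to illustrate the idea.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:  # hypothetical example class
    host: str = "localhost"
    port: int = 8080


config = ServerConfig()

try:
    config.port = 9090  # assignment on a frozen instance is rejected
except dataclasses.FrozenInstanceError as exc:
    print("rejected:", exc)

# replace() builds a new instance and leaves the original untouched
staging = dataclasses.replace(config, port=9090)
print(config.port, staging.port)  # 8080 9090
```

Because the original object never changes, it can be shared freely between parts of a program without defensive copying.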

Let’s take a look at the decorator's signature in the official documentation:
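The signature can also be inspected at runtime with inspect.signature, as in this small sketch. The printed signature depends on the Python version, since newer releases add further keyword-only parameters, so no expected output is shown for it; the six flags listed are the ones available since Python 3.7.

```python
import inspect
from dataclasses import dataclass

signature = inspect.signature(dataclass)
print(signature)

# Flags present since Python 3.7, with their default values
for name in ("init", "repr", "eq", "order", "unsafe_hash", "frozen"):
    print(name, "=", signature.parameters[name].default)
```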

That is, magic methods like __init__, __repr__, and __eq__ are generated for us by default. Each parameter takes True or False and controls whether the corresponding magic methods are generated, for example:

  • If order=True is passed, the __lt__(), __le__(), __gt__(), and __ge__() methods are generated.
  • If eq and frozen are both True, the __hash__() method is generated.
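The two rules above can be checked with a short sketch. The class names Mutable and Frozen are placeholders used only for this illustration.

```python
from dataclasses import dataclass


@dataclass  # eq=True, frozen=False by default
class Mutable:
    value: int


@dataclass(frozen=True)
class Frozen:
    value: int


# eq=True without frozen=True sets __hash__ to None: instances are unhashable
print(Mutable.__hash__ is None)  # True

# eq=True with frozen=True generates __hash__ from the fields
print(hash(Frozen(1)) == hash(Frozen(1)))  # True

# order defaults to False, so ordering comparisons are not defined
try:
    Frozen(1) < Frozen(2)
except TypeError:
    print("no ordering without order=True")
```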

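If you would rather write some of these methods yourself, that is fine. For example, when the class already defines __init__(), the init parameter is ignored and your version is kept. The ordering methods are the exception: with order=True, a class that already defines __lt__(), __le__(), __gt__(), or __ge__() raises a TypeError.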
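Here is a small sketch of that behavior using __repr__; the custom output format is invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    id: int
    text: str = ""

    # Hand-written method: the decorator leaves it in place
    def __repr__(self):
        return f"<Comment #{self.id}>"


comment = Comment(7, "nice post")
print(comment)  # <Comment #7>
print(comment == Comment(7, "nice post"))  # True, __eq__ is still generated
```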

Don’t want all fields involved?

You already know that dataclass can generate the <, <=, >, and >= comparison methods (plus == by default). One drawback is that these comparisons use every field in the class. Is there a way to keep some fields out of the comparison? For example, if a class has name, age, and height fields and we don’t want the name to take part in comparisons, we can write:

from dataclasses import dataclass, field


@dataclass(order=True)
class User:
    name: str = field(compare=False)
    age: int
    height: float

Likewise, if you don’t want a field to appear in the generated __repr__ output, you can declare it with field(repr=False).
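Both options can be seen together in the sketch below. As an assumption for the demonstration, it hides height from the repr, which the class above does not do, and the sample users are made up.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class User:
    name: str = field(compare=False)
    age: int
    height: float = field(repr=False)  # assumption: hidden only for this demo


alice = User("Alice", 30, 1.70)
bob = User("Bob", 30, 1.70)

# name is excluded from comparisons, so these two compare equal
print(alice == bob)  # True

# ordering uses only (age, height)
print(User("Zed", 25, 1.80) < alice)  # True

# height is hidden from the generated repr
print(alice)  # User(name='Alice', age=30)
```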

The last word

This article covered the basic usage of dataclass, which saves a lot of time writing and modifying code while still letting you override the methods it generates. I recommend it to every Python developer. If you found this helpful, please like and share. Thanks for reading!
