In a previous article, I talked about a heuristic for using classes in Python:

If you have functions that take the same set of arguments, consider using a class.

The problem is, heuristics don’t always work.

To get the most out of them, it helps to know what the exceptions are.

So let’s look at a few real-world examples where functions that take the same parameters don’t necessarily form a class.

Counterexample: two sets of arguments

Consider the following situation.

We have a web application for a feed reader that displays a list of feeds and a list of entries (articles), filtered in various ways.

Because we want to do the same thing on the command line, we pull database-specific logic into functions in a separate module. These functions take a database connection and other parameters, query the database, and return results.

```python
def get_entries(db, feed=None, read=None, important=None): ...
def get_entry_counts(db, feed=None, read=None, important=None): ...
def search_entries(db, query, feed=None, read=None, important=None): ...
def get_feeds(db): ...
```

The main usage pattern is: connect to the database at the start of the program, then call the functions repeatedly with the same connection but different options, depending on user input.


Taking the heuristic to the extreme, we end up with this:

```python
class Storage:

    def __init__(self, db, feed=None, read=None, important=None):
        self._db = db
        self._feed = feed
        self._read = read
        self._important = important

    def get_entries(self): ...
    def get_entry_counts(self): ...
    def search_entries(self, query): ...
    def get_feeds(self): ...
```

This isn’t very useful: every time we change the options, we need to create a new Storage object (or worse, keep a single object and change its attributes). Plus, get_feeds() doesn’t even use the options, yet leaving it out of the class seems just as bad.
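To make the awkwardness concrete, here is a minimal sketch of how callers would have to use that version. The `describe()` method and the placeholder `db` object are stand-ins invented for illustration; they are not part of the original code.

```python
class Storage:
    """The 'everything in __init__' version from above, cut down for illustration."""

    def __init__(self, db, feed=None, read=None, important=None):
        self._db = db
        self._feed = feed
        self._read = read
        self._important = important

    def describe(self):
        # Hypothetical stand-in for get_entries(): report the active filters.
        return {"feed": self._feed, "read": self._read, "important": self._important}


db = object()  # placeholder for a real database connection

# Every change of options needs a brand new object...
unread = Storage(db, read=False)
important = Storage(db, important=True)

# ...or mutating a shared one, which affects everyone else holding it.
shared = Storage(db)
shared._read = False

print(unread.describe())
print(shared.describe())
```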

What’s missing is a bit of nuance: there isn’t just one set of arguments, there are two, and one of them changes more often than the other.

Let’s deal with the obvious problem first.

The database connection changes least often, so it makes sense to keep it on the storage object and pass that object around.

```python
class Storage:

    def __init__(self, db):
        self._db = db

    def get_entries(self, feed=None, read=None, important=None): ...
    def get_entry_counts(self, feed=None, read=None, important=None): ...
    def search_entries(self, query, feed=None, read=None, important=None): ...
    def get_feeds(self): ...
```

The most important benefit of doing this is that it abstracts the database away from the code that uses it, which allows you to have more than one kind of storage.

Want to store entries as files on disk? Write a FileStorage class that reads them from there. Want to test your application with various combinations of entries? Write a MockStorage class that keeps the entries in a list in memory. Whoever calls get_entries() or search_entries() does not need to know or care where the entries come from or how the search is implemented.

This is the data access object (DAO) design pattern. In object-oriented programming terms, a DAO provides an abstract interface that encapsulates a persistence mechanism.
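As a rough illustration of the idea, here is what an in-memory MockStorage might look like. The shape of an entry (a dict with `title`, `feed`, `read`, and `important` keys) and the `unread_titles()` caller are assumptions made for this sketch, not something taken from the real code.

```python
class MockStorage:
    """In-memory stand-in exposing the same method names as Storage."""

    def __init__(self, entries):
        # Assumed entry shape: dicts with 'title', 'feed', 'read', 'important'.
        self._entries = list(entries)

    def get_entries(self, feed=None, read=None, important=None):
        wanted = {"feed": feed, "read": read, "important": important}
        return [
            entry
            for entry in self._entries
            # None means "do not filter on this field".
            if all(value is None or entry[key] == value
                   for key, value in wanted.items())
        ]

    def get_entry_counts(self, feed=None, read=None, important=None):
        return len(self.get_entries(feed=feed, read=read, important=important))


def unread_titles(storage):
    # The caller relies only on get_entries(); it never sees a database.
    return [entry["title"] for entry in storage.get_entries(read=False)]


storage = MockStorage([
    {"title": "one", "feed": "a", "read": False, "important": True},
    {"title": "two", "feed": "a", "read": True, "important": False},
    {"title": "three", "feed": "b", "read": False, "important": False},
])
print(unread_titles(storage))
```

Because `unread_titles()` only depends on the method names, the same function would work unchanged with a database-backed or file-backed storage.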


OK, the above looks just about right to me; I wouldn’t really change anything else.

Some parameters are still repeated, but this is useful repetition: once users learn to filter entries with one method, they can do it with any of them. Also, people use different arguments at different times; from their point of view, it’s not really repetition.

And, anyway, we’re already using a class…

Counterexample: data classes

Let’s add more requirements.

There’s more functionality beyond storing things, and we have multiple users for it as well (the web application, the CLI, someone using our code as a library). So we leave Storage to do only storage, and wrap it in a Reader object that has a storage.

```python
class Reader:

    def __init__(self, storage):
        self._storage = storage

    def get_entries(self, feed=None, read=None, important=None):
        return self._storage.get_entries(feed=feed, read=read, important=important)

    ...

    def update_feeds(self):
        # calls various storage methods multiple times:
        # get feeds to be retrieved from storage,
        # store new/modified entries
        ...
```

Now, the main caller of Storage.get_entries() is Reader.get_entries(). In addition, the filter parameters are rarely used directly by the Storage methods; most of the time they are passed on to helper functions.

```python
class Storage:

    def get_entries(self, feed=None, read=None, important=None):
        query = make_get_entries_query(feed=feed, read=read, important=important)
        ...
```

Problem: when we add a new entry filter option, we have to change the Reader methods, the Storage methods, and the helpers. And it’s likely we’ll do it again in the future.

Solution: group the parameters in a class that contains only data.

```python
from typing import NamedTuple, Optional

class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None

class Storage:

    ...

    def get_entries(self, filter_options):
        query = make_get_entries_query(filter_options)
        ...

    def get_entry_counts(self, filter_options): ...
    def search_entries(self, query, filter_options): ...
    def get_feeds(self): ...
```

Now, no matter how many times the options are passed around, there are only two places where it matters what they are:

  • in a Reader method, which builds the EntryFilterOptions object
  • where they are actually used, either in a helper or in a Storage method
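The following sketch traces one call through all three layers. The body of `make_get_entries_query()`, the SQL it produces, and the column names are assumptions made up for this example; the real helper is not shown in this article.

```python
from typing import NamedTuple, Optional


class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


def make_get_entries_query(filter_options):
    # Hypothetical helper: the only place that looks inside the options.
    clauses, params = [], []
    for name, value in filter_options._asdict().items():
        if value is not None:
            clauses.append(f"{name} = ?")
            params.append(value)
    where = " AND ".join(clauses) or "1"
    return f"SELECT * FROM entries WHERE {where}", params


class Storage:
    def get_entries(self, filter_options):
        # The options pass through untouched; a real method would run the query.
        return make_get_entries_query(filter_options)


class Reader:
    def __init__(self, storage):
        self._storage = storage

    def get_entries(self, feed=None, read=None, important=None):
        # The only place where the options object is built.
        filter_options = EntryFilterOptions(feed, read, important)
        return self._storage.get_entries(filter_options)


reader = Reader(Storage())
print(reader.get_entries(read=False))
```

With this layout, adding a fourth option would touch EntryFilterOptions, the Reader method signature, and the helper, but not the Storage method in between.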

Note that although we are using the Python class syntax, EntryFilterOptions is not a class in the traditional object-oriented programming sense, because it has no behavior. [1] Sometimes these are referred to as “passive data structures” or “plain old data”.

A plain class or a dataclass would also have been a good choice. Why I chose a named tuple is a discussion for another article.

I used type hints because they are a cheap way to document the options, but you don’t have to, not even for dataclasses.

The example above is a simplified version of the code in my feed reader library. In the real code, EntryFilterOptions groups six options, with more on the way, and the Reader and Storage get_entries() methods are a bit more complicated.
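For comparison, a dataclass version might look like the sketch below. Using `frozen=True` is my assumption here, chosen to keep the options immutable like the named tuple; it is not necessarily what you would want in every case.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EntryFilterOptions:
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


options = EntryFilterOptions(read=False)

# frozen=True turns attribute assignment into an error.
try:
    options.read = True
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True

# A changed copy has to be requested explicitly.
read_options = replace(options, read=True)
print(options, read_options, assignment_failed)
```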

Why not a dict?

Instead of defining a brand new class, we could simply use a dict, such as:

{'feed': ... , 'read': ... , 'important': ... }

But this has a number of disadvantages:

  • Dicts are not type-checked. TypedDict helps, but it still doesn’t prevent using the wrong keys at runtime.
  • Dicts don’t work well with code completion. TypedDict may help smart tools like PyCharm, but not in interactive mode or IPython.
  • Dicts are mutable. Immutability is an advantage for our use case: the options have little reason to change, and a change would be very unexpected, so it is useful to disallow it.
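A small sketch of the first and last points, using a deliberately misspelled key (`improtant`) as the example mistake:

```python
from typing import NamedTuple, Optional


class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


# With a dict, a misspelled key is accepted silently.
dict_options = {"feed": None, "read": None, "important": None}
dict_options["improtant"] = True

problems = []

# With the named tuple, the same typo fails as soon as the object is built.
try:
    EntryFilterOptions(improtant=True)
except TypeError as e:
    problems.append(type(e).__name__)

# Attributes cannot be reassigned either.
options = EntryFilterOptions(read=False)
try:
    options.read = True
except AttributeError as e:
    problems.append(type(e).__name__)

# Changes have to be explicit, and they produce a new object.
read_options = options._replace(read=True)
print(problems)
```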

Why not take **kwargs?

Since we are talking about dicts, why not have Reader.get_entries() and the like accept **kwargs and pass them directly to EntryFilterOptions?

This is shorter, but it also breaks completion.

In addition, this makes the code less self-documenting: even if you look at the source code for Reader.get_entries(), you still don’t immediately know what parameters it needs. This is not important for internal code, but for the user-facing parts of the API, we don’t mind making it more verbose if it makes the code easier to use.

Also, if we later introduce another data object (for example, pagination options), we still have to write code to split the kwargs between the two.
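To give an idea of what that splitting code could look like, here is a sketch. PaginationOptions, its `limit` and `offset` fields, and the `split_kwargs()` helper are all hypothetical names invented for this example.

```python
from typing import NamedTuple, Optional


class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


class PaginationOptions(NamedTuple):
    # Hypothetical second data object.
    limit: Optional[int] = None
    offset: int = 0


def split_kwargs(kwargs):
    """Sort keyword arguments into the two option objects."""
    filter_fields = set(EntryFilterOptions._fields)
    page_fields = set(PaginationOptions._fields)

    # Without this check, typos would be dropped silently.
    unknown = set(kwargs) - filter_fields - page_fields
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")

    filter_kwargs = {k: v for k, v in kwargs.items() if k in filter_fields}
    page_kwargs = {k: v for k, v in kwargs.items() if k in page_fields}
    return EntryFilterOptions(**filter_kwargs), PaginationOptions(**page_kwargs)


filters, page = split_kwargs({"read": False, "limit": 10})
print(filters, page)
```

This is bookkeeping that explicit parameters would have made unnecessary, and it has to be kept in sync with both classes.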

Why not take EntryFilterOptions?

So why not make Reader.get_entries() accept an EntryFilterOptions?

This is too cumbersome for users: they have to import EntryFilterOptions, build it, and only then pass it to get_entries(). Frankly, it’s not very idiomatic.

This difference between the Reader and Storage method signatures exists because they are used differently:

  • Reader methods are mostly called by external users, in many different ways.
  • Storage methods are mostly called by internal users (Reader), in only a few ways.

That’s all for now.

Learned something new today? Share this article with others, it really helps! 🙂


  1. Ted Kaminski discusses this distinction in more detail in Data, Objects, and How We Are Introduced to Bad Design.