Clean Code with Python Dataclasses

Introduction to Python Dataclasses

Table of Contents
  1. Using dataclasses instead of dictionaries
    1. Reason to use dataclasses
  2. Constructing a dataclass from a dictionary
    1. Real Example
  3. Enforcing types
    1. What's the point of enforcing types
    2. The __post_init__ method
    3. Implementing type checking
  4. That's all folks

Dataclasses were added in Python 3.7. They are like regular classes but come with some essential methods already implemented. Because dataclass is a decorator, you can quickly create a class, for example:

python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

We can then create Bob the Builder with:

python
bobthebuilder = Person(name="Bob the Builder", age=33)

print(bobthebuilder)
# Prints: Person(name='Bob the Builder', age=33)

One great thing about dataclasses is that you can read any field as an attribute when you need a specific value. For example, let's say that we want to know Bob's age:

python
print(bobthebuilder.age)
# Prints: 33

Another brilliant thing about dataclasses is that you don't need to implement a class by hand since, by default, dataclasses implement the methods __init__, __repr__ and __eq__ under the hood.
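To make that concrete, here is a minimal sketch that exercises the three generated methods on the Person class from above. The variable names are only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


# __init__ accepts the fields in declaration order, positionally or by keyword.
first = Person("Bob the Builder", 33)
second = Person(name="Bob the Builder", age=33)

# __eq__ compares two instances field by field.
same_person = first == second

# __repr__ returns a readable string that lists every field.
description = repr(first)

print(same_person)
print(description)
```

Without the decorator, two separately created instances of a plain class would not compare equal unless you wrote __eq__ yourself.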

You can also enable/disable this behaviour by passing arguments to the decorator. Refer to the dataclass documentation for more information.
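As a small sketch of those decorator arguments, the example below uses frozen=True and order=True, two of the documented options. The Version class is a made-up example and not part of the Book code used in the rest of this article.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int


old = Version(3, 6)
new = Version(3, 7)

# order=True generates __lt__, __le__, __gt__ and __ge__, which compare
# instances as if they were tuples of their fields.
is_newer = new > old

# frozen=True makes assigning to a field raise FrozenInstanceError.
try:
    new.minor = 8
    was_blocked = False
except FrozenInstanceError:
    was_blocked = True

print(is_newer, was_blocked)
```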

Using dataclasses instead of dictionaries

Let's assume that you are interacting with an API. When you query an endpoint, you might get a dictionary back (usually JSON is turned into a dictionary in Python).

Provided that the response is always the same, you could use the dictionary to get values from the payload. Or you could create a Book dataclass and instantiate it from the dictionary.

python
from dataclasses import dataclass
import requests

@dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: int


result = requests.get("http://fake-book-api/the-pragmatic-programmer")
payload = result.json()

"""
Payload is:
{
    "author": "David Thomas, Andrew Hunt",
    "title": "The Pragmatic Programmer",
    "isbn": "0135957052",
    "pages": 352,
    "publisher": "Addison-Wesley Professional",
    "rating": 5
}
"""

Let's assume that you want to get both the author and rating from your payload. You can do so with:

python
author = payload.get("author")
rating = payload.get("rating")

Alternatively, you can build the Book dataclass from the payload like so:

python
book = Book(
    author=payload.get("author"),
    title=payload.get("title"),
    isbn=payload.get("isbn"),
    pages=payload.get("pages"),
    publisher=payload.get("publisher"),
    rating=payload.get("rating")
)

print(book.author)
# Prints: David Thomas, Andrew Hunt

Reason to use dataclasses

Perhaps using a dictionary is enough when you are dealing with an API that always returns the same structure. But what if you are interacting with more than one API? Or what if your API returns a different payload depending on the endpoint you are querying?

Let's assume that you are now using these two APIs because one has books that the other doesn't:

  • http://fake-book-api/
  • http://awesome-fake-book-api/

We have seen the payload that fake-book-api returns, but just for reference:

json
{
    "author": "David Thomas, Andrew Hunt",
    "title": "The Pragmatic Programmer",
    "isbn": "0135957052",
    "pages": 352,
    "publisher": "Addison-Wesley Professional",
    "rating": 5
}

The new API, awesome-fake-book-api, returns a payload like:

json
{
    "author-name": "Robert C. Martin",
    "title": "Clean Code: A Handbook of Agile Software Craftsmanship",
    "isbn-10": "9780132350884",
    "isbn-13": "978-0132350884",
    "pages": 464,
    "weight": "1.54 pounds",
    "ratings": {
        "total": 4,
        "reviews": [
            {"username": "Bob", "rating": 4, "comment": "Is good!"}
        ]
    }
}

This new API doesn't return information about the publisher. The author key is called author-name. The payload also contains values for both the isbn-10 and isbn-13. Finally, ratings is now a dictionary that contains total and reviews.

Still, you can adapt the way you build the Book dataclass to handle both APIs. For example:

python
book = Book(
    author=payload.get("author") or payload.get("author-name"),
    title=payload.get("title"),
    isbn=payload.get("isbn") or payload.get("isbn-10"),
    pages=payload.get("pages"),
    publisher=payload.get("publisher", "Unknown"),
    rating=payload.get("rating") or payload.get("ratings", {}).get("total")
)

print(book.author)
# Prints: "David Thomas, Andrew Hunt" or "Robert C. Martin", depending on the API

Okay, that looks a bit ugly, but at least you can keep the dictionary juggling in a single place, and in the rest of your code, you can access the Book attributes.

Constructing a dataclass from a dictionary

Let's be honest. Constructing the Book dataclass as we did before is ugly, but it does the trick. Ideally, we should pass the payload to the dataclass, and the attributes would be filled automatically.

But if you try to pass the payload into the dataclass you get a TypeError exception.

python
payload = {
    "author": "David Thomas, Andrew Hunt",
    "title": "The Pragmatic Programmer",
    "isbn": "0135957052",
    "pages": 352,
    "publisher": "Addison-Wesley Professional",
    "rating": 5
}

book = Book(payload)
# Raises: TypeError: __init__() missing 5 required positional arguments: 'title', 'isbn', 'pages', 'publisher', and 'rating'

This exception isn't surprising. If we pass the payload to the dataclass, the whole dictionary is treated as the first positional argument, author, and the other five fields are left without a value.
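When the dictionary keys happen to match the field names exactly, unpacking the dictionary with ** is enough. The sketch below assumes the fake-book-api payload shown earlier, and also shows how the shortcut fails as soon as a key has a different name.

```python
from dataclasses import dataclass


@dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: int


payload = {
    "author": "David Thomas, Andrew Hunt",
    "title": "The Pragmatic Programmer",
    "isbn": "0135957052",
    "pages": 352,
    "publisher": "Addison-Wesley Professional",
    "rating": 5,
}

# ** turns each key/value pair into a keyword argument.
book = Book(**payload)
print(book.title)

# A key that isn't a field name (here the other API's "author-name")
# makes the generated __init__ raise a TypeError.
try:
    Book(**{**payload, "author-name": "Robert C. Martin"})
    unpacking_failed = False
except TypeError:
    unpacking_failed = True

print(unpacking_failed)
```

Since our two APIs use different keys, this shortcut isn't enough here.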

If we want to construct our dataclass from a dictionary, we need to add a class method and use that method instead. Let's do that now and add a from_payload method to our dataclass.

python
@dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: int

    @classmethod
    def from_payload(cls, payload: dict):
        """Construct the Book class from a dictionary."""
        author = payload.get("author") or payload.get("author-name")
        title = payload.get("title")
        isbn = payload.get("isbn") or payload.get("isbn-10")
        pages = payload.get("pages")
        publisher = payload.get("publisher", "Unknown")
        rating = payload.get("rating") or payload.get("ratings", {}).get("total")

        return cls(
            author=author,
            title=title,
            isbn=isbn,
            pages=pages,
            publisher=publisher,
            rating=rating
        )

The brilliant thing is that we can call the from_payload method to construct our Book. The logic is part of the dataclass, which will make our code cleaner. For example:

python
payload = {
    "author": "David Thomas, Andrew Hunt",
    "title": "The Pragmatic Programmer",
    "isbn": "0135957052",
    "pages": 352,
    "publisher": "Addison-Wesley Professional",
    "rating": 5
}

book = Book.from_payload(payload)

print(book.title)
# Prints: The Pragmatic Programmer

Another great thing about dataclasses is that you can use the dataclasses.asdict function to get a dictionary back from a dataclass. For example:

python
import dataclasses

payload = {
    "author": "David Thomas, Andrew Hunt",
    "title": "The Pragmatic Programmer",
    "isbn": "0135957052",
    "pages": 352,
    "publisher": "Addison-Wesley Professional",
    "rating": 5
}

book = Book.from_payload(payload)

print(dataclasses.asdict(book))
# Prints: {'author': 'David Thomas, Andrew Hunt', 'title': 'The Pragmatic Programmer', 'isbn': '0135957052', 'pages': 352, 'publisher': 'Addison-Wesley Professional', 'rating': 5}

dataclasses.asdict(book) == payload  # This is True
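Because asdict returns plain dictionaries, it pairs well with the json module when you need to send the data back to an API. This is a minimal sketch, assuming every field holds a JSON-friendly value (strings and integers here). The Book class is repeated without from_payload so that the example runs on its own.

```python
import dataclasses
import json


@dataclasses.dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: int


book = Book(
    author="David Thomas, Andrew Hunt",
    title="The Pragmatic Programmer",
    isbn="0135957052",
    pages=352,
    publisher="Addison-Wesley Professional",
    rating=5,
)

# asdict() returns a plain dictionary, which json.dumps can serialise.
as_json = json.dumps(dataclasses.asdict(book))

# Loading the JSON back and unpacking it rebuilds an equal Book.
restored = Book(**json.loads(as_json))
print(restored == book)
```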

Real Example

I have recently added a GitLab connector to Opsdroid. Opsdroid allows you to emit events based on actions; in GitLab's case, these actions come from webhooks.

GitLab webhooks have different structures depending on the action. For example, the payload might have three keys for a username - user, username, user_username.

Instead of dealing with the different payloads for each event that Opsdroid builds for the GitLab connector, I've created a Payload dataclass and kept the logic inside a from_dict method.

Please take a look at the code for a working example of what I have shown here.
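To give a rough idea of the pattern, here is a trimmed-down, hypothetical sketch. It is not the actual Opsdroid code: the action field is invented for illustration, and it assumes each of the three keys holds the username as a plain string, which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Payload:
    """Hypothetical, simplified webhook payload (not the Opsdroid class)."""

    username: Optional[str]
    action: Optional[str]

    @classmethod
    def from_dict(cls, payload: dict):
        # The username may live under any of these three keys.
        username = (
            payload.get("user_username")
            or payload.get("username")
            or payload.get("user")
        )
        return cls(username=username, action=payload.get("action"))


# Two differently shaped events end up with the same attributes.
push_event = Payload.from_dict({"user_username": "bob", "action": "push"})
issue_event = Payload.from_dict({"username": "bob", "action": "open"})
print(push_event.username, issue_event.username)
```

The rest of the code can then read the username attribute without caring which event produced the payload.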

Enforcing types

When creating a dataclass, you specify the type of each attribute, but the dataclass itself doesn't enforce these types. Let's grab our Book example: you could pass any type to any of the attributes, and all would be well.

python
@dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: int

book = Book(author="Bob", title=12345, isbn=1234, pages="a million!", publisher=None, rating="Magnific")

print(book.title)
# Prints: 12345

As you can see, typing isn't enforced. This is okay, provided that you are 100% certain which types you are putting in the dataclass. You can also implement a type check in from_payload to ensure that the types are correct, although someone who builds Book directly would bypass that check.

What's the point of enforcing types

You might wonder why you would want to enforce typing in your dataclass. Typing helps you write better and less buggy code because your editor will warn you about issues before you even spot them.

It's also a great way to handle user input. For example, you could build a Payload dataclass to handle input from an API that users can interact with. I did precisely this on a feature that I implemented for Opsdroid, where users can submit a PATCH request to the API to update their configuration on the fly.

The __post_init__ method

To add type checking, we need to use the dataclass __post_init__() method. If you define it, the __init__() method that the dataclass generates under the hood calls __post_init__() as its last step.

The __post_init__ method allows you to do a few things, such as initializing field values that depend on one or more other fields, or filling in a default value. Let's assume that you want to create a dataclass with default values.

python
import dataclasses

@dataclasses.dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: dict = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        if not self.rating:
            self.rating = {"total": 0, "reviews": []}

book = Book("Robert C. Martin", "Clean Code", "9780132350884", 464, None)

print(book)
# Prints: Book(author='Robert C. Martin', title='Clean Code', isbn='9780132350884', pages=464, publisher=None, rating={'total': 0, 'reviews': []})

You need dataclasses.field(default_factory=dict) here because dictionaries are mutable, and a dataclass doesn't accept a mutable default value such as a dict.
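The short sketch below shows both sides of that rule: a plain dict default is rejected with a ValueError when the class is defined, while default_factory gives every instance its own dictionary. BrokenBook is a throwaway name used only for this demonstration.

```python
import dataclasses

# A plain mutable default is rejected as soon as the class is defined.
try:

    @dataclasses.dataclass
    class BrokenBook:
        title: str
        rating: dict = {}

    error_message = None
except ValueError as error:
    error_message = str(error)

print(error_message)


@dataclasses.dataclass
class Book:
    title: str
    # default_factory is called once per instance, so nothing is shared.
    rating: dict = dataclasses.field(default_factory=dict)


first = Book("Clean Code")
second = Book("The Pragmatic Programmer")
first.rating["total"] = 4

# The second book still has its own, empty dictionary.
print(second.rating)
```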

You can also see that the code in __post_init__ only sets self.rating if it is empty. If the user provides a value, we keep it instead of the default of {"total": 0, "reviews": []}.
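The other use mentioned earlier, a field whose value depends on other fields, could look like the sketch below. The Review class and its summary field are made up for illustration; field(init=False) keeps summary out of the generated __init__ so that __post_init__ can fill it in.

```python
import dataclasses


@dataclasses.dataclass
class Review:
    username: str
    rating: int
    comment: str
    # init=False keeps this field out of the generated __init__.
    summary: str = dataclasses.field(init=False)

    def __post_init__(self):
        # Runs after __init__ has assigned username, rating and comment.
        self.summary = f"{self.username} ({self.rating}): {self.comment}"


review = Review("Bob", 4, "Is good!")
print(review.summary)
```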

There is much more that I could say about dataclass fields, but I will leave that topic for another article.

Implementing type checking

To implement type checking for our Book dataclass we need to play with dunder attributes. We need to look into __annotations__ and also __dict__. For reference, let's pick up the previous example and see what __annotations__ and __dict__ give us.

python
book = Book("Robert C. Martin", "Clean Code", "9780132350884", 464, None)

print(book.__annotations__)
# Prints: {'author': <class 'str'>, 'title': <class 'str'>, 'isbn': <class 'str'>, 'pages': <class 'int'>, 'publisher': <class 'str'>, 'rating': <class 'dict'>}

print(book.__dict__)
# Prints: {'author': 'Robert C. Martin', 'title': 'Clean Code', 'isbn': '9780132350884', 'pages': 464, 'publisher': None, 'rating': {'total': 0, 'reviews': []}}

Since we have both the key and the expected type for each key in __annotations__, we can use that to check whether the provided value is of the expected type. Let's now build our __post_init__():

python
import dataclasses

@dataclasses.dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str
    rating: dict = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        if not self.rating:
            self.rating = {"total": 0, "reviews": []}

        for name, field_type in self.__annotations__.items():
            provided_key = self.__dict__[name]

            if not isinstance(provided_key, field_type):
                raise TypeError(
                    f"The field '{name}' is of type '{type(provided_key)}', but "
                    f"should be of type '{field_type}' instead."
                )

For reference, our for loop will go through each attribute (author, title, isbn, pages, publisher, rating) and its expected type. Then we confirm that the value the user provided, read with self.__dict__[name], is of the expected type. If not, we raise a TypeError.

Now, if we create a book instance with the wrong type, we get the TypeError.

python
book = Book("Robert C. Martin", "Clean Code", 9780132350884, 464, None)

# Raises TypeError: The field 'isbn' is of type '<class 'int'>', but should be of type '<class 'str'>' instead.

You probably noticed that the code failed on the first type check, but we actually have two wrong types: isbn and publisher. If you wish, you could add some logic to let the for loop go through all the fields and raise the TypeError at the end.
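One way to do that, sketched below, is to collect a message per failing field and raise a single TypeError after the loop. The rating field is left out to keep the example short, and the wording of the message is only a suggestion.

```python
import dataclasses


@dataclasses.dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: str

    def __post_init__(self):
        errors = []
        for name, field_type in self.__annotations__.items():
            value = self.__dict__[name]
            if not isinstance(value, field_type):
                errors.append(
                    f"'{name}' is {type(value).__name__}, "
                    f"expected {field_type.__name__}"
                )

        # Raise once, after every field has been inspected.
        if errors:
            raise TypeError("; ".join(errors))


try:
    Book("Robert C. Martin", "Clean Code", 9780132350884, 464, None)
    message = ""
except TypeError as error:
    message = str(error)

print(message)
# Prints: 'isbn' is int, expected str; 'publisher' is NoneType, expected str
```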

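Note: You can only use this method for plain classes such as str or int. If you are checking for Optional or Union types, then the isinstance check itself will raise a TypeError: Subscripted generics cannot be used with class and instance checks. That is the behaviour on Python 3.7 to 3.9; Python 3.10 and later accept Union types in isinstance, so there the check may pass without raising.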

For example:

python
import dataclasses
from typing import Optional

@dataclasses.dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: Optional[str]  # This breaks our type checking
    rating: dict = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        if not self.rating:
            self.rating = {"total": 0, "reviews": []}

        for name, field_type in self.__annotations__.items():
            provided_key = self.__dict__[name]

            if not isinstance(provided_key, field_type):
                raise TypeError(
                    f"The field '{name}' is of type '{type(provided_key)}', but "
                    f"should be of type '{field_type}' instead."
                )

So how do we fix this if we want to set publisher as optional? We can do more dunder magic!

python
import dataclasses
from typing import Optional

@dataclasses.dataclass
class Book:
    author: str
    title: str
    isbn: str
    pages: int
    publisher: Optional[str]
    rating: dict = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        if not self.rating:
            self.rating = {"total": 0, "reviews": []}

        for name, field_type in self.__annotations__.items():
            provided_key = self.__dict__[name]

            try:
                type_matches = isinstance(provided_key, field_type)
            except TypeError:
                # Optional[str].__args__ is (str, NoneType)
                type_matches = isinstance(provided_key, field_type.__args__)

            if not type_matches:
                raise TypeError(
                    f"The field '{name}' is of type '{type(provided_key)}', but "
                    f"should be of type '{field_type}' instead."
                )

# Now this works! publisher=None is accepted because the field is Optional[str]
book = Book("Robert C. Martin", "Clean Code", "9780132350884", 464, None)
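If you would rather not rely on catching the TypeError, the typing module offers get_origin and get_args (available since Python 3.8) to inspect an annotation explicitly. The sketch below is one possible variation, not a complete type checker: the matches helper is a name I made up, the Book class is shortened to two fields, and nested generics such as List[int] are not handled.

```python
import dataclasses
from typing import Optional, Union, get_args, get_origin, get_type_hints


def matches(value, expected) -> bool:
    """Check a value against a plain class or an Optional/Union of classes."""
    if get_origin(expected) is Union:
        # Optional[str] is Union[str, None]; get_args gives (str, NoneType).
        return isinstance(value, get_args(expected))
    return isinstance(value, expected)


@dataclasses.dataclass
class Book:
    title: str
    publisher: Optional[str] = None

    def __post_init__(self):
        for name, expected in get_type_hints(type(self)).items():
            value = getattr(self, name)
            if not matches(value, expected):
                raise TypeError(
                    f"The field '{name}' is of type '{type(value)}', but "
                    f"should be of type '{expected}' instead."
                )


book = Book("Clean Code")  # publisher=None is accepted

try:
    Book("Clean Code", publisher=123)
    rejected = False
except TypeError:
    rejected = True

print(book, rejected)
```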

That's all folks

I hope this long article helped you have a better understanding of dataclasses and how to use them. I enjoy using dataclasses daily, especially when interacting with APIs, since dataclasses allow me to make my code cleaner without the need to implement an entire class by hand.

Please let me know what you think and your use-case for dataclasses!
