Implementing Phantom types in Python

The NewType construct in the Python standard library is a great little nugget. It turns simple mistakes, such as mixing up the values of two fields, into static type errors. Here's a simple example showing how it shines when used to distinguish the ID values of distinct entities.

from dataclasses import dataclass
from typing import NewType

BookId = NewType("BookId", int)
AuthorId = NewType("AuthorId", int)

@dataclass
class Author:
    id: AuthorId
    name: str

@dataclass
class Book:
    id: BookId
    name: str

def lookup_author(author_id: AuthorId) -> Author:
    ...

def lookup_book(book_id: BookId) -> Book:
    ...

book = lookup_book(BookId(1))
# This is a type error!
author = lookup_author(book.id)

Running the mypy type checker on this example gives us a helpful error message, neat!

newtype.py:22: error: Argument 1 to "lookup_author" has incompatible type "BookId"; expected "AuthorId"  [arg-type]
    author = lookup_author(book.id)
                           ^
Found 1 error in 1 file (checked 1 source file)
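It's worth keeping in mind that this check happens purely at type-check time. As a minimal sketch of the runtime behaviour, calling a NewType simply hands back its argument, so the values stay plain int objects:

```python
from typing import NewType

BookId = NewType("BookId", int)

book_id = BookId(1)

# At runtime, BookId(...) returns the very same int it was given.
print(type(book_id) is int)  # True
print(book_id + 1)  # 2
```

This is why NewType is cheap to adopt: there is no wrapper object, only a distinct name for the type checker to track.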

What else can we use NewType for?

Another common pitfall of Python is to accidentally use timezone-naive datetime objects. Can NewType help us eliminate these usages as well? Let's see what we can do. We start by introducing a utility function so that we can easily produce timezone-aware datetime values.

import datetime

def now_aware() -> datetime.datetime:
    return datetime.datetime.now(tz=datetime.timezone.utc)
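To see why naive values are a pitfall in the first place, here is a small sketch of what happens when a naive and an aware value meet at runtime. The variable names are only for illustration.

```python
import datetime

naive = datetime.datetime.now()
aware = datetime.datetime.now(tz=datetime.timezone.utc)

try:
    # Ordering comparisons between naive and aware datetimes are rejected.
    naive < aware
except TypeError as exc:
    error = str(exc)
else:
    error = None

print(naive.tzinfo)  # None
print(error)
```

The failure only shows up when the two kinds of values actually meet, which can be far from where the naive value was created. That is the kind of mistake we'd prefer a type checker to catch.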

Giving this function the NewType treatment, we end up with a combination of a type and a function that returns instances of that type.

import datetime
from typing import NewType

TZAware = NewType("TZAware", datetime.datetime)

def now_aware() -> TZAware:
    return TZAware(datetime.datetime.now(tz=datetime.timezone.utc))

This allows us to start using this type in our codebase, and discover potential usages of timezone-naive values.

# Continuing from the previous snippet, which already has `import datetime`.

def is_fortunate_future(from_date: TZAware) -> bool:
    ...

# This is now a type error!
if is_fortunate_future(datetime.datetime.now()):
    print("Great things ahead of you!")
# But using our utility function as input passes.
if is_fortunate_future(now_aware()):
    print("Great things ahead of you!")

Running mypy on the above gives us the type error we expect.

tzaware.py:14: error: Argument 1 to "is_fortunate_future" has incompatible type "datetime"; expected "TZAware"  [arg-type]
    if is_fortunate_future(datetime.datetime.now()):
                           ^
Found 1 error in 1 file (checked 1 source file)

But what about testing existing values?

We've now looked at ways of producing values that have a NewType type, and how that can help eliminate two common mistakes. But what if our program receives input that we need to verify conforms to the restrictions we expect our NewType type to have? Spreading assertion checks and "instantiations" of the type across the codebase wouldn't be very maintainable. What if the definition changes, and we want to further restrict the type?

The standard library provides the TypeGuard construct (in the typing module since Python 3.10), which we can use to introduce another utility function that checks whether a value conforms to our restrictions.

import datetime
from typing import NewType, TypeGuard

TZAware = NewType("TZAware", datetime.datetime)

def now_aware() -> TZAware:
    return TZAware(datetime.datetime.now(tz=datetime.timezone.utc))

def is_tz_aware(dt: datetime.datetime) -> TypeGuard[TZAware]:
    # https://docs.python.org/3/library/datetime.html#determining-if-an-object-is-aware-or-naive
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None

Using the new utility function, we can now also call our business logic with values that we haven't instantiated ourselves, and have the type checker understand that we've validated them.

# Continuing from the previous snippet, which already has `import datetime`.

def is_fortunate_future(from_date: TZAware) -> bool:
    ...

def get_date() -> datetime.datetime:
    ...

dt = get_date()

# As expected, this gives a type error.
is_fortunate_future(dt)

# But using our new utility function, we can make the type checker certain that the
# value is timezone aware.
if not is_tz_aware(dt):
    raise RuntimeError(f"Invalid datetime value: {dt}")

# Hence, mypy now considers this safe, great!
if is_fortunate_future(dt):
    print("Great things ahead of you!")

What's the catch?

The above technique works great, but it has two big limitations.

The first issue is that NewType types aren't compatible with isinstance() checks. The TypeGuard trick partly addresses this, in that it provides a way to narrow arbitrary objects, but since there is no introspectable connection between the type and the TypeGuard-returning function, there is no way for runtime type checkers to implement compatibility with this. This in turn means that these types aren't compatible with Pydantic and other similar tools.
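As a quick illustration of the first limitation, passing a NewType as the second argument to isinstance() fails at runtime:

```python
import datetime
from typing import NewType

TZAware = NewType("TZAware", datetime.datetime)

value = datetime.datetime.now(tz=datetime.timezone.utc)

try:
    isinstance(value, TZAware)
except TypeError:
    # A NewType is not a real class, so isinstance() refuses it.
    supports_isinstance = False
else:
    supports_isinstance = True

print(supports_isinstance)  # False
```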

The second issue is that there is no way to control where NewType types are instantiated. Ideally we'd like the now_aware() and is_tz_aware() functions from the previous example to be the only ways to produce a valid TZAware value. In reality, though, nothing stops us from importing the TZAware type in another module and instantiating it with an invalid value. This is problematic, because it means the type offers no real guarantee of program correctness, only a best effort.
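The second limitation is just as easy to demonstrate. In this sketch a naive value is wrapped directly, and neither the type checker nor the runtime objects:

```python
import datetime
from typing import NewType

TZAware = NewType("TZAware", datetime.datetime)

# A naive datetime, "instantiated" as TZAware without any validation.
bogus = TZAware(datetime.datetime(2024, 1, 1))

print(bogus.tzinfo)  # None
```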

So, is there a better approach?

Python exposes many interfaces for metaprogramming, allowing us to define object behavior at a very granular level. One of these interfaces that is interesting for our use case is the __instancecheck__ method. By implementing it on a class's metaclass, we can define when an isinstance check against that class returns True. Because existing static type checkers all understand isinstance checks, this allows us to define types similar to the NewType objects we've explored, but with stricter semantics for instantiation. Let's attempt to implement TZAware with this new methodology. It gets a little hairy, but by going through this exercise we'll reach something more approachable in the end.

import abc
import datetime

# __instancecheck__ must be defined on the metaclass for isinstance() to use it.
class TZAwareMeta(abc.ABCMeta):
    def __instancecheck__(self, instance: object) -> bool:
        # This is equivalent to the is_tz_aware() function from the previous example.
        # We're saying that, in order for an object to be considered an instance of this
        # type, it must be an instance of datetime.datetime AND have appropriate
        # timezone data.
        return (
            isinstance(instance, datetime.datetime)
            and instance.tzinfo is not None
            and instance.tzinfo.utcoffset(instance) is not None
        )

    # By overriding __call__ we can make sure instantiation is only allowed with valid
    # values.
    def __call__(cls, value):
        if not isinstance(value, TZAware):
            raise ValueError(
                "TZAware must be instantiated with a timezone-aware datetime object, "
                f"got object of type {type(value)}."
            )
        return value


class TZAware(datetime.datetime, metaclass=TZAwareMeta):
    # This method will never be called at runtime, but is needed for correct typing.
    def __init__(self, value: datetime.datetime) -> None:
        ...

That's quite a lot of code, so what have we achieved?

We can now use our TZAware type in isinstance checks:

def is_fortunate_future(from_date: TZAware) -> bool:
    ...

def get_date() -> datetime.datetime:
    ...

dt = get_date()

if not isinstance(dt, TZAware):
    raise RuntimeError(f"Invalid datetime value: {dt}")

if is_fortunate_future(dt):
    print("Great things ahead of you!")

And if we try to instantiate TZAware with a value that doesn't have timezone data, we get a runtime error:

>>> TZAware(datetime.datetime.now())
Traceback (most recent call last):
  ...
ValueError: TZAware must be instantiated with a timezone-aware datetime object ...
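One detail that is easy to miss is that "instantiating" TZAware never creates a new object. The condensed, self-contained sketch below repeats the metaclass from above and shows that the original datetime is handed back unchanged:

```python
import abc
import datetime


class TZAwareMeta(abc.ABCMeta):
    def __instancecheck__(self, instance: object) -> bool:
        return (
            isinstance(instance, datetime.datetime)
            and instance.tzinfo is not None
            and instance.tzinfo.utcoffset(instance) is not None
        )

    def __call__(cls, value):
        if not isinstance(value, TZAware):
            raise ValueError("TZAware requires a timezone-aware datetime.")
        # No new object is constructed; the validated value is returned as-is.
        return value


class TZAware(datetime.datetime, metaclass=TZAwareMeta):
    def __init__(self, value: datetime.datetime) -> None:
        ...


aware = datetime.datetime.now(tz=datetime.timezone.utc)
naive = datetime.datetime.now()

print(TZAware(aware) is aware)  # True
print(type(TZAware(aware)) is datetime.datetime)  # True
print(isinstance(naive, TZAware))  # False
```

In other words, TZAware exists for isinstance() and the type checker, while the values themselves remain ordinary datetime objects.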

In most respects this implementation of TZAware behaves much like the NewType type we started out with, but we've now addressed both the issue of control and the issue with isinstance checks. Defining types like this with special metaclasses isn't exactly nice, though. The complexity is likely to be hard to maintain, and the effort is out of proportion to what we want to achieve.

Is there a better way?

This problem is what the phantom-types library tries to address. It provides an interface for creating phantom types where all you need to do is to pass it a predicate function that decides which values are valid.
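To get a feel for what a predicate-based interface involves, here is a rough, stdlib-only sketch that generalizes the metaclass from the previous section. The names RefinedMeta and Refined and the bound parameter are made up for this illustration. This is not how phantom-types is implemented, and it glosses over the static typing details.

```python
import datetime
from typing import Any, Callable


class RefinedMeta(type):
    """Hypothetical metaclass: isinstance() defers to a per-class predicate."""

    def __instancecheck__(cls, instance: object) -> bool:
        # The value must be of the bound type AND satisfy the predicate.
        return isinstance(instance, cls._bound) and cls._predicate(instance)

    def __call__(cls, value: Any) -> Any:
        if not isinstance(value, cls):
            raise TypeError(f"Could not parse {cls.__name__} from {value!r}")
        return value


class Refined(metaclass=RefinedMeta):
    """Hypothetical base class that records the arguments given at class creation."""

    def __init_subclass__(
        cls, *, bound: type, predicate: Callable[[Any], bool], **kwargs: Any
    ) -> None:
        super().__init_subclass__(**kwargs)
        cls._bound = bound
        # staticmethod keeps the predicate from being bound like a method.
        cls._predicate = staticmethod(predicate)


def is_tz_aware(dt: datetime.datetime) -> bool:
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


class TZAware(
    datetime.datetime, Refined, bound=datetime.datetime, predicate=is_tz_aware
):
    ...


aware = datetime.datetime.now(tz=datetime.timezone.utc)
print(isinstance(aware, TZAware))  # True
print(isinstance(datetime.datetime.now(), TZAware))  # False
```

Even this stripped-down version needs a metaclass, a base class, and class-creation hooks, which is a good argument for letting a library own that machinery.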

Using the phantom-types library, let's make yet another attempt to make an equivalent implementation of TZAware.

import datetime
from phantom import Phantom

def is_tz_aware(dt: datetime.datetime) -> bool:
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None

class TZAware(datetime.datetime, Phantom, predicate=is_tz_aware):
    ...

In just four lines of readable and idiomatic code, we were able to produce an equivalent phantom type that fulfills the requirements we set out with. The phantom-types library also allows us to give parsing abilities to types, as well as compatibility with exposing schemas using Pydantic and FastAPI. In fact the library ships a TZAware type that already implements these things out of the box.

We've mostly used timezone-aware datetime objects as the example in this article, but nothing limits phantom types to that. We can use them to create narrower subtypes of any other immutable type, such as int, str, and tuple. The library comes with other facilities to help in creating those types, as well as other ready-to-use type implementations.

I hope this article helps in shedding light on how you can use phantom types in Python, either by using NewType in conjunction with TypeGuard, or by using the phantom-types library to achieve richer and stricter types. The documentation is a good resource if you want to find out more.