GraphQL Dataloader Composition
Dataloaders are neat tools that help cache and batch IO operations. They’re a great fit for GraphQL’s execution style. I won’t fully explain how they work or how to use them; others have done a great job of explaining that. Instead, I’d like to spread a neat pattern that makes resolvers and dataloaders cleaner. It also unlocks the ability for dataloaders to call each other.
Once you realize you can compose dataloaders, you can fully embrace a “fat dataloaders, skinny resolvers” approach.
This particular example will use Python, Django, and Graphene, but the general pattern should extend to other languages. There’s a good chance other languages default to this pattern.
The popular way to use dataloaders
Here’s a quick recap of most dataloader tutorials:
1. Create a dataloader class
```python
# .../dataloaders.py
from promise import Promise
from promise.dataloader import DataLoader

class PersonByIdLoader(DataLoader):
    def batch_load_fn(self, person_ids):
        people = get_people(person_ids)
        by_id = {p.id: p for p in people}
        # graphene requires batch_load_fn to return a promise,
        # with results in the same order as the requested keys
        return Promise.resolve([by_id.get(p_id) for p_id in person_ids])
```
2. Attach dataloader instances to the GraphQL context
There are definitely some variations on how to instantiate dataloaders: they can be instantiated in middleware, in the root resolver, or in the context object itself.
```python
from graphene_django.views import GraphQLView

from dataloaders import PersonByIdLoader

class GQLContext:
    def __init__(self, request):
        self.request = request
        self.dataloaders = {
            "person_by_id_loader": PersonByIdLoader(),
        }

class CustomGraphQLView(GraphQLView):
    # ...
    def get_context(self, request):
        return GQLContext(request)
```
3. Make resolvers access dataloaders through the context
```python
# .../types/person.py
import graphene

class Person(graphene.ObjectType):
    # a lambda lets the class reference itself
    parent = graphene.Field(lambda: Person)

    @staticmethod
    def resolve_parent(person, info):
        return info.context.dataloaders["person_by_id_loader"].load(person.parent_id)
```
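To make the batching behaviour concrete, here’s a simplified, synchronous model of what a dataloader does under the hood. This is only a toy sketch: the real promise-based implementation defers resolution until the event loop flushes the batch, while this version collects keys and resolves them explicitly with a `dispatch()` call.

```python
# Toy, synchronous sketch of dataloader batching and caching.
class ToyLoader:
    def __init__(self):
        self._queued = []  # keys waiting for the next batch
        self._cache = {}   # key -> resolved value

    def load(self, key):
        if key not in self._cache:
            self._queued.append(key)
        # stand-in for a promise: call it after dispatch() to get the value
        return lambda: self._cache[key]

    def dispatch(self):
        if self._queued:
            keys = list(dict.fromkeys(self._queued))  # dedupe, keep order
            values = self.batch_load_fn(keys)         # ONE batched call
            self._cache.update(zip(keys, values))
            self._queued.clear()

class SquareLoader(ToyLoader):
    calls = 0

    def batch_load_fn(self, keys):
        SquareLoader.calls += 1
        return [k * k for k in keys]

loader = SquareLoader()
a, b, c = loader.load(2), loader.load(3), loader.load(2)
loader.dispatch()
print(a(), b(), c())       # 4 9 4
print(SquareLoader.calls)  # 1 -- three loads, one batched call
```

Three `load()` calls, including a duplicate, produce a single deduplicated call to `batch_load_fn`; that is the whole value proposition.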
The problem with this approach
First, this pattern has a couple of mild annoyances:
- You have to maintain the dictionary of dataloader instances.
- You have to name dataloaders in order for the dictionary to index them. You’ve already named the classes, naming the instances feels duplicative.
Forgetting to add dataloaders to the context is a common source of bugs. You can mitigate this with a more strongly typed approach, but you’re still stuck maintaining a list and naming the instances.
These problems, though annoying, are all fairly cosmetic. There is, however, a bigger disadvantage to this entire approach: because dataloaders can’t access each other’s instances, dataloaders can’t call each other.
Dataloader composition might not come to mind as a problem worth solving, but the alternative is repeating logic in resolvers or in other dataloaders. Putting too much logic in resolvers is a GraphQL antipattern. Since resolvers can’t (or shouldn’t) call other resolvers, their logic can’t be reused.
Our end goal is for dataloaders to be able to access (and, when applicable, create) instances of other dataloaders.
Design patterns to the rescue
A class implements the singleton pattern if its constructor always returns the same instance.
```python
NormalClass() is NormalClass()        # False
SingletonClass() is SingletonClass()  # True, same object!
```
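In Python, that behaviour can be sketched by overriding `__new__` so the constructor caches and reuses a single instance (a minimal illustration, not production code):

```python
class NormalClass:
    pass

class SingletonClass:
    _instance = None  # the one cached instance

    def __new__(cls):
        # always hand back the same object
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

print(NormalClass() is NormalClass())        # False
print(SingletonClass() is SingletonClass())  # True, same object!
```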
In the case of GraphQL, to allow caching and batching, we need to ensure all resolvers use the same instance of a dataloader. If dataloaders were singletons, however, we wouldn’t need to make sure we call the same instances; we could simply import dataloader classes and instantiate them in our resolvers.
```python
def resolve_parent(person, info):
    return PersonByIdLoader().load(person.parent_id)
```
Using singletons is going a little too far, though. GraphQL often runs on multiple threads or over an event loop; there’s a reason dataloader libraries force you to create instances. Garbage collection of dataloaders is also an easy way to reset their caches and promises, so it makes sense to instantiate them per query lifecycle.
There should be a way to ensure we only use a single instance per request, though. There’s no reason the following couldn’t work:
```python
def resolve_parent(person, info):
    loader = PersonByIdLoader.get_instance(info.context.dataloaders)
    return loader.load(person.parent_id)
```
We just need to ensure `get_instance()` returns the same instance when given the same arguments. Fortunately, it’s quite simple to implement using a factory pattern:
```python
from typing import MutableMapping

class PersonByIdLoader(DataLoader):
    # ... batch_load_fn

    @classmethod
    def get_instance(
        cls, instance_cache: MutableMapping[type, DataLoader]
    ) -> DataLoader:
        if cls not in instance_cache:
            instance_cache[cls] = cls()
        return instance_cache[cls]
```
Since Python allows using classes as keys in a dictionary, that’s the most obvious way to index the instances; `instance_cache` can just be a dictionary. Our GraphQL context just got a lot simpler, with no need to instantiate all possible dataloaders up front!
```python
class GQLContext:
    def __init__(self, request):
        self.request = request
        # dataloader classes will automatically populate
        # this dictionary with their instances
        self.dataloaders = {}
```
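To see the factory in action without any GraphQL machinery, here’s a self-contained sketch using stand-in classes (the `FakeLoader` base plays the role of the promise `DataLoader`, and the names are illustrative):

```python
# Stand-in base class carrying the factory classmethod.
class FakeLoader:
    @classmethod
    def get_instance(cls, instance_cache):
        # one instance per class per cache (i.e. per request)
        if cls not in instance_cache:
            instance_cache[cls] = cls()
        return instance_cache[cls]

class PersonByIdLoader(FakeLoader):
    pass

request_cache = {}  # one dict per request / GQLContext
first = PersonByIdLoader.get_instance(request_cache)
second = PersonByIdLoader.get_instance(request_cache)
print(first is second)  # True: same request, same instance

other_request = {}
third = PersonByIdLoader.get_instance(other_request)
print(first is third)   # False: fresh cache, fresh instance
```

Because each request gets its own cache dictionary, instances are shared within a query but never leak across requests or threads.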
Since dataloaders are going to be all over our codebase, though, it’d be nice to make the API even more concise, with a little magic I’m calling the semi-singleton pattern. The same way the standard singleton pattern is syntactic sugar for a specific kind of factory, the semi-singleton pattern is syntactic sugar for dataloader factories:
```python
def resolve_parent(person, info):
    return PersonByIdLoader(info.context.dataloaders).load(person.parent_id)
```
This is a bit of a mouthful to implement, but we need only apply it once, on the base class:
```python
from promise.dataloader import DataLoader

class SemiSingletonDataLoader(DataLoader):
    _instance_cache = None

    def __new__(cls, instance_cache):
        if cls not in instance_cache:
            instance_cache[cls] = super().__new__(cls)
        return instance_cache[cls]

    def __init__(self, instance_cache):
        # __init__ runs even when __new__ returns a cached instance,
        # so only initialize the loader once
        if self._instance_cache is None:
            self._instance_cache = instance_cache
            super().__init__()
```
This `SemiSingletonDataLoader` tool can then become the base class of all your dataloaders:
```python
# .../dataloader.py
class PersonByIdLoader(SemiSingletonDataLoader):
    def batch_load_fn(self, person_ids):
        ...
```
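The `__new__`/`__init__` dance is subtle enough to be worth demonstrating in isolation. Here’s a runnable sketch where `SemiSingletonBase` stands in for `SemiSingletonDataLoader`, minus the promise dependency:

```python
# Stand-in demo of the semi-singleton pattern.
class SemiSingletonBase:
    _instance_cache = None

    def __new__(cls, instance_cache):
        if cls not in instance_cache:
            instance_cache[cls] = super().__new__(cls)
        return instance_cache[cls]

    def __init__(self, instance_cache):
        # __init__ runs on every construction, even when __new__
        # returned a cached instance; only initialize once
        if self._instance_cache is None:
            self._instance_cache = instance_cache

class PersonByIdLoader(SemiSingletonBase):
    pass

cache = {}
first = PersonByIdLoader(cache)
second = PersonByIdLoader(cache)
print(first is second)               # True: same cache, same instance
print(first is PersonByIdLoader({})) # False: fresh cache, fresh instance
```

Calling the class looks like a normal constructor at the call site, but behaves like the `get_instance()` factory from earlier.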
We’ve now enabled dataloader composition! The following `GrandparentLoader` can use our existing `PersonByIdLoader`:
```python
class GrandparentLoader(SemiSingletonDataLoader):
    def _get_grandparent_for_single_person(self, person_id):
        person_loader = PersonByIdLoader(self._instance_cache)
        parent_prom = person_loader.load(person_id)

        def handle_parent(parent):
            return person_loader.load(parent.parent_id)

        return parent_prom.then(handle_parent)

    def batch_load_fn(self, person_ids):
        return Promise.all([
            self._get_grandparent_for_single_person(person_id)
            for person_id in person_ids
        ])
```
If that looks underwhelming, it’s because dealing with promises in Python is extremely ugly! In my other post, we see how we can turn the above into something much cleaner:
```python
class GrandparentLoader(SemiSingletonDataLoader):
    @genfunc_to_promise
    def _get_grandparent_for_single_person(self, person_id):
        parent = yield PersonByIdLoader(self._instance_cache).load(person_id)
        return PersonByIdLoader(self._instance_cache).load(parent.parent_id)

    def batch_load_fn(self, person_ids):
        return Promise.all([
            self._get_grandparent_for_single_person(person_id)
            for person_id in person_ids
        ])
```
Long live fat dataloaders, skinny resolvers!
By the way, this pattern also works to create more dynamic dataloaders, like those generated from ORM classes. This is useful for trivial dataloaders that just fetch records by their primary keys:
```python
def resolve_user(parent, info):
    UserLoaderClass = PrimaryKeyLoaderFactory.get_loader_class(UserModel)
    user_loader = UserLoaderClass(info.context.dataloaders)
    return user_loader.load(parent.user_id)
```
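The factory itself isn’t shown above, but one way to build it is to generate and cache one loader class per model with `type()`. Everything in this sketch is an assumption: the `PrimaryKeyLoaderFactory` internals, the generated class name, and the `FakeSemiSingletonLoader` stand-in (which replaces `SemiSingletonDataLoader` to avoid the promise dependency):

```python
# Stand-in for SemiSingletonDataLoader (hypothetical, promise-free).
class FakeSemiSingletonLoader:
    def __new__(cls, instance_cache):
        if cls not in instance_cache:
            instance_cache[cls] = super().__new__(cls)
        return instance_cache[cls]

class PrimaryKeyLoaderFactory:
    _classes = {}  # model -> generated loader class

    @classmethod
    def get_loader_class(cls, model):
        if model not in cls._classes:
            # dynamically build a "fetch by pk" loader class for this model
            cls._classes[model] = type(
                f"{model.__name__}ByPkLoader",
                (FakeSemiSingletonLoader,),
                {"model": model},
            )
        return cls._classes[model]

class UserModel:
    pass

LoaderA = PrimaryKeyLoaderFactory.get_loader_class(UserModel)
LoaderB = PrimaryKeyLoaderFactory.get_loader_class(UserModel)
print(LoaderA is LoaderB)  # True: one loader class per model
print(LoaderA.__name__)    # UserModelByPkLoader
```

Caching the generated class is what keeps the semi-singleton machinery working: the class object is the dictionary key in `info.context.dataloaders`, so it must be stable across calls.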
I should note that this pattern is far from original; I found this example in saleor’s source code. I’m not sure if semi-singleton is the best name for the pattern, since the singleton part is just a hacky shorthand for a factory. The important part is that the factory attaches (conditionally created) instances onto its context argument, which is a form of dependency injection.