Why large Django projects need a data (prefetching) layer

About This Talk

The main building blocks of Django REST Framework projects, i.e. Views, Serializers, Managers, and Querysets allow developers to implement complex APIs with very little code repetition while reusing built-ins for essential API features. Developers feel guided by DRF to architect the project in a “Don’t Repeat Yourself” way by using inheritance, nesting, annotations, and model / app-based separation of concerns. They can group code in viewsets, inherit from base classes, reuse the same serializer across views, nest serializers into others, compute fields dynamically with ORM annotations, select or prefetch relations for performance, organize custom behavior with managers and querysets, and much more. All this DRYness is great because it integrates well with common web API concerns like permissions, pagination, filters, etc.

Based on our multi-year experience in building and maintaining several large Django projects, while using those built-in concepts really yields a DRY code, the overuse also results in a codebase full of complicated bugs and performance issues, especially related to ORM usage. View, serializer, and model methods are often heavily coupled to querysets’ annotations and prefeches, but those methods are spread across the codebase. Django’s default queryset laziness, together with its heavy usage of inheritance and nesting is the perfect recipe for a codebase where N+1 issues and heavy unnecessary queries can happen in any line of code after some less careful change.

For example, to prevent N+1 issues, if a serializer method field uses a filtered relationship, you must ensure this relationship is prefetched in all querysets related to that serializer. But this serializer can be nested into others, so you must now be careful to change all queryset references in seemingly unrelated views. Other sorts of “change amplification” situations also happen on large DRF codebases with heavy ORM usage. Requiring developers to be careful while navigating through lots of files to perform changes isn’t reasonable. Maybe being DRY is leading to the wrong abstraction?

It’s possible to design a better architecture that’s optimized both for enabling changes and avoiding performance regressions. With a new custom data prefetching layer that keeps compatibility with serializers and views, we can respect DRY while keeping performance and maintainability. That’s what we’ve been doing in our Django projects, and we will share our learnings in this talk. Hopefully, that applies to other maintainers of complex DRF projects.

Here’s the planned outline:

Django REST Framework is DRY: the good side
Trade-offs of DRY in DRF: when reusing serializers leads to Change Amplification on prefetches and annotations
How view querysets and serializers are coupled
How vanilla Django suffers from the same issues
Incomplete solutions that weren’t enough for us
Solving with a data prefetching layer: django-virtual-models
- Warn programmers during dev time about missing prefetches and annotations for each serializer
- Automatically run necessary prefetches and annotations
- Automatically prevent unnecessary prefetches and annotations
- Keep serializer nesting support
- Keep SerializerMethodField support
Other solutions, including django-readers

Presenters

Flávio Juvenal (he/him)

I’m a software engineer from Brazil and Chief Scientist at Vinta Software (www.vinta.com.br). I’ve been building web products with Python and Django for the last 12 years. I love drinking medium and light roast coffee and visiting museums around the world.