Project structure

Apps

In the early days of Django, a lot was made of "pluggability": the idea of reusing chunks of behaviour across multiple projects, in order to (hopefully) make it quicker to build things. For this reason, the concept of the application (or "app") was introduced.

Quote

The term application describes a Python package that provides some set of features. Applications may be reused in various projects.

— Django documentation

Django itself uses the "app" concept to encapsulate its various included batteries: its auth system is an app, the automatic admin interface is an app, and so on. Some of these apps have dependencies on other apps, but fundamentally you can choose whether or not to include them in your project.

A vibrant ecosystem of third-party apps also sprang up. The idea was that you could "assemble" a working project (a blog, say) by installing and wiring together a bunch of apps: admin, auth, a basic content management system, maybe a comments app, RSS feeds, and so on. This mostly works well for something straightforward and well-understood, like a blog.

An important thing to note about apps is that they are intended to be a vertical slice of functionality. An app often includes models, views, URLs, forms (or serializers), templates, context processors, template tags, static files, management commands, and so on. These work together to implement a bounded subset of features within the wider product, and Django expects them to be laid out in a certain way in order to discover what functionality they provide.

All of the above makes sense if you are careful to keep in mind the idea of a reusable app. Take the Django admin as an example: there's nothing about the admin that's specific to a particular business domain, and so it makes sense to encapsulate all of the functionality of the admin in a neat package that can be lifted out of one project and dropped into a completely different one.

Now, here's the rub. It has become a community convention to divide up all of the business logic of a Django project into "apps" within the main project package, including the non-reusable (project-specific) code. This approach is also implied and assumed by the Django docs: "[the project's root directory] is usually the container for all of a project’s applications which aren’t installed separately" (emphasis mine).

There are several problems with this idea.

Inflexibility

Slicing up your project into apps is something that must be done early, often at the very start of development. This is a simple result of the fact that you need somewhere to put the code you're writing as you go along. In the early days and weeks of work on a new codebase, manage.py startapp gets used a lot, as the high-level structure of the project starts to take shape.

The issue here is that at this early stage, you often don't really know enough about what the project's final form will look like to correctly draw the boundaries around the apps. Functionality that feels separate at first often becomes deeply entangled, and features that sound similar end up sharing little. Over time, the key concepts and models in your system become clear, and if these are colocated with unrelated or irrelevant code, the waters of the project become muddy and maintenance becomes difficult - before the project is even in production!

So why not just refactor and move the code around until it's in the right place?

To change a model, you need a migration. And, like everything else in Django, migrations are encapsulated within apps! So to move a model from app A to app B you somehow need to move its migrations too, and/or leave vestigial migrations in the old place, lurking forever and causing confusion to your colleagues (or your future self).

While it is technically possible to migrate models between apps, Django doesn't make it easy, particularly if the models in question have foreign key or many-to-many relationships with other models. And if you think about it, it's not really a technical limitation of the sort that could easily be solved with a PR into Django. It's a conceptual problem: if migrations are a historical record of changes to models, and migrations are encapsulated in the same app as those models, then moving a model to a different app necessarily creates a historical coupling between the two apps that shouldn't really exist.

If you're a solo engineer this may not be a problem: just nuke your local database, rebuild the migrations and start again. But as soon as you're working on a team, with each developer having their own dev database, and multiple pre-production and staging environments, these sorts of stop-the-world changes become very difficult to deal with.

And in the early days of a project, these sorts of mistakes happen all the time. In fact, they're not even mistakes at all: they're a natural and positive consequence of the agile, exploratory coding style that makes dynamic languages like Python so productive to work with. Frequent and aggressive refactoring and restructuring should be encouraged as a team iteratively improves its understanding of the requirements and the code gradually approaches an increasingly accurate representation of the real world. Anything that slows down this process is antithetical to the kind of environment that high-performing teams should be striving for.

Essentially, Django makes app boundary decisions into one-way doors: once you walk through, it's prohibitively expensive to walk back. Filling your development process with one-way doors at such an early stage is a huge risk. And, of course, once the app is in production and at scale, complex migrations need to be handled extremely carefully, so the cost of these app boundary "mistakes" is even higher.

Entanglement

Developers hate entanglement. Code should be nicely compartmentalised, isolated and encapsulated. Interfaces between components should be well-defined and well-enforced. "Big balls of mud" are bad, "loose coupling" is good.

But, just like with complexity, there are two types of entanglement: essential entanglement, and accidental entanglement. Which one you are suffering from depends on whether you're looking at encapsulating your system components vertically or horizontally. Let me explain.

I would wager that in almost any real-world Django project, the apps that exist within the project's domain (in other words, the non-pluggable apps) have a high degree of interdependency on each other. I've measured this on production projects by mapping out the imports between their apps. No matter how carefully you draw your domain boundaries, you almost always end up with an import between two apps that, on paper, shouldn't really be related. Over time, in a system under active development, the import graph tends towards full connectivity: almost every app imports every other app somewhere, for something.
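For illustration, here is a minimal sketch of how such an import map could be built with Python's standard `ast` module. It is not the tooling behind the measurement described above, just one plausible approach. The project name `shop` and the apps `orders`, `billing` and `accounts` are hypothetical, and the sketch assumes that each app is a direct child package of the project and that cross-app imports are written as absolute imports.

```python
from __future__ import annotations

import ast
from collections import defaultdict


def app_of(module_name: str, project: str) -> str | None:
    """Return the app a dotted path belongs to, or None if it is outside the project."""
    parts = module_name.split(".")
    if len(parts) >= 2 and parts[0] == project:
        return parts[1]
    return None


def cross_app_imports(sources: dict[str, str], project: str) -> dict[str, set[str]]:
    """Map each app to the set of other project apps it imports from.

    `sources` maps dotted module names to their source code.
    Relative imports are ignored (assumption: they stay within one app).
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for module_name, source in sources.items():
        importer = app_of(module_name, project)
        if importer is None:
            continue
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Join module and name so `from shop import billing` also resolves.
                targets = [f"{node.module}.{alias.name}" for alias in node.names]
            else:
                continue
            for target in targets:
                imported = app_of(target, project)
                if imported is not None and imported != importer:
                    graph[importer].add(imported)
    return dict(graph)


# Hypothetical project with three apps.
sources = {
    "shop.orders.views": (
        "from shop.billing.models import Invoice\n"
        "import shop.accounts.models\n"
    ),
    "shop.billing.models": (
        "from shop.orders.models import Order\n"
        "from django.db import models\n"
    ),
    "shop.accounts.models": "from . import utils\n",
}
graph = cross_app_imports(sources, project="shop")
```

In this toy input, `orders` and `billing` already import each other, which is the kind of cycle that tends to appear in real projects. The source strings are only parsed, never executed, so the modules they mention do not need to be installed.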

The instinctive Senior Software Architect reaction to this situation is one of repulsion: our boundaries must be wrong, so we need to redraw them and refactor or restructure all the code accordingly. And when, later, we discover our new boundaries were also wrong, we redraw them again. Or we need to introduce new concepts: "ports" or "adapters" or "dependency injection" or "inversion of control".

But what if this entanglement that we're trying to expunge is actually an essential property of the domain we're working in? The real world is deeply entangled and badly factored: business processes and procedures, even where they exist, are often not strictly followed. Yes, employee expenses should all be neatly logged and recorded in a spreadsheet, but sometimes the bookkeeper just needs to walk over to the CEO's desk to ask for the receipt for that Uber ride home from the pub. And this is a good thing: circumventing or side-stepping the "rules" is a natural, practical self-corrective mechanism to deal with imperfections in an otherwise rigid system. It's the 80/20 rule: a good process that works is better than a perfect one that doesn't.

My assertion is that, in many cases, we've been looking at encapsulation all wrong. By attempting to draw perfect boundaries around the objects and logic that make up our systems, we're making them worse as models of the real world. The real world sometimes breaks encapsulation, and so should our systems. Practicality beats purity.

On the other hand, where we do need to be careful with encapsulation is in the parts of our systems that don't represent the real world: the parts that just deal with how computers work. This is accidental entanglement: an SQL string embedded in a Django view, or HTML generation in a model method. This should generally, within reason, be avoided (although, as always, weigh up practicality against purity).

This is where vertical vs horizontal encapsulation comes in. Vertical encapsulation is how Django's apps are traditionally used: to group together the models, views, forms, etc. that make up a particular feature or sub-domain within the larger project. This, I would argue, is not as much of a good idea as is usually claimed.

Horizontal encapsulation, on the other hand, is about layering: carefully defining where we put different kinds of business logic within the wider system. Fortunately, Django is already really nicely layered! There's a model layer, and a view layer, and (in some projects) a template layer, and Django is really good at helping us to keep these separate and well-defined. In fact, this is such a great idea that we should lean into it and do more of it. We'll talk more about that later.

Orthogonality

The final reason to be sceptical of non-reusable "apps" is closely related to the previous discussion about encapsulation. Not only is vertical encapsulation problematic, but it often doesn't actually make sense given the features of our real-world systems.

Imagine the following: you are building a SaaS product that has a public-facing user interface for your customers to interact with. In addition, it has an admin interface (based on the Django admin, with some customisations) and also a separate public API powered by Django REST framework.

The recommended Django way to incorporate the views contributed by an "app" into your project is to use django.urls.include to "mount" a sub-tree of paths from the app's urls.py into your root URLconf. But how does this work with the example above? You'd actually want at least three root path prefixes: the HTML-serving views (example.com/widgets/), the API (example.com/api/widgets/) and the admin (example.com/admin/some-custom-admin-view/). Should you have a urls.py, admin_urls.py and api_urls.py inside each app? Or do you break encapsulation and "pull up" all of the URL wiring into the root URLconf?

This situation gets even worse as the product becomes more complex. In many cases, a single system might have multiple "views" onto a single set of models, such as a public (non-logged-in) informational website, a customer admin area, and a bespoke system management UI.

The problem is quite simply stated: models and interfaces are orthogonal concerns. You might have multiple interfaces that interact with the same set of underlying models, or a single interface that interacts with models from across the system. An architectural approach that suggests bundling related models and views together into the same "app" falls apart quickly in any reasonably complex real-world product.

Layers

So, if the concept of apps is problematic, what should we do instead?

There are generally three kinds of things that make up a Django app:

Models
Interfaces: usually HTTP-handling views, but management commands fall into this category too.
Business logic: the code that implements the things that the product actually does.

As described above, the standard Django approach is to chop the product up vertically into "apps" along feature boundaries, and then put all three of these kinds of things (models, interfaces, business logic) into each app.

The approach I'm proposing here turns that on its head: keep the three kinds of things separate at the top level. These are the layers of our system. Then, if necessary, introduce domain-based partitioning within each layer.

One other thing to understand, which will be expanded on later, is that we split our business logic layer into two parts, which we call readers and actions. As the names suggest, readers encapsulate business logic that involves reading data from the database. Actions, on the other hand, encapsulate state changes: anything that creates or updates data in the database, or reaches out to external systems.

Note

Some files and directories below, including tests and __init__.py, have been omitted for clarity. This is just intended to demonstrate the approximate top-level structure. The details of each section will be explored later.

project/
├── actions
│   └── some_domain.py
├── data
│   ├── migrations
│   │   └── 0001_initial.py
│   └── models
│       └── some_model.py
├── interfaces
│   ├── management_commands
│   │   └── management
│   │       └── commands
│   │           └── some_management_command.py
│   └── http
│       ├── api
│       │   ├── urls.py
│       │   └── views.py
│       └── urls.py
├── readers
│   └── some_domain.py
├── settings.py
└── wsgi.py

The only modules that need to be added to INSTALLED_APPS are project.data and project.interfaces.management_commands. This is so Django's autodiscovery mechanisms can find and register these components. Your ROOT_URLCONF setting should be set to project.interfaces.http.urls.
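As a sketch, the relevant part of project/settings.py might look something like the fragment below. The `django.contrib` entries are only an illustrative selection of Django's bundled apps and are an assumption on my part; the two `project.*` entries and the `ROOT_URLCONF` value come from the layout described above.

```python
# project/settings.py (fragment)

INSTALLED_APPS = [
    # Django's bundled apps: an illustrative selection, adjust as needed.
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # The only two project packages Django needs to discover:
    "project.data",  # models and migrations
    "project.interfaces.management_commands",  # management commands
]

# All URL wiring lives in the HTTP interface layer.
ROOT_URLCONF = "project.interfaces.http.urls"
```

Note that actions, readers and the HTTP interfaces are ordinary Python packages: they are imported where needed and do not have to be registered with Django.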

You may choose to organise interfaces differently. For example, if your backend project entirely consists of an API (with no HTML-rendering views), you could "flatten" this structure slightly, and pull the api module "up" a level, eliminating the http container.

Summary

Django's "app" convention works well for reusable components but can create problems, especially in larger projects. A better approach uses horizontal layering, separating the codebase into data (models), interfaces (views, management commands), and business logic (readers and actions) at the top level, with any domain-based partitioning occurring within these layers rather than across them.