Backend-for-frontend

Background

A widely followed school of thought holds that an application should have exactly one HTTP API surface, consumed by all clients. This API is the Platonic ideal representation of your data, and the single source of truth for the shape and structure of every piece of information that enters or leaves your application. Your web application, mobile app, and any external customers of your public API should all use this same set of endpoints. You may have heard this approach referred to as "dogfooding": if you can't build your own product on top of your API, why should anyone else be able to build their products on top of it?

Of course, as with so many things, this idea rarely survives contact with reality. When it does, there are almost always trade-offs.

Misfetching

Overfetching and underfetching (which could perhaps be united under the term misfetching) describe situations where a client is sent data it doesn't need, or isn't sent data that it does need. Overfetching manifests as inefficiency: data is retrieved from the database, transformed, serialized, sent over the network, and deserialized, only to be thrown away. Underfetching forces the client to make additional requests to assemble the data it needs.

In some circumstances, this can be very bad indeed. I've seen firsthand an example where an API featured a "user list" endpoint which was intended to be consumed everywhere a user list was needed in the client application. The representation of each user was quite large, including dozens of attributes, nested relationships and so on: perfect for a user listing page, displayed as a paginated table with many columns. But elsewhere in the frontend codebase, this same endpoint was used to provide data for a dropdown select control, a UI element that only needed two fields from each user representation (the ID and name)! The vast majority of the data, which was expensive to retrieve, serialize, and send over a (possibly slow) network, was being discarded. The obvious result of this was that the dropdown was extremely slow to populate.
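
To make the cost concrete, here is a minimal sketch comparing the serialized size of a "full" user representation with the two fields a dropdown needs. The field names and the sizes of the dummy values are assumptions invented for illustration; they are not taken from the application described above.

```python
import json


def full_user(user_id: int) -> dict:
    """Hypothetical 'full' representation, as a user listing page might need it."""
    return {
        "id": user_id,
        "name": f"User {user_id}",
        "email": f"user{user_id}@example.com",
        "bio": "x" * 500,  # stand-in for large free-text attributes
        "groups": [{"id": g, "name": f"Group {g}"} for g in range(5)],
        "address": {"street": "1 Example St", "city": "Exampleville"},
    }


def picker_user(user: dict) -> dict:
    """What a dropdown actually needs: just the ID and name."""
    return {"id": user["id"], "name": user["name"]}


users = [full_user(i) for i in range(1000)]

# Serialize both shapes and compare how many characters go over the wire.
full_payload = json.dumps(users)
picker_payload = json.dumps([picker_user(u) for u in users])

print(f"full: {len(full_payload)} chars, picker: {len(picker_payload)} chars")
```

This only measures serialized size; in a real application the database and serialization work saved would come on top of that.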

GraphQL

Misfetching is one of the problems that was supposed to be solved by GraphQL. Rather than defining a fixed set of endpoints, each request to your application instead includes a query, written in GraphQL's own query language, that the client uses to specify exactly the shape of the response it wants. If the client is rendering a dropdown, it asks for only the user's ID and name. If it's rendering a table of users, it can request as many fields and nested relationships (the "graph" part) as it wants.

Unfortunately, GraphQL introduces its own set of issues. First is the complicated business of efficiently querying the underlying datastore: the server implementation needs to be able to load only the parts of the data graph requested by the client in such a way as to avoid the N+1 query problem, writ large. Because clients can compose arbitrary queries, it is impractical to hand-optimise each one, which means the backend often needs to attempt automatic optimisation. This often makes the server-side implementation of GraphQL libraries extremely complicated.
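
For readers unfamiliar with the N+1 query problem, the following sketch reproduces it with an in-memory SQLite database. The `users` and `teams` tables and the `run` helper are hypothetical, invented for illustration. The naive version issues one query for the list plus one per row; the joined version fetches the same data in a single query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER);
    INSERT INTO teams (id, name) VALUES (1, 'Red'), (2, 'Blue');
    INSERT INTO users (id, name, team_id) VALUES
        (1, 'Ada', 1), (2, 'Grace', 2), (3, 'Linus', 1);
""")

queries = []  # every SQL statement issued through run() is recorded here


def run(sql, params=()):
    """Execute a query and record it, so round trips can be counted."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def users_with_teams_naive():
    """One query for the users, then one more query per user: N+1 in total."""
    result = []
    for user_id, name, team_id in run("SELECT id, name, team_id FROM users ORDER BY id"):
        (team_name,) = run("SELECT name FROM teams WHERE id = ?", (team_id,))[0]
        result.append({"id": user_id, "name": name, "team": team_name})
    return result


def users_with_teams_joined():
    """The same data fetched with a single JOIN."""
    rows = run(
        "SELECT users.id, users.name, teams.name FROM users "
        "JOIN teams ON teams.id = users.team_id ORDER BY users.id"
    )
    return [{"id": i, "name": n, "team": t} for i, n, t in rows]


queries.clear()
naive = users_with_teams_naive()
naive_count = len(queries)

queries.clear()
joined = users_with_teams_joined()
joined_count = len(queries)

print(naive_count, joined_count)
```

With a fixed endpoint the author can write the joined query by hand; the difficulty described above is that a GraphQL server has to arrive at something equivalent for whatever shape the client happens to request.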

Second, GraphQL often comes with a set of security concerns. Many applications require a field-level permissions system to prevent low-privilege clients from simply forming a query to request values that they shouldn't have access to. In some cases, even the knowledge of the existence of those restricted fields in the database could be a security concern, meaning the GraphQL schema document needs to be different for different types of consumers.
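
As a rough illustration of what field-level permissions involve, here is a minimal sketch in plain Python. It is not how any particular GraphQL library implements them; the roles, field names and function names are all hypothetical.

```python
# Hypothetical per-role field allowlists; roles and field names are illustrative.
FIELD_PERMISSIONS = {
    "admin": {"id", "name", "email", "salary"},
    "client": {"id", "name"},
}


def visible_schema(role: str) -> list[str]:
    """Fields a given role is even allowed to know about."""
    return sorted(FIELD_PERMISSIONS[role])


def select_fields(record: dict, requested: list[str], role: str) -> dict:
    """Return only the requested fields, refusing anything outside the role's allowlist."""
    allowed = FIELD_PERMISSIONS[role]
    if not set(requested) <= allowed:
        # Deliberately vague: the error does not confirm which fields exist.
        raise PermissionError("one or more requested fields are not available")
    return {field: record[field] for field in requested}


record = {"id": 1, "name": "Ada", "email": "ada@example.com", "salary": 100_000}

print(select_fields(record, ["id", "name"], "client"))
print(visible_schema("client"))
```

Every field of every type needs an entry like this, and the schema shown to each consumer has to be derived from it, which is where much of the complexity comes from.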

To get around these issues, Facebook (the originator of GraphQL) apparently now maintains an allowlist of queries on the server that are known to be efficient and to carry the correct permissions for the requesting client. Of course, this removes most of the stated benefits of GraphQL!
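
A query allowlist of this kind can be sketched as a simple server-side lookup. This is an assumption-laden illustration of the general idea, not a description of Facebook's actual implementation; the IDs, queries and function name are hypothetical.

```python
# Hypothetical registry of server-approved queries, keyed by an opaque ID.
ALLOWED_QUERIES = {
    "q1": "query { users { id name } }",
    "q2": "query { users { id name email teams { id name } } }",
}


def resolve_query(query_id: str) -> str:
    """Look up an approved query; anything not on the allowlist is rejected."""
    try:
        return ALLOWED_QUERIES[query_id]
    except KeyError:
        raise LookupError(f"unknown query id: {query_id!r}") from None


# The client sends only the ID; it can no longer submit arbitrary queries.
print(resolve_query("q1"))
```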

URL-per-use-case

It does provide a germ of an idea, though. GraphQL allows clients to specify their own data shape, for use at the exact point where they need it. But Facebook takes those client queries and stores them on the server, giving each one a unique ID so the client can ask for it at runtime. What if instead of an ID, we gave each query a meaningful string identifier? The identifiers could use a separator, like /, to express hierarchy and... well, are you getting it? These are URLs!

So here's the argument: you should construct your backend endpoints to perfectly meet the needs of your frontend components. That's why I call this approach "backend for frontend"¹. For internal-facing APIs at least (i.e. for endpoints that serve clients you control), your API surface should have one endpoint per client use case. Need a user list page? Build an endpoint to serve the user list page, containing precisely the data needed by that particular frontend component. Need a user picker dropdown? Build another endpoint that returns just the user ID and name. Need a mobile app and a web app? They should talk to completely separate API surfaces, probably with prefixes like /api/mobile/.../ and /api/web/.../. Have a low-privilege "client" UI and a separate high-privilege "admin" UI? Give each one its own API surface. Even if the data is the same in both places right now, it almost certainly won't be forever. If your endpoints are isolated, a change to the data shape returned by one endpoint affects only that endpoint.
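
The shape of such an API can be sketched without any framework. In the example below, the data, the reader functions, the `compose` helper and the exact URL paths are all hypothetical (only the `/api/web/` prefix comes from the text), and it does not use the django-readers API. The point is only that each endpoint is a thin composition of shared building blocks.

```python
import json

# Hypothetical in-memory data standing in for the database.
USERS = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "teams": ["Red"], "is_active": True},
    {"id": 2, "name": "Grace", "email": "grace@example.com", "teams": ["Blue"], "is_active": False},
]


# Shared building blocks: each one extracts a small piece of a user.
def read_identity(user):
    return {"id": user["id"], "name": user["name"]}


def read_contact(user):
    return {"email": user["email"]}


def read_membership(user):
    return {"teams": list(user["teams"]), "is_active": user["is_active"]}


def compose(*readers):
    """Combine several readers into one that merges their output."""
    def read(user):
        result = {}
        for reader in readers:
            result.update(reader(user))
        return result
    return read


# One endpoint per use case, each returning exactly what its component needs.
def web_user_list():
    read = compose(read_identity, read_contact, read_membership)
    return [read(user) for user in USERS]


def web_user_picker():
    return [read_identity(user) for user in USERS]


ROUTES = {
    "/api/web/user-list/": web_user_list,
    "/api/web/user-picker/": web_user_picker,
}


def handle(path: str) -> str:
    """Dispatch a path to its endpoint and serialize the result as JSON."""
    return json.dumps(ROUTES[path]())


print(handle("/api/web/user-picker/"))
```

Changing what the user list page shows means editing `web_user_list` alone; the picker endpoint is unaffected because it shares only the small `read_identity` building block.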

Following this approach, your API surface becomes extremely broad (lots of endpoints) but largely shallow (each individual endpoint tends to be quite simple). I can already sense seasoned Django REST framework developers cowering in the face of an explosion of serializers and generic views to cover each of these use cases. This approach only works well if endpoints (views) are extremely lightweight, quick to create and easy to modify, which steers the developer towards a more declarative approach. It also forces shared business logic to be pushed "down a layer" into readers and actions. The best way to build these types of APIs is, in my opinion, to use the approach described in the next section, based on django-readers for read endpoints and inline serializers for action endpoints.

Public APIs

And finally, what about public-facing APIs, intended for other developers to integrate their product with yours? Well, those can be built entirely separately from your internal-facing endpoints. They can be beautifully designed to be widely understandable, expose only the concepts you want to expose, remodel your data to paper over implementation details that your customers don't need to care about, include carefully designed permission boundaries, and expose lovingly crafted schemas and documentation to provide a smooth and friendly developer experience. All without affecting your internal-facing endpoints!

Summary

To ensure maintainable, secure, high-performance APIs, backend developers should tailor endpoints to perfectly match frontend use cases. Each client, component and/or user type could be served by its own set of lightweight endpoints, constructed declaratively using shared building blocks in reader and action functions.

Sources

The API Churn/Security Trade-off by Carson Gross.

¹ I didn't coin the term "backend for frontend". It is usually used in the context of a microservice architecture to describe a pattern where a separate backend service is created to provide a "facade" to serve each client application. But the concept and benefits described here work in a monolithic codebase too, so I think it's helpful to reuse the term. If it helps, you may choose to call this pattern "monolithic backend-for-frontend".