Leaky abstraction by omission by Mark Seemann
Sometimes, an abstraction can be leaky because it leaves something out.
Consider the following interface definition. What's wrong with it?
public interface IReservationsRepository { Task Create(int restaurantId, Reservation reservation); Task<IReadOnlyCollection<Reservation>> ReadReservations( int restaurantId, DateTime min, DateTime max); Task<Reservation?> ReadReservation(Guid id); Task Update(Reservation reservation); Task Delete(Guid id); }
Perhaps you think that the name is incorrect; that this really isn't an example of the Repository design pattern, as it's described in Patterns of Enterprise Application Architecture. Ironically, of all patterns, it may be the one most affected by semantic diffusion.
That's not what I have in mind, though. There's something else with that interface.
It's not its CRUD design, either. You could consider that a leaky abstraction, since it strongly implies a sort of persistent data store. That's a worthwhile discussion, but not what I have in mind today. There's something else wrong with the interface.
Consistency #
Look closer at the parameters for the various methods. The Create
and ReadReservations
methods take a restaurantId
parameter, but the other three don't.
Why does the ReadReservations
method take a restaurantId
while ReadReservation
doesn't? Why is that parameter required for Create
, but not for Update
? That doesn't seem consistent.
The reason is that each reservation has an ID (a GUID). Once the reservation exists, you can uniquely identify it to read, update, or delete it.
As the restaurantId
parameter suggests, however, this interface is part of a multi-tenant code base. This code base implements an online restaurant reservation system as a REST API. It's an online service where each restaurant is a separate tenant.
While each reservation has a unique ID, the system still needs to associate it with a restaurant. Thus, the Create
method must take a restaurantId
parameter in order to associate the reservation with a restaurant.
Once the reservation is stored, however, it's possible to uniquely identify it with the ID. The ReadReservation
, Update
, and Delete
methods need only the id
to work.
On the other hand, when you're not querying on reservation ID, you'll need to identify the restaurant, as with the ReadReservations
methods. If you didn't identify the restaurant in that method, you'd get all reservations in the requested range, from all tenants. That's not what you want. Therefore, you must supply the restaurantId
to limit the query.
The interface is inconsistent, but also allows the underlying implementation to leak through.
Implied implementation detail #
If I told you that the implementation of IReservationsRepository
is based on a relational database, can you imagine the design? You may want to stop reading and see if you can predict what the database looks like.
The interface strongly implies a design like this:
CREATE TABLE [dbo].[Reservations] ( [Id] INT IDENTITY (1, 1) NOT NULL, [At] DATETIME2 (7) NOT NULL, [Name] NVARCHAR (50) NOT NULL, [Email] NVARCHAR (50) NOT NULL, [Quantity] INT NOT NULL, [PublicId] UNIQUEIDENTIFIER NOT NULL, [RestaurantId] INT NOT NULL, PRIMARY KEY CLUSTERED ([Id] ASC), CONSTRAINT [AK_PublicId] UNIQUE NONCLUSTERED ([PublicId] ASC) );
What I wrote above is even clearer now. You can't create a row in that table without supplying a RestaurantId
, since that column has a NOT NULL
constraint.
The PublicId
column has a UNIQUE
constraint, which means that you can uniquely read and manipulate a single row when you have an ID.
Since all reservations are in a single table, any query not based on PublicId
should also filter on RestaurantId
. If it doesn't, the result set could include reservations from all restaurants.
Other interpretations #
Is the above relational database design the only possible implementation? Perhaps not. You could implement the interface based on a document database as well. It'd be natural to store each reservation as a separate document with a unique ID. Again, once you have the ID, you can directly retrieve and manipulate the document.
Other implementations become harder, though. Imagine, for example, that you want to shard the database design: Each restaurant gets a separate database. Or perhaps, more realistically, you distribute tenants over a handful of databases, perhaps partitioned on physical location, or some other criterion.
With such a design, the ReadReservation
, Update
, and Delete
methods become more inefficient. While you should be able to identify the correct shard if you have a restaurant ID, you don't have that information. Instead, you'll have to attempt the operation on all databases, thereby eliminating most sharding benefits.
In other words, the absence of the restaurantId
parameter from some of the methods suggests certain implementation details.
Leak by omission #
I admit that I rarely run into this sort of problem. Usually, a leaky abstraction manifests by a language construct that contains too much information. This is typically an interface or base class that exposes implementation details by either requiring too specific inputs, or by returning data that reveals implementation details.
For a data access abstraction like the above 'repository', this most frequently happens when people design such an interface around an object-relational mapper (ORM). A class like Reservation
would then typically carry ORM details around. Perhaps it inherits from an ORM base class, or perhaps (this is very common) it has a parameterless constructor or getters and setters that model the relationships of the database (these are often called navigation properties).
Another common examples of a leaky abstraction might be the presence of Connect
and Disconnect
methods. The Connect
method may even take a connectionString
parameter, clearly leaking that some sort of database is involved.
Yet another example is CQS-violating designs where a Create
method returns a database ID.
All such leaky abstractions are leaky because they expose or require too much information.
The example in this article, on the contrary, is leaky because of a lack of detail.
Dependency Inversion Principle #
Ironically, I originally arrived at the above design because I followed the Dependency Inversion Principle (DIP). The clients of IReservationsRepository
are ASP.NET Controller actions, like this Delete
method:
[HttpDelete("restaurants/{restaurantId}/reservations/{id}")] public async Task Delete(int restaurantId, string id) { if (Guid.TryParse(id, out var rid)) { var r = await Repository.ReadReservation(rid) .ConfigureAwait(false); await Repository.Delete(rid).ConfigureAwait(false); if (r is { }) await PostOffice.EmailReservationDeleted(restaurantId, r) .ConfigureAwait(false); } }
As Robert C. Martin explains about the Dependency Inversion Principle:
From that principle, it follows that the"clients [...] own the abstract interfaces"
Delete
method decides what IReservationsRepository.Delete
looks like. It seems that the Controller action doesn't need to tell the Repository
about the restaurantId
when calling its Delete
method. Supplying the reservation ID (rid
) is enough.
There are, however, various problems with the above code. If the DIP suggests that the restaurantId
is redundant when calling Repository.Delete
, then why is it required when calling PostOffice.EmailReservationDeleted
? This seems inconsistent.
Indeed it is.
As I often do, I arrived at the above Delete
method via outside-in TDD, but as I observed a decade ago, TDD alone doesn't guarantee good design. Even when following the red-green-refactor checklist, I often fail to spot problems right away.
That's okay. TDD doesn't guarantee perfection, but done well it should set you up so that you can easily make changes.
Possible remedies #
I can think of two ways to address the problem. The simplest solution is to make the interface consistent by adding a restaurantId
parameter to all methods:
public interface IReservationsRepository { Task Create(int restaurantId, Reservation reservation); Task<IReadOnlyCollection<Reservation>> ReadReservations( int restaurantId, DateTime min, DateTime max); Task<Reservation?> ReadReservation(int restaurantId, Guid id); Task Update(int restaurantId, Reservation reservation); Task Delete(int restaurantId, Guid id); }
This is the simplest solution, and the one that I prefer. In a future article, I'll show how it enabled me to significantly simplify the code base.
For good measure, though, I should also mention the opposite solution. Completely drain the interface of restaurantId
parameters:
public interface IReservationsRepository { Task Create(Reservation reservation); Task<IReadOnlyCollection<Reservation>> ReadReservations( DateTime min, DateTime max); Task<Reservation?> ReadReservation(Guid id); Task Update(Reservation reservation); Task Delete(Guid id); }
How can that work in practice? After all, an implementation must have a restaurant ID in order to create a new row in the database.
It's possible to solve that problem by making the restaurantId
an implementation detail. You could make it a constructor parameter for the concrete class, but this gives you another problem. Your Composition Root doesn't know the restaurant ID - after all, it's a run-time argument.
In a method like the above Delete
Controller action, you'd have to translate the restaurantId
run-time argument to an IReservationsRepository
instance. There are various ways around that kind of problem, but they typically involve some kind of factory. That'd be yet another interface:
public interface IReservationsRepositoryFactory { IReservationsRepository Create(int restaurantId); }
That just makes the API more complicated. Factories give Dependency Injection a bad reputation. For that reason, I don't like this second alternative.
Conclusion #
Leaky abstractions usually express themselves as APIs that expose too many details; the implementation details leak through.
In this example, however, a leaky abstraction manifested as a lack of consistency. Some methods require a restaurantId
argument, while others don't - because one particular implementation doesn't need that information.
It turned out, though, that when I was trying to simplify the overall code, this API design held me back. Consistently adding restaurantId
parameters to all repository methods solved the problem. A future article tells that tale.
The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.
Comments
Thank you for the article Mark.
I was wondering whether another solution would be including restaurantId as a member of Reservation? That way it’s not needed by the Create method.
That just leaves ReadReservations as the last method that requires a restaurant ID, but one could argue a specialized read method such as this one doesn’t belong on a repository anyway. I personally tend to interpret these kinds of methods on a repository as a code smell on projects of a certain size.
I might just be missing the point of your article, but I would love to hear your thoughts. :)
Tobias, thank you for writing. You raise a good point, and it might be an appropriate way to model the problem. While the thought had already crossed my mind, I must admit that I hadn't given it much thought.
For the individual CRUD operations, I admit that it might be an appropriate design option. You do, however, also point to the
ReadReservations
method as the odd man out. I applaud the intellectual honesty you exhibit by bring this up yourself, and I don't intend to misuse it by shooting down your idea. The fact that this method is somehow different might be an indication that it doesn't belong as a member of the same interface as the other four methods.If that's the case, though, then where does it belong? One option would be to define all interfaces with only a single method:
How should we deal with the
restaurantId
parameter in such an interface? Should it be included, as is the case here, or should we exclude it from the interface definition, like the following?If we choose to exclude the
restaurantId
parameter from the interface, it'd be consistent with the CRUD interface that you imply. On the other hand, wouldn't it require some sort of factory, as I outlined above?Conversely, if we decide to keep the
restaurantId
parameter as part of the interface definition, it seems inconsistent with the design your suggestion implies.I'm not writing this to shoot down your suggestion. I find it a real conundrum.
I do think, though, that this might be an indication that there's some kind of abstraction that I've failed to make explicit. Some kind of Restaurant or Tenant type seems most likely.
My problem is that I actually do have a
Restaurant
class in my code base. That one, however, is a Value Object, so I'm loath to add impure methods to it.For what it's worth, it's deliberation like this that makes software design interesting. You need to balance opposing forces. A good design is one that does so in a stable way. I'm not claiming that the code shown here does that. You've clearly put your finger on a sore spot, which suggests to me that there's more work to be done. Thank you for the inspiring input!
Hi Mark, thank you for another great article!
I have worked on several small multi-tenant systems and I faced the same problem as you with the repository interface methods and the "tenant id" being mixed.
After several attempts and API iteration, my final design was to use what Steven van Deursen calls The Ambient Composition Model.
The idea is to inject an
ITenantContext
in theIReservationsRepository
implementation and to use it as needed :In my case the implementation of the
ITenantContext
was retrieving the tenant from the route of the method. I think it could be the same for resolving therestaurantId
.This solution seems similar to your Factory solution but I'm not totally sure. In any case, like the Factory solution, this solution is heavier that the first you proposed.
Nonetheless I find some elegance in this solution with the tenant being injected by request in the implementation. What do you think? Did you have the same idea with the Factory solution?
Thibaut, thank you for writing. Yes, that's another option. I've done something similar to that in the past.
In a sense, the concept of a tenant seems almost like a cross-cutting concern, so it makes sense to let it fade into the background, so to speak.
The reason I'm not too keen on that any longer is that it seems a bit too 'clever' in most settings. Consider the
Delete
Controller action shown above. Imagine that we injectrestaurantId
into all services - not onlyIReservationsRepository
, but also intoIPostOffice
. TheDelete
method might look like this, then:The
restaurantId
parameter still has to be present, even though it's unused. This is likely to be puzzling to any developer not intimately familiar with the code base.It's possible that you can pull some trick with the ASP.NET framework so that the parameter doesn't have to be there, but it'll still be present in the URL, and again, I'm concerned that most developers would be confused about this.
There's also another thing that bothers me about design like this: You can pull the restaurant ID out of the method's routing data, but this implies that you can do the same with the reservation ID. What makes the restaurant ID special, that it ought to be an injected dependency, while the reservation ID isn't?
I'm concerned that the answer to that question might be 'hand-wavy'. And if we can't defend making one ID a dependency and the other not, then we might take this to the logical conclusion and inject all routing values into services. If we do that, the
Delete
method might now look like this:(I haven't tried to compile this, so there may be syntax errors.)
This seems really odd, although it's possible that we can salvage it by calling the dependency something else than
Repository
. It's not really a Unit of Work, but seems closer to that sort of design.I agree that a tenant feels like something that ought to be 'automatically handled', but I wonder whether it's possible to do that without making the code 'too clever'.
How would YAGNI come into play here? For instance, imagine your "client" code wasn't the Delete endpoint, but it was another app or endpoint that only had a "Guid reservationId", but not an "int restaurantId". In such case, wouldn't you be forced to add the restaurantId to the client code? What if this client code doesn't have an easy way to obtain such restaurantId? The reservation id is a global identifier, thus it makes sense that some application (be it a service, console, etc) would just get hold of the guid and nothing else, it's universally identifiable after all, it should be able to identify the reservation uniquely. This may require more roundtrips to the database, or forcing another client one-level above to provide the restaurantId (and this may even require politics and management to get in).
Wouldn't YAGNI say that you shouldn't add the restaurantId to the API, since you ain't gonna need it? I.e, you likely won't change your data access implementation or shard the database in a way that would require that additional restaurantId, and even if you did, perhaps the development effort to add the restaurantId would be the same in that future timeline as it would be right now, so it would be the same to make this change now or afterwards (and in such case, wouldn't it make much sense to make the change afterwards, when you actually need it?).
Gonzalo, thank you for writing. The short answer is that I only 'discovered' the leaky abstraction because I did, in fact, need the restaurant ID. As part of creating, modifying, or deleting reservations, I also wanted to send email updates. For example, when updating a reservation, the system should send an email with a subject line like "Your reservation for Nono changed."
This meant that I had to figure out which name to put in the subject line. Given the restaurant ID, this is trivial, but without it, the system would first have to make a 'reverse lookup' to find the restaurant associated with the reservation ID. While it's technically possible, it seems overly complicated given that the
restaurantId
was already available at the entry point.It's true that since the reservation ID is a GUID, it's globally unique. This, however, is an implementation detail. The system doesn't allow external client to refer to a reservation exclusively by a GUID. Rather, from the perspective of an external client, the ID of a reservation looks like
https://api.example.com/restaurants/90125/reservations/667fd3ca5d6943b993860809adc99ad5?sig=aqwnNXj28J547upBELNf5I6yPIrQ%2BG9DG2QIAlOpgOE%3D
. While both restaurant and reservation IDs are visible within that string, a client can't use those IDs. The external reservation ID is the full (conceptually opaque) string.I agree, though, that YAGNI is relevant in this context, too. If it's any consolation, I didn't change the code 'just in case' - I did, in fact, change it because I realised that I needed the modified design. But I admit that I didn't explicitly state that in this article.
You may also find the above discussion relevant.