Leaky abstraction by omission

Sometimes, an abstraction can be leaky because it leaves something out.

Consider the following interface definition. What's wrong with it?

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(Guid id);
 
    Task Update(Reservation reservation);
 
    Task Delete(Guid id);
}

Perhaps you think that the name is incorrect; that this really isn't an example of the Repository design pattern, as it's described in Patterns of Enterprise Application Architecture. Ironically, of all patterns, it may be the one most affected by semantic diffusion.

That's not what I have in mind, though. There's something else with that interface.

It's not its CRUD design, either. You could consider that a leaky abstraction, since it strongly implies a sort of persistent data store. That's a worthwhile discussion, but not what I have in mind today. There's something else wrong with the interface.

Consistency #

Look closer at the parameters for the various methods. The Create and ReadReservations methods take a restaurantId parameter, but the other three don't.

Why does the ReadReservations method take a restaurantId while ReadReservation doesn't? Why is that parameter required for Create, but not for Update? That doesn't seem consistent.

The reason is that each reservation has an ID (a GUID). Once the reservation exists, you can uniquely identify it to read, update, or delete it.

As the restaurantId parameter suggests, however, this interface is part of a multi-tenant code base. This code base implements an online restaurant reservation system as a REST API. It's an online service where each restaurant is a separate tenant.

While each reservation has a unique ID, the system still needs to associate it with a restaurant. Thus, the Create method must take a restaurantId parameter in order to associate the reservation with a restaurant.

Once the reservation is stored, however, it's possible to uniquely identify it with the ID. The ReadReservation, Update, and Delete methods need only the id to work.

On the other hand, when you're not querying on reservation ID, you'll need to identify the restaurant, as with the ReadReservations methods. If you didn't identify the restaurant in that method, you'd get all reservations in the requested range, from all tenants. That's not what you want. Therefore, you must supply the restaurantId to limit the query.

The interface is inconsistent, but also allows the underlying implementation to leak through.

Implied implementation detail #

If I told you that the implementation of IReservationsRepository is based on a relational database, can you imagine the design? You may want to stop reading and see if you can predict what the database looks like.

The interface strongly implies a design like this:

CREATE TABLE [dbo].[Reservations] (
    [Id]           INT              IDENTITY (1, 1) NOT NULL,
    [At]           DATETIME2 (7)    NOT NULL,
    [Name]         NVARCHAR (50)    NOT NULL,
    [Email]        NVARCHAR (50)    NOT NULL,
    [Quantity]     INT              NOT NULL,
    [PublicId]     UNIQUEIDENTIFIER NOT NULL,
    [RestaurantId] INT              NOT NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [AK_PublicId] UNIQUE NONCLUSTERED ([PublicId] ASC)
);

What I wrote above is even clearer now. You can't create a row in that table without supplying a RestaurantId, since that column has a NOT NULL constraint.

The PublicId column has a UNIQUE constraint, which means that you can uniquely read and manipulate a single row when you have an ID.

Since all reservations are in a single table, any query not based on PublicId should also filter on RestaurantId. If it doesn't, the result set could include reservations from all restaurants.

Other interpretations #

Is the above relational database design the only possible implementation? Perhaps not. You could implement the interface based on a document database as well. It'd be natural to store each reservation as a separate document with a unique ID. Again, once you have the ID, you can directly retrieve and manipulate the document.

Other implementations become harder, though. Imagine, for example, that you want to shard the database design: Each restaurant gets a separate database. Or perhaps, more realistically, you distribute tenants over a handful of databases, perhaps partitioned on physical location, or some other criterion.

With such a design, the ReadReservation, Update, and Delete methods become more inefficient. While you should be able to identify the correct shard if you have a restaurant ID, you don't have that information. Instead, you'll have to attempt the operation on all databases, thereby eliminating most sharding benefits.

In other words, the absence of the restaurantId parameter from some of the methods suggests certain implementation details.

Leak by omission #

I admit that I rarely run into this sort of problem. Usually, a leaky abstraction manifests by a language construct that contains too much information. This is typically an interface or base class that exposes implementation details by either requiring too specific inputs, or by returning data that reveals implementation details.

For a data access abstraction like the above 'repository', this most frequently happens when people design such an interface around an object-relational mapper (ORM). A class like Reservation would then typically carry ORM details around. Perhaps it inherits from an ORM base class, or perhaps (this is very common) it has a parameterless constructor or getters and setters that model the relationships of the database (these are often called navigation properties).

Another common examples of a leaky abstraction might be the presence of Connect and Disconnect methods. The Connect method may even take a connectionString parameter, clearly leaking that some sort of database is involved.

Yet another example is CQS-violating designs where a Create method returns a database ID.

All such leaky abstractions are leaky because they expose or require too much information.

The example in this article, on the contrary, is leaky because of a lack of detail.

Dependency Inversion Principle #

Ironically, I originally arrived at the above design because I followed the Dependency Inversion Principle (DIP). The clients of IReservationsRepository are ASP.NET Controller actions, like this Delete method:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Guid.TryParse(id, out var rid))
    {
        var r = await Repository.ReadReservation(rid)
            .ConfigureAwait(false);
        await Repository.Delete(rid).ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(restaurantId, r)
                .ConfigureAwait(false);
    }
}

As Robert C. Martin explains about the Dependency Inversion Principle:

"clients [...] own the abstract interfaces"

Robert C. Martin, APPP, chapter 11

From that principle, it follows that the Delete method decides what IReservationsRepository.Delete looks like. It seems that the Controller action doesn't need to tell the Repository about the restaurantId when calling its Delete method. Supplying the reservation ID (rid) is enough.

There are, however, various problems with the above code. If the DIP suggests that the restaurantId is redundant when calling Repository.Delete, then why is it required when calling PostOffice.EmailReservationDeleted? This seems inconsistent.

Indeed it is.

As I often do, I arrived at the above Delete method via outside-in TDD, but as I observed a decade ago, TDD alone doesn't guarantee good design. Even when following the red-green-refactor checklist, I often fail to spot problems right away.

That's okay. TDD doesn't guarantee perfection, but done well it should set you up so that you can easily make changes.

Possible remedies #

I can think of two ways to address the problem. The simplest solution is to make the interface consistent by adding a restaurantId parameter to all methods:

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(int restaurantId, Guid id);
 
    Task Update(int restaurantId, Reservation reservation);
 
    Task Delete(int restaurantId, Guid id);
}

This is the simplest solution, and the one that I prefer. In a future article, I'll show how it enabled me to significantly simplify the code base.

For good measure, though, I should also mention the opposite solution. Completely drain the interface of restaurantId parameters:

public interface IReservationsRepository
{
    Task Create(Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(Guid id);
 
    Task Update(Reservation reservation);
 
    Task Delete(Guid id);
}

How can that work in practice? After all, an implementation must have a restaurant ID in order to create a new row in the database.

It's possible to solve that problem by making the restaurantId an implementation detail. You could make it a constructor parameter for the concrete class, but this gives you another problem. Your Composition Root doesn't know the restaurant ID - after all, it's a run-time argument.

In a method like the above Delete Controller action, you'd have to translate the restaurantId run-time argument to an IReservationsRepository instance. There are various ways around that kind of problem, but they typically involve some kind of factory. That'd be yet another interface:

public interface IReservationsRepositoryFactory
{
    IReservationsRepository Create(int restaurantId);
}

That just makes the API more complicated. Factories give Dependency Injection a bad reputation. For that reason, I don't like this second alternative.

Conclusion #

Leaky abstractions usually express themselves as APIs that expose too many details; the implementation details leak through.

In this example, however, a leaky abstraction manifested as a lack of consistency. Some methods require a restaurantId argument, while others don't - because one particular implementation doesn't need that information.

It turned out, though, that when I was trying to simplify the overall code, this API design held me back. Consistently adding restaurantId parameters to all repository methods solved the problem. A future article tells that tale.

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

Comments

Tobias #

Thank you for the article Mark.

I was wondering whether another solution would be including restaurantId as a member of Reservation? That way it’s not needed by the Create method.

That just leaves ReadReservations as the last method that requires a restaurant ID, but one could argue a specialized read method such as this one doesn’t belong on a repository anyway. I personally tend to interpret these kinds of methods on a repository as a code smell on projects of a certain size.

I might just be missing the point of your article, but I would love to hear your thoughts. :)

2021-05-01 8:59 UTC

Mark Seemann #

Tobias, thank you for writing. You raise a good point, and it might be an appropriate way to model the problem. While the thought had already crossed my mind, I must admit that I hadn't given it much thought.

For the individual CRUD operations, I admit that it might be an appropriate design option. You do, however, also point to the ReadReservations method as the odd man out. I applaud the intellectual honesty you exhibit by bring this up yourself, and I don't intend to misuse it by shooting down your idea. The fact that this method is somehow different might be an indication that it doesn't belong as a member of the same interface as the other four methods.

If that's the case, though, then where does it belong? One option would be to define all interfaces with only a single method:

public interface IReservationsDateRangeQuery
{ 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
}

How should we deal with the restaurantId parameter in such an interface? Should it be included, as is the case here, or should we exclude it from the interface definition, like the following?

Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max);

If we choose to exclude the restaurantId parameter from the interface, it'd be consistent with the CRUD interface that you imply. On the other hand, wouldn't it require some sort of factory, as I outlined above?

Conversely, if we decide to keep the restaurantId parameter as part of the interface definition, it seems inconsistent with the design your suggestion implies.

I'm not writing this to shoot down your suggestion. I find it a real conundrum.

I do think, though, that this might be an indication that there's some kind of abstraction that I've failed to make explicit. Some kind of Restaurant or Tenant type seems most likely.

My problem is that I actually do have a Restaurant class in my code base. That one, however, is a Value Object, so I'm loath to add impure methods to it.

For what it's worth, it's deliberation like this that makes software design interesting. You need to balance opposing forces. A good design is one that does so in a stable way. I'm not claiming that the code shown here does that. You've clearly put your finger on a sore spot, which suggests to me that there's more work to be done. Thank you for the inspiring input!

2021-05-02 11:04 UTC

Thibaut #

Hi Mark, thank you for another great article!

I have worked on several small multi-tenant systems and I faced the same problem as you with the repository interface methods and the "tenant id" being mixed.

After several attempts and API iteration, my final design was to use what Steven van Deursen calls The Ambient Composition Model.

The idea is to inject an ITenantContext in the IReservationsRepository implementation and to use it as needed :

					public class ReservationsRepository : IReservationsRepository {
						private readonly ITenantContext _tenantContext;
						
						public ReservationRepository(ITenantContext tenantContext) {
							_tenantContext = tenantContex;
						}
						
						public Task Create(Reservation reservation) {
							var restaurantId = _tenantContext.RestaurantId;
							// ...
						}
					}

In my case the implementation of the ITenantContext was retrieving the tenant from the route of the method. I think it could be the same for resolving the restaurantId.

This solution seems similar to your Factory solution but I'm not totally sure. In any case, like the Factory solution, this solution is heavier that the first you proposed.

Nonetheless I find some elegance in this solution with the tenant being injected by request in the implementation. What do you think? Did you have the same idea with the Factory solution?

2021-05-02 19:20 UTC

Mark Seemann #

Thibaut, thank you for writing. Yes, that's another option. I've done something similar to that in the past.

In a sense, the concept of a tenant seems almost like a cross-cutting concern, so it makes sense to let it fade into the background, so to speak.

The reason I'm not too keen on that any longer is that it seems a bit too 'clever' in most settings. Consider the Delete Controller action shown above. Imagine that we inject restaurantId into all services - not only IReservationsRepository , but also into IPostOffice. The Delete method might look like this, then:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Guid.TryParse(id, out var rid))
    {
        var r = await Repository.ReadReservation(rid)
            .ConfigureAwait(false);
        await Repository.Delete(rid).ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(r)
                .ConfigureAwait(false);
    }
}

The restaurantId parameter still has to be present, even though it's unused. This is likely to be puzzling to any developer not intimately familiar with the code base.

It's possible that you can pull some trick with the ASP.NET framework so that the parameter doesn't have to be there, but it'll still be present in the URL, and again, I'm concerned that most developers would be confused about this.

There's also another thing that bothers me about design like this: You can pull the restaurant ID out of the method's routing data, but this implies that you can do the same with the reservation ID. What makes the restaurant ID special, that it ought to be an injected dependency, while the reservation ID isn't?

I'm concerned that the answer to that question might be 'hand-wavy'. And if we can't defend making one ID a dependency and the other not, then we might take this to the logical conclusion and inject all routing values into services. If we do that, the Delete method might now look like this:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Repository.IsIdValid)
    {
        var r = await Repository.ReadReservation()
            .ConfigureAwait(false);
        await Repository.Delete().ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(r)
                .ConfigureAwait(false);
    }
}

(I haven't tried to compile this, so there may be syntax errors.)

This seems really odd, although it's possible that we can salvage it by calling the dependency something else than Repository. It's not really a Unit of Work, but seems closer to that sort of design.

I agree that a tenant feels like something that ought to be 'automatically handled', but I wonder whether it's possible to do that without making the code 'too clever'.

2021-05-04 8:26 UTC

Gonzalo Waszczuk #

How would YAGNI come into play here? For instance, imagine your "client" code wasn't the Delete endpoint, but it was another app or endpoint that only had a "Guid reservationId", but not an "int restaurantId". In such case, wouldn't you be forced to add the restaurantId to the client code? What if this client code doesn't have an easy way to obtain such restaurantId? The reservation id is a global identifier, thus it makes sense that some application (be it a service, console, etc) would just get hold of the guid and nothing else, it's universally identifiable after all, it should be able to identify the reservation uniquely. This may require more roundtrips to the database, or forcing another client one-level above to provide the restaurantId (and this may even require politics and management to get in).

Wouldn't YAGNI say that you shouldn't add the restaurantId to the API, since you ain't gonna need it? I.e, you likely won't change your data access implementation or shard the database in a way that would require that additional restaurantId, and even if you did, perhaps the development effort to add the restaurantId would be the same in that future timeline as it would be right now, so it would be the same to make this change now or afterwards (and in such case, wouldn't it make much sense to make the change afterwards, when you actually need it?).

2021-05-09 23:54 UTC

Mark Seemann #

Gonzalo, thank you for writing. The short answer is that I only 'discovered' the leaky abstraction because I did, in fact, need the restaurant ID. As part of creating, modifying, or deleting reservations, I also wanted to send email updates. For example, when updating a reservation, the system should send an email with a subject line like "Your reservation for Nono changed."

This meant that I had to figure out which name to put in the subject line. Given the restaurant ID, this is trivial, but without it, the system would first have to make a 'reverse lookup' to find the restaurant associated with the reservation ID. While it's technically possible, it seems overly complicated given that the restaurantId was already available at the entry point.

It's true that since the reservation ID is a GUID, it's globally unique. This, however, is an implementation detail. The system doesn't allow external client to refer to a reservation exclusively by a GUID. Rather, from the perspective of an external client, the ID of a reservation looks like https://api.example.com/restaurants/90125/reservations/667fd3ca5d6943b993860809adc99ad5?sig=aqwnNXj28J547upBELNf5I6yPIrQ%2BG9DG2QIAlOpgOE%3D. While both restaurant and reservation IDs are visible within that string, a client can't use those IDs. The external reservation ID is the full (conceptually opaque) string.

I agree, though, that YAGNI is relevant in this context, too. If it's any consolation, I didn't change the code 'just in case' - I did, in fact, change it because I realised that I needed the modified design. But I admit that I didn't explicitly state that in this article.

You may also find the above discussion relevant.

2021-05-11 6:56 UTC

Published: Monday, 26 April 2021 15:10:00 UTC

Leaky abstraction by omission by Mark Seemann

Consistency #

Implied implementation detail #

Other interpretations #

Leak by omission #

Dependency Inversion Principle #

Possible remedies #

Conclusion #

Comments

Wish to comment?

Published

Tags