Coalescing DTOs by Mark Seemann
Refactoring to a universal abstraction.
Despite my best efforts, no code base I write is perfect. This is also true for the code base that accompanies Code That Fits in Your Head.
One (among several) warts that has annoyed me for long is this:
[HttpPost("restaurants/{restaurantId}/reservations")] public async Task<ActionResult> Post( int restaurantId, ReservationDto dto) { if (dto is null) throw new ArgumentNullException(nameof(dto)); var id = dto.ParseId() ?? Guid.NewGuid(); Reservation? reservation = dto.Validate(id); if (reservation is null) return new BadRequestResult(); // More code follows...
Passing id
to Validate
annoys me. Why does Validate
need an id
?
When you see it in context, it may makes some sort of sense, but in isolation, it seems arbitrary:
internal Reservation? Validate(Guid id)
Why does the method need an id
? Doesn't ReservationDto
have an Id
?
Abstraction, broken #
Yes, indeed, ReservationDto
has an Id
property:
public string? Id { get; set; }
Then why do callers have to pass an id
argument? Doesn't Validate
use the Id
property? It's almost as though the Validate
method begs you to read the implementing code:
internal Reservation? Validate(Guid id) { if (!DateTime.TryParse(At, out var d)) return null; if (Email is null) return null; if (Quantity < 1) return null; return new Reservation( id, d, new Email(Email), new Name(Name ?? ""), Quantity); }
Indeed, the method doesn't use the Id
property. Reading the code may not be of much help, but at least we learn that id
is passed to the Reservation
constructor. It's still not clear why the method isn't trying to parse the Id
property, like it's doing with At
.
I'll return to the motivation in a moment, but first I'd like to dwell on the problems of this design.
It's a typical example of ad-hoc design. I had a set of behaviours I needed to implement, and in order to avoid code duplication, I came up with a method that seemed to solve the problem.
And indeed, the Validate
method does solve the problem of code duplication. It also passes all tests. It could be worse.
It could also be better.
The problem with an ad-hoc design like this is that the motivation is unclear. As a reader, you feel that you're missing the full picture. Perhaps you feel compelled to read the implementation code to gain a better understanding. Perhaps you look for other call sites. Perhaps you search the Git history to find a helpful comment. Perhaps you ask a colleague.
It slows you down. Worst of all, it may leave you apprehensive of refactoring. If you feel that there's something you don't fully understand, you may decide to leave the API alone, instead of improving it.
It's one of the many ways that code slowly rots.
What's missing here is a proper abstraction.
Motivation #
I recently hit upon a design that I like better. Before I describe it, however, you need to understand the problem I was trying to solve.
The code base for the book is a restaurant reservation REST API, and I was evolving the code as I wrote it. I wanted the code base (and its Git history) to be as realistic as possible. In a real-world situation, you don't always know all requirements up front, or even if you do, they may change.
At one point I decided that a REST client could supply a GUID when making a new reservation. On the other hand, I had lots of existing tests (and a deployed system) that accepted reservations without IDs. In order to not break compatibility, I decided to use the ID if it was supplied with the DTO, and otherwise create one. (I later explored an API without explicit IDs, but that's a different story.)
The id
is a JSON property, however, so there's no guarantee that it's properly formatted. Thus, the need to first parse it:
var id = dto.ParseId() ?? Guid.NewGuid();
To make matters even more complicated, when you PUT
a reservation, the ID is actually part of the resource address, which means that even if it's present in the JSON document, that value should be ignored:
[HttpPut("restaurants/{restaurantId}/reservations/{id}")] public async Task<ActionResult> Put( int restaurantId, string id, ReservationDto dto) { if (dto is null) throw new ArgumentNullException(nameof(dto)); if (!Guid.TryParse(id, out var rid)) return new NotFoundResult(); Reservation? reservation = dto.Validate(rid); if (reservation is null) return new BadRequestResult(); // More code follows...
Notice that this Put
implementation exclusively considers the resource address id
parameter. Recall that the Validate
method ignores the dto
's Id
property.
This is knowledge about implementation details that leaks through to the calling code. As a client developer, you need to know and keep this knowledge in your head while you write your own code. That's not really code that fits in your head.
As I usually put it: If you have to read the code, it implies that encapsulation is broken.
At the time, however, I couldn't think of a better alternative, and since the problem is still fairly small and localised, I decided to move on. After all, perfect is the enemy of good.
Why don't you just..? #
Is there a better way? Perhaps you think that you've spotted an obvious improvement. Why don't I just try to parse dto.Id
and then create a Guid.NewGuid()
if parsing fails? Like this:
internal Reservation? Validate() { if (!Guid.TryParse(Id, out var id)) id = Guid.NewGuid(); if (!DateTime.TryParse(At, out var d)) return null; if (Email is null) return null; if (Quantity < 1) return null; return new Reservation( id, d, new Email(Email), new Name(Name ?? ""), Quantity); }
The short answer is: Because it doesn't work.
It may work for Get
, but then Put
doesn't have a way to tell the Validate
method which ID to use.
Or rather: That's not entirely true, because this is possible:
dto.Id = id;
Reservation? reservation = dto.Validate();
This does suggest an even better way. Before we go there, however, there's another reason I don't like this particular variation: It makes Validate
impure.
Why care? you may ask.
I always end up regretting making an otherwise potentially pure function non-deterministic. Sooner or later, it turns out to have been a bad decision, regardless of how alluring it initially looked. I recently gave an example of that.
When weighing the advantages and disadvantages, I preferred passing id
explicitly rather than relying on Guid.NewGuid()
inside Validate
.
First monoid #
One of the reasons I find universal abstractions beneficial is that you only have to learn them once. As Felienne Hermans writes in The Programmer's Brain our working memory juggles a combination of ephemeral data and knowledge from our long-term memory. The better you can leverage existing knowledge, the easier it is to read code.
Which universal abstraction enables you to choose from a prioritised list of candidates? The First monoid!
In C# with nullable reference types the null-coalescing operator ??
already implements the desired functionality. (If you're using another language or an older version of C#, you can instead use Maybe.)
Once I got that idea I was able to simplify the API.
Parsing and coalescing DTOs #
Instead of that odd Validate
method which isn't quite a validator and not quite a parser, this insight suggests to parse, don't validate:
internal Reservation? TryParse() { if (!Guid.TryParse(Id, out var id)) return null; if (!DateTime.TryParse(At, out var d)) return null; if (Email is null) return null; if (Quantity < 1) return null; return new Reservation( id, d, new Email(Email), new Name(Name ?? ""), Quantity); }
This function only returns a parsed Reservation
object when the Id
is present and well-formed. What about the cases where the Id
is absent?
The calling ReservationsController
can deal with that:
Reservation? candidate1 = dto.TryParse(); dto.Id = Guid.NewGuid().ToString("N"); Reservation? candidate2 = dto.TryParse(); Reservation? reservation = candidate1 ?? candidate2; if (reservation is null) return new BadRequestResult();
First try to parse the dto
, then explicitly overwrite its Id
property with a new Guid
, and then try to parse it again. Finally, pick the first of these that aren't null, using the null-coalescing ??
operator.
This API also works consistently in the Put
method:
dto.Id = id; Reservation? reservation = dto.TryParse(); if (reservation is null) return new BadRequestResult();
Why is this better? I consider it better because the TryParse
function should be a familiar abstraction. Once you've seen a couple of those, you know that a well-behaved parser either returns a valid object, or nothing. You don't have to go and read the implementation of TryParse
to (correctly) guess that. Thus, encapsulation is maintained.
Where does mutation go? #
The ReservationsController
mutates the dto
and relies on the impure Guid.NewGuid()
method. Why is that okay when it wasn't okay to do this inside of Validate
?
This is because the code base follows the functional core, imperative shell architecture. Specifically, Controllers make up the imperative shell, so I consider it appropriate to put impure actions there. After all, they have to go somewhere.
This means that the TryParse
function remains pure.
Conclusion #
Sometimes a good API design can elude you for a long time. When that happens, I move on with the best solution I can think of in the moment. As it often happens, though, ad-hoc abstractions leave me unsatisfied, so I'm always happy to improve such code later, if possible.
In this article, you saw an example of an ad-hoc API design that represented the best I could produce at the time. Later, it dawned on me that an implementation based on a universal abstraction would be possible. In this case, the universal abstraction was null coalescing (which is a specialisation of the monoid abstraction).
I like universal abstractions because, once you know them, you can trust that they work in well-understood ways. You don't have to waste time reading implementation code in order to learn whether it's safe to call a method in a particular way.
This saves time when you have to work with the code, because, after all, we spend more time reading code than writing it.
Comments
After the refactor in this article, is the entirety of your Post method (including the part you didn't show in this article) an impureim sandwich?
Not yet. There's a lot of (preliminary) interleaving of impure actions and pure functions remaining in the controller, even after this refactoring.
A future article will tackle that question. One of the reasons I even started writing about monads, functor relationships, etcetera was to establish the foundations for what this requires. If it can be done without monads and traversals I don't know how.
Even though the
Post
method isn't an impureim sandwich, I still consider the architecture functional core, imperative shell, since I've kept all impure actions in the controllers.The reason I didn't go all the way to impureim sandwiches with the book's code is didactic. For complex logic, you'll need traversals, monads, sum types, and so on, and none of those things were in scope for the book.