From State tennis to endomorphism

Monday, 31 May 2021 06:29:00 UTC

You can refactor the State pattern to pure functions.

In a previous article you saw how to do the Tennis kata with the State design pattern. Like most other patterns in Design Patterns, the State pattern relies on mutation. If you favour functional programming and immutable data, you may not like that. Fortunately, converting the API to immutable data and pure functions is plain sailing.

In this post I'll show you how I did it.

Return Score #

Recall from the previous article that the IScore interface defined a single method, BallTo:

public interface IScore
{
    void BallTo(Player winner, Game game);
}

With its void return type, it clearly indicates that BallTo mutates the state of something - although it's less clear whether it's the object itself, game, or both.

As a first step towards turning the method into a pure function, then, you can change the return type so that it returns an IScore object:

public interface IScore
{
    IScore BallTo(Player winner, Game game);
}

In itself, this doesn't guarantee that the function is pure. In fact, after this small step, none of the implementations are. Here, for example, is the updated Advantage implementation:

public IScore BallTo(Player winner, Game game)
{
    if (winner == Player)
        game.Score = new CompletedGame(winner);
    else
        game.Score = Deuce.Instance;
 
    return game.Score;
}

This implementation still modifies game.Score before returning it. All the other IScore implementations do the same.

Use the returned score #

Now that the BallTo method returns an IScore object, you can edit the Game class' BallTo method so that it uses the returned value:

public void BallTo(Player player)
{
    Score = Score.BallTo(player, this);
}

Given that all the IScore implementations currently mutate game.Score, this seems redundant, but it sets you up for the next refactoring step.

Remove State mutation #

You can now remove the mutation of game.Score from all the implementations of IScore. Here's Advantage after the refactoring:

public IScore BallTo(Player winner, Game game)
{
    if (winner == Player)
        return new CompletedGame(winner);
    else
        return Deuce.Instance;
}

Notice that this implementation no longer uses the game parameter.

The other IScore implementations get a similar treatment.

Remove game parameter #

Since no implementations use the game parameter you can remove it from the interface:

public interface IScore
{
    IScore BallTo(Player winner);
}

and, of course, from each of the implementations:

public IScore BallTo(Player winner)
{
    if (winner == Player)
        return new CompletedGame(winner);
    else
        return Deuce.Instance;
}

The above method, again, is the implementation of Advantage.

Return Game #

You can now make the same sequence of changes to the Game class itself. Recall from above that its BallTo method returns void. As the first refactoring step towards turning that method into a pure function, then, change the return type:

public Game BallTo(Player player)
{
    Score = Score.BallTo(player);
    return this;
}

The mutation remains a little while longer, but the method looks like something that could be a pure function.

Return new Game #

The next refactoring step is to return a new Game instance instead of the same (mutated) instance:

public Game BallTo(Player player)
{
    Score = Score.BallTo(player);
    return new Game(Score);
}

The first line still mutates Score, but now you're only one step away from an immutable implementation.

Remove Game mutation #

Finally, you can remove the mutation of the Game class. First, remove the internal setter from the Score property:

public IScore Score { get; }

You can now lean on the compiler, as Michael Feathers explains in Working Effectively with Legacy Code. This forces you to fix the BallTo method:

public Game BallTo(Player player)
{
    return new Game(Score.BallTo(player));
}

This is also the only refactoring that requires you to edit the unit tests. Here are a few methods as examples:

[Theory]
[InlineData(Player.One, Point.Love)]
[InlineData(Player.One, Point.Fifteen)]
[InlineData(Player.One, Point.Thirty)]
[InlineData(Player.Two, Point.Love)]
[InlineData(Player.Two, Point.Fifteen)]
[InlineData(Player.Two, Point.Thirty)]
public void FortyWins(Player winner, Point otherPlayerPoint)
{
    var sut = new Game(new Forty(winner, otherPlayerPoint));
    var actual = sut.BallTo(winner);
    Assert.Equal(new CompletedGame(winner), actual.Score);
}
 
[Theory]
[InlineData(Player.One)]
[InlineData(Player.Two)]
public void FortyThirty(Player player)
{
    var sut = new Game(new Forty(player, Point.Thirty));
    var actual = sut.BallTo(player.Other());
    Assert.Equal(Deuce.Instance, actual.Score);
}

These are the same test methods as shown in the previous article. The changes are: the introduction of the actual variable, and that the assertion now compares the expected value to actual.Score rather than sut.Score.

Both variations of BallTo are now endomorphisms.
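To recap the terminology: an endomorphism is a function from a type back to the same type. Both refactored methods now fit C#'s Func<T, T> delegate shape. The following sketch, which uses the article's types, is only meant to illustrate that shape:

```csharp
// An endomorphism on T is a function T -> T, i.e. Func<T, T>.
// Both refactored BallTo methods fit that shape:
Func<IScore, IScore> scoreStep = score => score.BallTo(Player.One);
Func<Game, Game> gameStep = game => game.BallTo(Player.One);
```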

Explicit endomorphism #

If you're not convinced that the refactored IScore interface describes an endomorphism, you can make it explicit - strictly for illustrative purposes. First, introduce an explicit IEndomorphism interface:

public interface IEndomorphism<T>
{
    T Run(T x);
}

This is the same interface as already introduced in the article Builder as a monoid. To be clear, I wouldn't use such an interface in normal C# code. I only use it here to illustrate how the BallTo method describes an endomorphism.

You can turn a Player into an endomorphism with an extension method:

public static IEndomorphism<IScore> ToEndomorphism(this Player player)
{
    return new ScoreEndomorphism(player);
}
 
private struct ScoreEndomorphism : IEndomorphism<IScore>
{
    public ScoreEndomorphism(Player player)
    {
        Player = player;
    }
 
    public Player Player { get; }
 
    public IScore Run(IScore score)
    {
        return score.BallTo(Player);
    }
}

This is equivalent to partial function application. It applies the player, and by doing that returns an IEndomorphism<IScore>.
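If you prefer to see the partial application spelled out with built-in delegates, the same idea can be sketched like this (my illustration, not code from the kata):

```csharp
// BallTo viewed as a curried function: Player -> (IScore -> IScore).
Func<Player, Func<IScore, IScore>> ballTo =
    player => score => score.BallTo(player);

// Applying the Player argument yields an endomorphism on IScore:
Func<IScore, IScore> endo = ballTo(Player.One);
```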

The Game class' BallTo implementation can now Run the endomorphism:

public Game BallTo(Player player)
{
    IEndomorphism<IScore> endo = player.ToEndomorphism();
    IScore newScore = endo.Run(Score);
    return new Game(newScore);
}

Again, I'm not recommending this style of C# programming. I'm only showing this to illustrate how the object playing the State role now describes an endomorphism.

You could subject the Game class' BallTo method to the same treatment, but if you did, you'd have to call the extension method something that would distinguish it from the above ToEndomorphism extension method, since C# doesn't allow overloading exclusively on return type.
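For illustration only, such an extension method might look like the following. The name ToGameEndomorphism is my own invention, picked solely to avoid the overload clash; it isn't part of the code base:

```csharp
// Hypothetical counterpart for Game. The name differs from
// ToEndomorphism because C# can't overload on return type alone.
public static IEndomorphism<Game> ToGameEndomorphism(this Player player)
{
    return new GameEndomorphism(player);
}

private struct GameEndomorphism : IEndomorphism<Game>
{
    public GameEndomorphism(Player player)
    {
        Player = player;
    }

    public Player Player { get; }

    public Game Run(Game game)
    {
        return game.BallTo(Player);
    }
}
```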

Conclusion #

Like many of the other patterns in Design Patterns, the State pattern relies on mutation. It's straightforward, however, to refactor it to a set of pure functions. For what it's worth, these are all endomorphisms.

This article used a take on the tennis kata as an example.


Tennis kata using the State pattern

Monday, 24 May 2021 07:03:00 UTC

An example of using the State design pattern.

Regular readers of this blog will know that I keep coming back to the tennis kata. It's an interesting little problem to attack from various angles.

I don't think you have to do the kata that many times before you realise that you're describing a simple state machine. A few years ago I decided to use that insight to get reacquainted with the State design pattern.

In this article I'll show you what the code looks like.

Context #

As part of the exercise, I decided to stay close to the pattern description in Design Patterns. The public API should be exposed as a single class that hides all the internal state machinery. In the general pattern description, this class is called Context. The TCP example given in the book, however, calls the example class TCPConnection. This indicates that you don't have to use the word context when naming the class. I chose to simply call it Game:

public class Game
{
    public Game() : this(new Points(Point.Love, Point.Love))
    {
    }
 
    public Game(IScore score)
    {
        Score = score;
    }
 
    public IScore Score { get; internal set; }
 
    public void BallTo(Player player)
    {
        Score.BallTo(player, this);
    }
}

Since the Game class delegates all behaviour to its Score property, it's basically redundant. This may be a degenerate example, but as an exercise of staying true to the pattern, I decided to keep it. It's the class that all tests work through.

Test #

All tests look similar. This parametrised test verifies what happens after deuce:

[Theory]
[InlineData(Player.One)]
[InlineData(Player.Two)]
public void ScoreDeuce(Player winner)
{
    var sut = new Game(Deuce.Instance);
    sut.BallTo(winner);
    Assert.Equal(new Advantage(winner), sut.Score);
}

This is code that I wrote years ago, so it uses xUnit.net 2.3.1 and runs on .NET Framework 4.6.1, but I don't think it'd have looked different today. It follows my heuristic for formatting unit tests.

Structural equality #

The equality assertion works because Advantage has structural equality. In this exercise, I found it simpler to declare types as value types instead of overriding Equals and GetHashCode:

public struct Advantage : IScore
{
    public Advantage(Player player)
    {
        Player = player;
    }
 
    public Player Player { get; }
 
    public void BallTo(Player winner, Game game)
    {
        if (winner == Player)
            game.Score = new CompletedGame(winner);
        else
            game.Score = Deuce.Instance;
    }
}

This turned out to be possible throughout, since all types emerged as mere compositions of other value types. The above Advantage struct, for example, adapts a Player, which, unsurprisingly, is an enum:

public enum Player
{
    One = 0,
    Two
}

One of the states holds no data at all, so I made it a Singleton, as suggested in the book. (Contrary to popular belief, I don't consider Singleton an anti-pattern.)

public struct Deuce : IScore
{
    public readonly static IScore Instance = new Deuce();
 
    public void BallTo(Player winner, Game game)
    {
        game.Score = new Advantage(winner);
    }
}

Since it's a Singleton, from an equality perspective it doesn't matter whether it's a value or reference type, but I made it a struct for consistency's sake.

State #

In the State design pattern's formal structure, the Context delegates all behaviour to an abstract State class. Since I consider inheritance harmful (as well as redundant), I instead chose to model the state as an interface:

public interface IScore
{
    void BallTo(Player winner, Game game);
}

As the pattern suggests, the State object exposes methods that take the Context as an extra parameter. This enables concrete State implementations to change the state of the Context, as both the above structs (Advantage and Deuce) demonstrate. They both implement the interface.

When I do the kata, I always seem to arrive at five distinct states. I'm not sure if this reflects the underlying properties of the problem, or if it's just because that's what worked for me years ago, and I'm now stuck in some cognitive local maximum. In any case, that's what happened here as well. Apart from the above Advantage and Deuce there's also a Forty implementation:

public struct Forty : IScore
{
    public Forty(Player player, Point otherPlayerPoint)
    {
        Player = player;
        OtherPlayerPoint = otherPlayerPoint;
    }
 
    public Player Player { get; }
    public Point OtherPlayerPoint { get; }
 
    public void BallTo(Player winner, Game game)
    {
        if (Player == winner)
            game.Score = new CompletedGame(winner);
        else if (OtherPlayerPoint == Point.Thirty)
            game.Score = Deuce.Instance;
        else if (OtherPlayerPoint == Point.Fifteen)
            game.Score = new Forty(Player, Point.Thirty);
        else
            game.Score = new Forty(Player, Point.Fifteen);
 
    }
}

Another thing that I've noticed when doing the Tennis kata is that the state logic for advantage and deuce is simple, whereas the state transitions involving points are more complicated. If you think Forty looks complicated, then consider Points:

public struct Points : IScore
{
    public Points(Point playerOnePoint, Point playerTwoPoint)
    {
        PlayerOnePoint = playerOnePoint;
        PlayerTwoPoint = playerTwoPoint;
    }
 
    public Point PlayerOnePoint { get; }
    public Point PlayerTwoPoint { get; }
 
    public void BallTo(Player winner, Game game)
    {
        var pp = PlayerPoint(winner);
        var opp = PlayerPoint(winner.Other());
 
        if (pp == Point.Thirty)
            game.Score = new Forty(winner, opp);
        else if (winner == Player.One)
            game.Score = new Points(Increment(PlayerOnePoint), PlayerTwoPoint);
        else
            game.Score = new Points(PlayerOnePoint, Increment(PlayerTwoPoint));
    }
 
    private Point PlayerPoint(Player player)
    {
        if (player == Player.One)
            return PlayerOnePoint;
        else
            return PlayerTwoPoint;
    }
 
    private static Point Increment(Point point)
    {
        if (point == Point.Love)
            return Point.Fifteen;
        else
            return Point.Thirty;
    }
}

The last IScore implementation represents a completed game:

public struct CompletedGame : IScore
{
    public CompletedGame(Player player)
    {
        Player = player;
    }
 
    public Player Player { get; }
 
    public void BallTo(Player winner, Game game)
    {
    }
}

In a completed game, the BallTo implementation is a no-op, because Player has already won the game.

Miscellany #

Here's a few more tests, just to back up my claim that all tests look similar:

[Theory]
[InlineData(Player.One, Point.Love)]
[InlineData(Player.One, Point.Fifteen)]
[InlineData(Player.One, Point.Thirty)]
[InlineData(Player.Two, Point.Love)]
[InlineData(Player.Two, Point.Fifteen)]
[InlineData(Player.Two, Point.Thirty)]
public void FortyWins(Player winner, Point otherPlayerPoint)
{
    var sut = new Game(new Forty(winner, otherPlayerPoint));
    sut.BallTo(winner);
    Assert.Equal(new CompletedGame(winner), sut.Score);
}
 
[Theory]
[InlineData(Player.One)]
[InlineData(Player.Two)]
public void FortyThirty(Player player)
{
    var sut = new Game(new Forty(player, Point.Thirty));
    sut.BallTo(player.Other());
    Assert.Equal(Deuce.Instance, sut.Score);
}

The second of these test methods uses an extension method called Other:

public static class PlayerEnvy
{
    public static Player Other(this Player player)
    {
        if (player == Player.One)
            return Player.Two;
        else
            return Player.One;
    }
}

As is my custom, I named the class containing the extension method with the Envy suffix, because I often consider this kind of extension method a sign of Feature Envy. Alas, in C# you can't add methods to an enum.

Conclusion #

Implementing the tennis kata with the classic State pattern is straightforward.

After having spent the majority of the last decade with functional programming, I've come to realise that many problems are really just state machines waiting to be revealed as such. Implementing a finite state machine in a language with algebraic data types is so easy that you often reach for that kind of modelling.

Before I learned functional programming, when all I knew was procedural and object-oriented code, I rarely thought of problems in terms of finite state machines. Now I see them everywhere. It's an example of how learning a completely different thing can feed back on everyday programming.

Once you recognise that a problem can be modelled as a finite state machine, you have new options. If you're in a conservative context where colleagues aren't keen on fancy FP shenanigans, you can always reach for the State design pattern.


Comments

Do you think that perhaps you are at risk of making too many problems look like nails for your state machine hammer? :-) Actually, you just want to convert a pair of points into a tennis score. That doesn't require a state machine, I don't think:

using NUnit.Framework;

namespace TennisKata
{
    public class Tests
    {
        private TennisGame tennisGame;

        [SetUp]
        public void Setup()
        {
            tennisGame = new TennisGame();
        }

        [TestCase(0, 0, ExpectedResult = "Love All")]
        [TestCase(1, 1, ExpectedResult = "Fifteen All")]
        [TestCase(2, 2, ExpectedResult = "Thirty All")]
        [TestCase(3, 3, ExpectedResult = "Deuce")]
        [TestCase(4, 4, ExpectedResult = "Deuce")]
        [TestCase(1, 0, ExpectedResult = "Fifteen - Love")]
        [TestCase(2, 1, ExpectedResult = "Thirty - Fifteen")]
        [TestCase(3, 2, ExpectedResult = "Forty - Thirty")]
        [TestCase(4, 0, ExpectedResult = "Game Server")]
        [TestCase(0, 1, ExpectedResult = "Love - Fifteen")]
        [TestCase(1, 2, ExpectedResult = "Fifteen - Thirty")]
        [TestCase(2, 3, ExpectedResult = "Thirty - Forty")]
        [TestCase(0, 4, ExpectedResult = "Game Receiver")]
        [TestCase(4, 3, ExpectedResult = "Advantage Server")]
        [TestCase(3, 4, ExpectedResult = "Advantage Receiver")]
        [TestCase(5, 4, ExpectedResult = "Advantage Server")]
        [TestCase(4, 5, ExpectedResult = "Advantage Receiver")]
        [TestCase(5, 3, ExpectedResult = "Game Server")]
        [TestCase(3, 5, ExpectedResult = "Game Receiver")]
        [TestCase(5, 2, ExpectedResult = "Invalid score")]
        [TestCase(2, 5, ExpectedResult = "Invalid score")]
        public string ShouldConvertPointsToTennisStyleScore(int serverPoints, int receiverPoints)
        {
            SetServerPointsTo(serverPoints);
            SetReceiverPointsTo(receiverPoints);

            return tennisGame.Score;
        }

        private void SetServerPointsTo(int serverPoints)
        {
            for (var i = 0; i < serverPoints; i++)
            {
                tennisGame.PointToServer();
            }
        }

        private void SetReceiverPointsTo(int serverPoints)
        {
            for (var i = 0; i < serverPoints; i++)
            {
                tennisGame.PointToReceiver();
            }
        }
    }

    public class TennisGame
    {
        private int serverPoints;
        private int receiverPoints;

        public string Score => serverPoints switch
        {
            _ when serverPoints == receiverPoints && serverPoints >= 3 => "Deuce",
            _ when serverPoints == receiverPoints => $"{PointsAsWord(serverPoints)} All",
            _ when serverPoints >= 4 && serverPoints > receiverPoints  => GetGameOrAdvantage(serverPoints, receiverPoints, "Server"),
            _ when receiverPoints >= 4 => GetGameOrAdvantage(receiverPoints, serverPoints, "Receiver"),
            _ => $"{PointsAsWord(serverPoints)} - {PointsAsWord(receiverPoints)}"
        };

        public void PointToServer()
        {
            serverPoints++;
        }

        public void PointToReceiver()
        {
            receiverPoints++;
        }

        private static string GetGameOrAdvantage(int highScore, int lowScore, string highScorerName)
        {
            var scoreDifference = highScore - lowScore;

            return scoreDifference switch
            {
                1 => $"Advantage {highScorerName}",
                _ when highScore > 4 && scoreDifference > 2 => "Invalid score",
                _ => $"Game {highScorerName}"
            };
        }

        private string PointsAsWord(int points)
        {
            var pointNames = new [] { "Love",  "Fifteen", "Thirty", "Forty"};

            return pointNames[points];
        }
    }
}

2021-05-27 7:56 UTC

Jim, thank you for writing. You're right: a state machine isn't required. It's a nice judo trick to keep track of the server and receiver points as two different numbers. That does simplify the code.

I tried something similar many years ago (after all, the kata description itself strongly hints at that alternative perspective), but for various reasons ended with an implementation that wasn't as nice as yours. I never published it. I've done this exercise many times, and I've only published the ones that I find can be used to highlight some interesting point.

The point of doing a coding kata is to experiment with variation. The goal isn't always to reach the fewest lines of code, or complete the exercise as fast as possible. These can be interesting exercises in their own right, but doing a kata with other constraints can be illuminating as well.

My goal with this variation was mainly to get reacquainted with the State pattern. Actually 'solving' the problem is a requirement, but not the goal.

Modelling the problem with the State pattern has advantages and disadvantages. A major advantage is that it offers an API that enables client code to programmatically distinguish between the various states. When I did the exercise similar to your code, asserting against a string is easy. However, basing an API on a returned string may not be an adequate design. It's okay for an exercise, but imagine that you were asked to translate the scores. For example, in Danish, advantage is instead called fordel. Another requirement might be that you report players by name. So, for example, a Danish score might instead require something like fordel Serena Williams.

Don't take this as a criticism of your code. Sometimes, you don't need more than what you've done, and in such cases, doing more would be over-engineering.

On the other hand, if you find yourself in situations where e.g. translation is required, it can be helpful to be aware that other ways to model a problem are available. That's the underlying benefit of doing katas. The more variations you do, the better prepared you become to 'choose the right tool for the job.'

All that said, though, with the tennis kata, you can make it trivially simple by modelling it as a finite state automaton.

2021-05-30 9:09 UTC

Against consistency

Monday, 17 May 2021 06:34:00 UTC

A one-sided argument against imposing a uniform coding style.

I want to credit Nat Pryce for planting the seed for the following line of thinking at GOTO Copenhagen 2012. I'd also like to relieve him of any responsibility for what follows. The blame is all mine.

I'd also like to point out that I'm not categorically against consistency in code. There are plenty of good arguments for having a consistent coding style, but as regular readers may have observed, I have a contrarian streak to my personality. If you're only aware of one side of an argument, I believe that you're unequipped to make informed decisions. Thus, I make the following case against imposing coding styles, not because I'm dead-set opposed to consistent code, but because I believe you should be aware of the disadvantages.

TL;DR #

In this essay, I use the term coding style to indicate a set of rules that governs how code should be formatted. This may include rules about where you put brackets, whether to use tabs or spaces, which naming conventions to use, maximum line width, in C# whether you should use the var keyword or explicit variable declaration, and so on.

As already stated, I can appreciate consistency in code as much as the next programmer. I've seen more than one code base, however, where a formal coding style contributed to ossification.

I've consulted a few development organisations with an eye to improving processes and code quality. Sometimes my suggestions are met with hesitation. When I investigate what causes developers to resist, it turns out that my advice goes against 'how things are done around here.' It might even go against the company's formal coding style guidelines.

Coding styles may impede progress.

Below, I'll discuss a few examples.

Class fields #

A typical example of a coding style regulates naming of class fields. While it seems to be on retreat now, at one time many C# developers would name class fields with a leading underscore:

private readonly string? _action;
private readonly string? _controller;
private readonly object? _values;

I never liked that naming convention because it meant that I always had to type an underscore and then at least one other letter before I could make good use of my IDE. For example, in order to take advantage of auto-completion when using the _action field, I'd have to type _a, instead of just a.

Screen shot of Intellisense drop-down after typing a single underscore.

Yes, I know that typing isn't a bottleneck, but it still annoyed me because it seemed redundant.

A variation of this coding style is to mandate an m_ prefix, which only exacerbates the problem.

Many years ago, I came across a 'solution': Put the underscore after the field name, instead of in front of it:

private readonly string? action_;
private readonly string? controller_;
private readonly object? values_;

Problem solved - I thought for some years.

Then someone pointed out to me that if distinguishing a class field from a local variable is the goal, you can use the this qualifier. That made sense to me. Why invent some controversial naming rule when you can use a language keyword instead?

So, for years, I'd always interact with class fields like this.action, this.controller, and so on.

Then someone else pointed out to me that this ostensible need to be able to distinguish class fields from local variables, or static from instance fields, was really a symptom of either poor naming or too big classes. While that hurt a bit, I couldn't really defend against the argument.

This is all many years ago. These days, I name class fields like I name variables, and I don't qualify access.

The point of this little story is to highlight how you can institute a naming convention with the best of intentions. As experience accumulates, however, you may realise that you've become wiser. Perhaps that naming convention wasn't such a good idea after all.

When that happens, change the convention. Don't worry that this is going to make the code base inconsistent. An improvement is an improvement, while consistency might only imply that the code base is consistently bad.

Explicit variable declaration versus var #

In late 2007, more than a decade ago, C# 3 introduced the var keyword to the language. This tells the compiler to automatically infer the static type of a variable. Before that, you'd have to explicitly declare the type of all variables:

string href = new UrlBuilder()
    .WithAction(nameof(CalendarController.Get))
    .WithController(nameof(CalendarController))
    .WithValues(new { year = DateTime.Now.Year })
    .BuildAbsolute(Url);

In the above example, the variable href is explicitly declared as a string. With the var keyword you can alternatively write the expression like this:

var href = new UrlBuilder()
    .WithAction(nameof(CalendarController.Get))
    .WithController(nameof(CalendarController))
    .WithValues(new { year = DateTime.Now.Year })
    .BuildAbsolute(Url);

The href variable is still statically typed as a string. The compiler figures that out for you, in this case because the BuildAbsolute method returns a string:

public string BuildAbsolute(IUrlHelper url)

These two alternatives are interchangeable. They compile to the same IL code.

When C# introduced this language feature, a year-long controversy erupted. Opponents felt that using var made code less readable. This isn't an entirely unreasonable argument, but most C# programmers subsequently figured that the advantages of using var outweigh the disadvantages.

A major advantage is that using var better facilitates refactoring. Sometimes, for example, you decide to change the return type of a method. What happens if you change the return type of UrlBuilder's BuildAbsolute method?

public Uri BuildAbsolute(IUrlHelper url)

If you've used the var keyword, the compiler just infers a different type. If, on the other hand, you've explicitly declared href as a string, that piece of code no longer compiles.

Using the var keyword makes refactoring easier. You'll still need to edit some call sites when you make a change like this, because Uri affords a different API than string. The point, however, is that when you use var, the cost of making a change is lower. Less ceremony means that you can spend your effort where it matters.
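To make the remaining work concrete: even with var, a call site that uses a string-specific member no longer compiles after such a change. Here's a sketch; the Length usage is my own example, not from the code base:

```csharp
// Before the change, href was inferred as string; now it's a Uri.
var href = new UrlBuilder()
    .WithAction(nameof(CalendarController.Get))
    .WithController(nameof(CalendarController))
    .WithValues(new { year = DateTime.Now.Year })
    .BuildAbsolute(Url);

// A call site like this no longer compiles, because Uri has no Length:
// var n = href.Length;
// You'd edit it to go via the string representation instead:
var n = href.AbsoluteUri.Length;
```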

In the context of coding styles, I still, more than a decade after the var keyword was introduced, encounter code bases that use explicit variable declaration.

When I explain the advantages of using the var keyword to the team responsible for the code base, they may agree in principle, but still oppose using it in practice. The reason? Using var would make the code base inconsistent.

Aiming for a consistent coding style is fine, but only as long as it doesn't prohibit improvements. Don't let it stand in the way of progress.

Habitability #

I don't mind consistency; in fact, I find it quite attractive. It must not, however, become a barrier to improvement.

I've met programmers who so strongly favour consistency that they feel that, in order to change coding style, they'd have to go through the entire code base and retroactively update it all to fit the new rule. This is obviously prohibitively expensive to do, so practically it prevents change.

Consistency is fine, but learn to accept inconsistency. As Nat Pryce said, we should learn to love the mess, to adopt a philosophy akin to wabi-sabi.

I think this view on inconsistent code helped me come to grips with my own desire for neatness. An inconsistent code base looks inhabited. I don't mind looking around in a code base and immediately being able to tell: oh, Anna wrote this, or Nader is the author of this method.

What's more important is that the code is comprehensible.

Conclusion #

Consistency in code isn't a bad thing. Coding styles can help encourage a degree of consistency. I think that's fine.

On the other hand, consistency shouldn't be the highest goal of a code base. If improving the code makes a code base inconsistent, I think that the improvement should trump consistency every time.

Let the old code be as it is, until you need to work with it. When you do, you can apply Robert C. Martin's boy scout rule: Always leave the code cleaner than you found it. Code perfection is like eventual consistency; it's something that you should constantly move towards, yet may never attain.

Learn to appreciate the 'lived-in' quality of an active code base.


Simplifying code with Decorated Commands

Monday, 10 May 2021 05:37:00 UTC

Consider modelling many side effects as a single Command.

In a previous article I discussed how an abstraction can sometimes be leaky by omission. In this article, you'll see how removing the leak enables some beneficial refactoring. I'm going to assume that you've read the previous article.

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

The relative cost of the four CRUD operations #

In this article, you'll see code that implements an ASP.NET Controller action. It enables a REST client to update an existing reservation via a PUT request.

I chose to show you the Put method because it's the worst, and thereby the one where refactoring is most warranted. This seems to follow a pattern I've noticed over the years: data updates are always the worst.

Before I show you the code, I'd like to take a little detour to discuss this observation.

Consider the four CRUD operations. Which one is the easiest to implement and maintain, and which one gives you most grief?

Deletes are typically straightforward: A unique identifier is all it takes. The only small complication you may have to consider is idempotence. If you delete an entity once, it's gone. What should happen if you 'delete' it again? To be clear, I don't consider this a trick question. A delete operation should be idempotent, but sometimes, depending on the underlying storage technology, you may have to write a few lines of code to make that happen.
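As a sketch, an idempotent delete can treat 'already gone' the same as a successful delete. The Controller action and repository method below are hypothetical, only meant to illustrate the idea:

```csharp
// Hypothetical Controller action. Repository.Delete is assumed to be
// a no-op when the row doesn't exist, so repeated DELETE requests for
// the same id all succeed.
[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Guid.TryParse(id, out var rid))
        await Repository.Delete(restaurantId, rid).ConfigureAwait(false);
}
```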

Reads are a little more complicated. I'm actually not sure if reads are more or less involved than create operations. The complexity is probably about the same. Reading a single document from a document database is easy, as is reading a single row from a database. Relational databases can make this a bit harder when you have to join tables, but when you get the hang of it, it's not that hard.

Create operations tend to be about as easy or difficult as reads. Adding a new document to a document database or BLOB storage is easy. Adding a complex entity with foreign key relationships in a relational database is a bit more complicated, but still doable.

Updates, though, are evil. In a document database, it may be easy enough if you can just replace the document wholesale. Often, however, updates involve delta detection. Particularly in relational databases, when foreign keys are involved, you may have to recursively track down all the related rows and either update those as well, or delete and recreate them.

As you'll see in the upcoming code example, an update typically also involves complicated auxiliary logic to determine what changed, and how to react to it.

For that reason, if possible, I prefer modelling data without supporting updates. Create/read/delete is fine, but if you don't support updates, you may not need deletes either. There's a reason I like Event Sourcing.

A complicated Put method #

My restaurant reservation API included this method that enabled REST clients to update reservations:

[HttpPut("restaurants/{restaurantId}/reservations/{id}")]
public async Task<ActionResult> Put(
    int restaurantId,
    string id,
    ReservationDto dto)
{
    if (dto is null)
        throw new ArgumentNullException(nameof(dto));
    if (!Guid.TryParse(id, out var rid))
        return new NotFoundResult();
 
    Reservation? reservation = dto.Validate(rid);
    if (reservation is null)
        return new BadRequestResult();
 
    var restaurant = await RestaurantDatabase
        .GetRestaurant(restaurantId).ConfigureAwait(false);
    if (restaurant is null)
        return new NotFoundResult();
 
    return
        await TryUpdate(restaurant, reservation).ConfigureAwait(false);
}

Had I written this code exclusively for myself, I'd have written it in a more functional style, as an impureim sandwich. (Actually, had I written this code exclusively for myself, I'd have written it in F# or Haskell.) This code, however, is written for another audience, so I didn't want to assume that the reader knows about impureim sandwiches.

I still wanted to decompose the functionality into small blocks. There's still an echo of the impureim sandwich architecture in the Put method, because it handles most of the impure preparation - the top of the sandwich, so to speak.

The rest - any functional core there might be, as well as impure post-processing - it delegates to the TryUpdate method.

TryUpdate #

Here's the TryUpdate method:

private async Task<ActionResult> TryUpdate(
    Restaurant restaurant,
    Reservation reservation)
{
    using var scope = new TransactionScope(
        TransactionScopeAsyncFlowOption.Enabled);
 
    var existing = await Repository.ReadReservation(reservation.Id)
        .ConfigureAwait(false);
    if (existing is null)
        return new NotFoundResult();
 
    var ok = await WillAcceptUpdate(restaurant, reservation)
        .ConfigureAwait(false);
    if (!ok)
        return NoTables500InternalServerError();
 
    await Update(restaurant, reservation, existing)
        .ConfigureAwait(false);
 
    scope.Complete();
 
    return new OkObjectResult(reservation.ToDto());
}

To be honest, this is mostly just more impure pre-processing. The functional core is hidden away inside the (impure) WillAcceptUpdate method, but I'm not going to show you that one. It's not important in this context.

If, however, the method decides that the update is possible, it'll make one more delegation, to the Update method.

I admit it: This isn't the prettiest code I've ever written. I did warn you, though. I chose this method as an example because it could really do with some refactoring. One problem I have with it is the naming. You have a Put method, which calls a TryUpdate method, which again calls an Update method.

Even though the Try prefix is a .NET idiom, I still feel that a regular reader could be easily confused, having to choose between TryUpdate and Update.

Still, let's soldier on and review the Update method as well. It's the last one, I promise.

Update #

Here's the Update method:

private async Task Update(
    Restaurant restaurant,
    Reservation reservation,
    Reservation existing)
{
    if (existing.Email != reservation.Email)
        await PostOffice
            .EmailReservationUpdating(restaurant.Id, existing)
            .ConfigureAwait(false);
    await Repository.Update(reservation).ConfigureAwait(false);
    await PostOffice
        .EmailReservationUpdated(restaurant.Id, reservation)
        .ConfigureAwait(false);
}

The method perfectly illustrates what I meant when I wrote that you often have to do various kinds of delta analysis when implementing an update - even if delta analysis isn't required by the data store.

This method does two things:

  • It sends emails
  • It updates the repository

Notice that if the email address changes, Update sends an email to the old address. This is an example of delta analysis: it only happens when the email address changes, not when the name or quantity changes.

The motivation is that it may serve to warn the user if someone tries to change the reservation. Only when the email address changes is it necessary to send an email to the old address.

In all cases, the method sends an email to the 'current' address.

This seems ripe for refactoring.

Plugging the leak #

The Update method is an asynchronous Command. It exclusively produces side effects, but it doesn't return anything (we'll regard Task as 'asynchronous unit').

I've known since 2011 that Commands are composable. Later, I also figured out the fundamental reason for that.

The Update method composes three other Commands - one conditional and two unconditional. This seems to call for some sort of composition: Chain of Responsibility, Decorator, or Composite. Common to these patterns, however, is that the object that they compose must share an API. In a language like C# it means that they must share a polymorphic type.
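To illustrate why a shared API matters, here's a self-contained sketch of a hypothetical ICommand interface and a Composite over it. These types are assumptions for illustration only; the article's actual refactoring uses IReservationsRepository instead:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// A hypothetical shared Command interface. Once every action
// implements it, patterns like Composite become trivial.
public interface ICommand
{
    Task Execute();
}

// Adapter that lifts a delegate into an ICommand, for illustration.
public sealed class DelegateCommand : ICommand
{
    private readonly Func<Task> action;

    public DelegateCommand(Func<Task> action)
    {
        this.action = action;
    }

    public Task Execute()
    {
        return action();
    }
}

// Composite: executes its constituent Commands in sequence, while
// itself remaining a single ICommand from the client's perspective.
public sealed class CompositeCommand : ICommand
{
    private readonly IReadOnlyList<ICommand> commands;

    public CompositeCommand(params ICommand[] commands)
    {
        this.commands = commands;
    }

    public async Task Execute()
    {
        foreach (var command in commands)
            await command.Execute().ConfigureAwait(false);
    }
}
```

All three patterns rely on such a shared type; without one, the Commands can't be treated uniformly.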

Which type might that be? Let's list the three method signatures in action, one after the other:

  • Task EmailReservationUpdating(int restaurantId, Reservation reservation)
  • Task Update(Reservation reservation)
  • Task EmailReservationUpdated(int restaurantId, Reservation reservation)

Do these three methods have anything in common?

The commonality might be easier to spot if we X out the names (which are only skin-deep, anyway):

  • Task Xxx(int restaurantId, Reservation reservation)
  • Task Xxx(                  Reservation reservation)
  • Task Xxx(int restaurantId, Reservation reservation)

They almost look like each other!

The only deviation is that the middle method (originally the Update method) lacks a restaurantId parameter.

As the previous article explained, though, this is a leaky abstraction by omission. Will plugging the leak enable a refactoring?

Let's try. Make restaurantId a parameter for all methods defined by the interface:

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(int restaurantId, Guid id);
 
    Task Update(int restaurantId, Reservation reservation);
 
    Task Delete(int restaurantId, Guid id);
}

This is the suggested remedy from the previous article, so I put it here solely as a reminder.

An emailing Decorator #

There's a sequence to the actions in the Update method:

  1. It emails the old address about a changing address
  2. It updates the reservation
  3. It emails the current address about the update

It's easiest to preserve this order of actions if you implement a Decorator around the new version of IReservationsRepository:

public class EmailingReservationsRepository : IReservationsRepository
{
    public EmailingReservationsRepository(
        IPostOffice postOffice,
        IReservationsRepository inner)
    {
        PostOffice = postOffice;
        Inner = inner;
    }
 
    public IPostOffice PostOffice { get; }
    public IReservationsRepository Inner { get; }
 
    public async Task Update(int restaurantId, Reservation reservation)
    {
        if (reservation is null)
            throw new ArgumentNullException(nameof(reservation));
 
        var existing =
            await Inner.ReadReservation(restaurantId, reservation.Id)
                .ConfigureAwait(false);
        if (existing is { } && existing.Email != reservation.Email)
            await PostOffice
                .EmailReservationUpdating(restaurantId, existing)
                .ConfigureAwait(false);
 
        await Inner.Update(restaurantId, reservation)
            .ConfigureAwait(false);
 
        await PostOffice.EmailReservationUpdated(restaurantId, reservation)
            .ConfigureAwait(false);
    }
 
    // Other members go here...
}

You may think that it seems odd to have a 'repository' that also sends emails. I think that this is mostly an artefact of unfortunate naming. Perhaps a follow-up change should be to rename both the interface and the Controller's Repository property. I'm open to suggestions, but for now, I'll leave the names as they are.

If you're still not convinced, consider an alternative architecture based on asynchronous message passing (e.g. CQRS). In such architectures, you'd put Commands on a message bus and think no more of it. A background process would then asynchronously perform all the actions, including sending emails and updating the data store. I think that people used to that kind of architecture wouldn't bat an eyelid at bus.Send(new UpdateReservation(/**/)).
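To make that concrete, a message-based variant might look like this sketch. The UpdateReservation Command, IMessageBus, and InMemoryBus are all hypothetical stand-ins, not types from the sample code base:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical message type for the Update Command.
public sealed class UpdateReservation
{
    public UpdateReservation(int restaurantId, Guid reservationId)
    {
        RestaurantId = restaurantId;
        ReservationId = reservationId;
    }

    public int RestaurantId { get; }
    public Guid ReservationId { get; }
}

// Hypothetical bus abstraction.
public interface IMessageBus
{
    Task Send(object message);
}

// An in-memory stand-in that merely queues messages. A real system
// would use a durable queue, and a background consumer would update
// the data store and send the emails.
public sealed class InMemoryBus : IMessageBus
{
    public Queue<object> Sent { get; } = new Queue<object>();

    public Task Send(object message)
    {
        Sent.Enqueue(message);
        return Task.CompletedTask;
    }
}
```

A client would then write await bus.Send(new UpdateReservation(restaurantId, rid)) and think no more of it.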

This would also be close to the kind of design that Steven van Deursen and I describe in chapter 10 of our book.

Simplification #

This greatly simplifies things. The above Update method now becomes redundant and can be deleted. Instead, TryUpdate can now directly call Repository.Update:

private async Task<ActionResult> TryUpdate(
    Restaurant restaurant, Reservation reservation)
{
    using var scope = new TransactionScope(
        TransactionScopeAsyncFlowOption.Enabled);
 
    var existing = await Repository
        .ReadReservation(restaurant.Id, reservation.Id)
        .ConfigureAwait(false);
    if (existing is null)
        return new NotFoundResult();
 
    var ok = await WillAcceptUpdate(restaurant, reservation)
        .ConfigureAwait(false);
    if (!ok)
        return NoTables500InternalServerError();
 
    await Repository.Update(restaurant.Id, reservation)
        .ConfigureAwait(false);
 
    scope.Complete();
 
    return new OkObjectResult(reservation.ToDto());
}

This also means that you can remove the PostOffice dependency from the Controller. Lots of things become simpler as a result of this refactoring. It better separates concerns, so tests become simpler as well.
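As an illustration of how the tests simplify, here's a self-contained sketch with pared-down stand-in interfaces. IRepo, IMail, and the fakes are inventions for this example, not the real types from the code base:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Pared-down stand-ins that show how the Decorator keeps email
// concerns out of clients, which keeps the clients' tests small.
public interface IRepo
{
    Task Update(Guid id);
}

public interface IMail
{
    Task EmailUpdated(Guid id);
}

public sealed class FakeRepo : IRepo
{
    public List<Guid> Updates { get; } = new List<Guid>();

    public Task Update(Guid id)
    {
        Updates.Add(id);
        return Task.CompletedTask;
    }
}

public sealed class FakeMail : IMail
{
    public List<Guid> Emails { get; } = new List<Guid>();

    public Task EmailUpdated(Guid id)
    {
        Emails.Add(id);
        return Task.CompletedTask;
    }
}

// The essence of EmailingReservationsRepository: update, then email.
public sealed class EmailingRepo : IRepo
{
    private readonly IMail mail;
    private readonly IRepo inner;

    public EmailingRepo(IMail mail, IRepo inner)
    {
        this.mail = mail;
        this.inner = inner;
    }

    public async Task Update(Guid id)
    {
        await inner.Update(id).ConfigureAwait(false);
        await mail.EmailUpdated(id).ConfigureAwait(false);
    }
}
```

The client only sees IRepo; that the emails happen is entirely the Decorator's business, verified in its own focused test.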

Conclusion #

You can simplify code by composing Commands. Candidate patterns for this are Chain of Responsibility, Decorator, and Composite. These patterns, however, require a common polymorphic type. Key to refactoring to these patterns is to identify such a common interface. In this article, I used the refactored IReservationsRepository interface.

Whenever a client calls a method on the repository, a change of state now automatically also sends emails. The client doesn't have to worry about that.

Consider modelling many related side-effects as a single composed Command.


Structural equality for better tests

Monday, 03 May 2021 05:45:00 UTC

A Fluent Builder as a Value Object?

If you've read a bit about unit testing, test-driven development, or other kinds of developer testing, you've probably come across a phrase like this:

Test behaviour, not implementation.

It's often taken to mean something like behaviour-driven development (BDD), and that's certainly one interpretation. I've no problem with that. My own Pluralsight course Outside-In Test-Driven Development shows a similar technique.

It'd be a logical fallacy, however, to thereby conclude that you can only apply that ideal in the large, but not in the small. That it's only possible to do it with coarse-grained tests at the boundary of the system, but not with unit testing.

It may be harder to do at the unit level, since when writing unit tests, you're closer to the implementation, so to speak. Writing the test before the implementation may, however, help.

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.

An example test #

Here's a test (using xUnit.net 2.4.1) I wrote before the implementation:

[Theory]
[InlineData("Home")]
[InlineData("Calendar")]
[InlineData("Reservations")]
public void WithControllerHandlesSuffix(string name)
{
    var sut = new UrlBuilder();
 
    var actual = sut.WithController(name + "Controller");
 
    var expected = sut.WithController(name);
    Assert.Equal(expected, actual);
}

It tests an ASP.NET Core URL Builder; particularly how it deals with the Controller suffix issue I ran into last year.

Do you notice something odd about this test?

It describes an equality relation between two individual projections of an initial UrlBuilder object (sut).

First of all, with a Mutable Fluent Builder the test would produce a false negative, because aliasing would make the assertion tautological. Using an Immutable Fluent Builder, however, elegantly dodges that bullet: expected and actual are two separate objects.
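To see the aliasing problem, consider this sketch of a hypothetical mutable variant:

```csharp
using System;

// A hypothetical *mutable* Fluent Builder, for contrast. Returning
// 'this' means that every 'projection' aliases the same object.
public sealed class MutableUrlBuilder
{
    public string Controller { get; private set; }

    public MutableUrlBuilder WithController(string newController)
    {
        Controller = newController;
        return this; // the caller gets the same instance back
    }
}
```

With this design, expected and actual would reference the same object regardless of what WithController does, so the assertion could never fail. The Immutable Fluent Builder sidesteps the problem by returning a fresh instance from each With method.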

Yet, it's possible to compare them. How?

Assertions #

I think that most people would have written the above test like this:

[Theory]
[InlineData("Home")]
[InlineData("Calendar")]
[InlineData("Reservations")]
public void WithControllerHandlesSuffix(string name)
{
    var sut = new UrlBuilder();
 
    var actual = sut.WithController(name + "Controller");
 
    var expected = sut.WithController(name);
    Assert.Equal(expected.Controller, actual.Controller);
}

Instead of comparing two whole objects, this variation compares the Controller property values from two objects. In order for this to compile, you have to expose an implementation detail: that the class has a class field (here exposed as an automatic property) that keeps track of the Controller name.

I think that most object-oriented programmers' default habit is to write assertions that compare properties or class fields because in both C# and Java, objects by default only have reference equality. This leads to primitive obsession, this time in the context of test assertions.

Structural equality, on the other hand, makes it much easier to write concise and meaningful assertions. Just compare expected with actual.

Structural equality on a Builder? #

The UrlBuilder class has structural equality by overriding Equals and GetHashCode:

public override bool Equals(object obj)
{
    return obj is UrlBuilder builder &&
           action == builder.action &&
           controller == builder.controller &&
           EqualityComparer<object?>.Default.Equals(values, builder.values);
}
 
public override int GetHashCode()
{
    return HashCode.Combine(action, controller, values);
}

That's why the above Assert.Equal statement works.

You may think that it's an odd choice to give a Fluent Builder structural equality, but why not? Since it's immutable, it's perfectly safe, and it makes things like testing much easier.

I rarely see people do this. Even programmers experienced with functional programming often seem to categorise structural equality as something associated exclusively with algebraic data types (ADTs). The UrlBuilder class, on the other hand, doesn't look like an ADT. After all, its public API exposes only behaviour, but no data:

public sealed class UrlBuilder
{
    public UrlBuilder()
  
    public UrlBuilder WithAction(string newAction)
 
    public UrlBuilder WithController(string newController)
 
    public UrlBuilder WithValues(object newValues)
 
    public Uri BuildAbsolute(IUrlHelper url)
 
    public override bool Equals(object obj)
 
    public override int GetHashCode()
}
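The article doesn't show the implementation, but a WithController that passes the above test might look like this minimal sketch. The private constructor, the field name, and the suffix-trimming details are assumptions, mirroring the F# version shown below:

```csharp
using System;

// A minimal immutable sketch of the part of UrlBuilder that the
// test exercises; only the Controller aspect is modelled here.
public sealed class UrlBuilder
{
    private readonly string controller;

    public UrlBuilder()
    {
    }

    private UrlBuilder(string controller)
    {
        this.controller = controller;
    }

    public UrlBuilder WithController(string newController)
    {
        if (newController is null)
            throw new ArgumentNullException(nameof(newController));

        // Trim a trailing "Controller" suffix, if present, so that
        // "HomeController" and "Home" produce equal builders.
        var index = newController.LastIndexOf(
            "Controller", StringComparison.OrdinalIgnoreCase);
        var trimmed =
            0 <= index ? newController.Remove(index) : newController;
        return new UrlBuilder(trimmed);
    }

    public override bool Equals(object obj)
    {
        return obj is UrlBuilder builder &&
               controller == builder.controller;
    }

    public override int GetHashCode()
    {
        return HashCode.Combine(controller);
    }
}
```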

On the other hand, my threshold for when I give an immutable class structural equality is monotonically decreasing. Structural equality just makes things easier. The above test is just one example. Structural equality enables you to test behaviour instead of implementation details. In this example, the behaviour can be expressed as an equality relation between two different inputs.

UrlBuilder as an algebraic data type #

While it may seem odd or surprising to give a Fluent Builder structural equality, it's really isomorphic to a simple record type equipped with a few endomorphisms. (After all, we already know that the Builder pattern is isomorphic to the endomorphism monoid.) Let's make this explicit with F#.

Start by declaring a record type with a private definition:

type UrlBuilder = private { Action : string option; Controller : string option; Values : obj option }

While its definition is private, it's still an algebraic data type. Records in F# automatically have structural equality, and so does this one.

Since it's private, client code can't use the normal language constructs to create instances. Instead, the module that defines the type must supply an API that client code can use:

let emptyUrlBuilder = { Action = None; Controller = None; Values = None }
 
let withAction action ub = { ub with Action = Some action }
 
let withController (controller : string) ub =
    let index = controller.LastIndexOf ("controller", StringComparison.OrdinalIgnoreCase)
    let newController = if 0 <= index then controller.Remove(index) else controller
    { ub with Controller = Some newController }
 
let withValues values ub = { ub with Values = Some values }

Without further ceremony you can port the initial test to F# as well:

[<Theory>]
[<InlineData("Home")>]
[<InlineData("Calendar")>]
[<InlineData("Reservations")>]
let ``withController handles suffix`` name =
    let sut = emptyUrlBuilder
 
    let actual = sut |> withController (name + "Controller")
 
    let expected = sut |> withController name
    expected =! actual

In addition to xUnit.net this test also uses Unquote 6.0.0.

Even though UrlBuilder has no externally visible data, it automatically has structural equality. Functional programming is, indeed, more test-friendly than object-oriented programming.

This F# implementation is equivalent to the C# UrlBuilder class.

Conclusion #

You can safely give immutable objects structural equality. Besides other advantages, it makes it easier to write tests. With structural equality, you can express a relationship between the expected and actual outcome using high-level language.

These days, I don't really care if the type in question is a 'proper' algebraic data type. If it's immutable, I don't have to think much about it before giving it structural equality.


Comments

Records in F# automatically have structural equality, and so does this one.

That is mostly true but not completely so. Consider the type

type MyRecord = { MyField: int -> bool }

If you try to compare two instances with F#'s = operator, then you will get this compiler error.

Error FS0001: The type 'MyRecord' does not support the 'equality' constraint because it is a record, union or struct with one or more structural element types which do not support the 'equality' constraint. Either avoid the use of equality with this type, or add the 'StructuralEquality' attribute to the type to determine which field type does not support equality.

Adding the StructuralEquality attribute results in this compiler error.

Error FS1180: The struct, record or union type 'MyRecord' has the 'StructuralEquality' attribute but the component type '(int -> bool)' does not satisfy the 'equality' constraint.

I learned all this the hard way. I had added some F# functions to some of my models in my MVU architecture. Later, when I tried to test my root model for structural equality, I ran into this issue. Taking the suggestion in the compiler error, I fixed the problem by adding the StructuralEquality attribute (as well as the NoComparison attribute) to my root model and refactored the code to fix the resulting compiler errors.

During this time, I also realized that F#'s structural equality delegates to object.Equals(object) for types that extend object, which of course defaults to reference equality. For example, the following code compiles.

[<StructuralEquality>] [<NoComparison>] type MyRecord = { MyField: IDisposable }

2021-05-04 11:49 UTC

Tyson, thank you for writing. Yes, you're right. Language is imprecise. F# records automatically have structural equality, when possible.

2021-05-05 4:48 UTC

Leaky abstraction by omission

Monday, 26 April 2021 15:10:00 UTC

Sometimes, an abstraction can be leaky because it leaves something out.

Consider the following interface definition. What's wrong with it?

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(Guid id);
 
    Task Update(Reservation reservation);
 
    Task Delete(Guid id);
}

Perhaps you think that the name is incorrect; that this really isn't an example of the Repository design pattern, as it's described in Patterns of Enterprise Application Architecture. Ironically, of all patterns, it may be the one most affected by semantic diffusion.

That's not what I have in mind, though. There's something else with that interface.

It's not its CRUD design, either. You could consider that a leaky abstraction, since it strongly implies a sort of persistent data store. That's a worthwhile discussion, but not what I have in mind today. There's something else wrong with the interface.

Consistency #

Look closer at the parameters for the various methods. The Create and ReadReservations methods take a restaurantId parameter, but the other three don't.

Why does the ReadReservations method take a restaurantId while ReadReservation doesn't? Why is that parameter required for Create, but not for Update? That doesn't seem consistent.

The reason is that each reservation has an ID (a GUID). Once the reservation exists, you can uniquely identify it to read, update, or delete it.

As the restaurantId parameter suggests, however, this interface is part of a multi-tenant code base. This code base implements an online restaurant reservation system as a REST API. It's an online service where each restaurant is a separate tenant.

While each reservation has a unique ID, the system still needs to associate it with a restaurant. Thus, the Create method must take a restaurantId parameter in order to associate the reservation with a restaurant.

Once the reservation is stored, however, it's possible to uniquely identify it with the ID. The ReadReservation, Update, and Delete methods need only the id to work.

On the other hand, when you're not querying on reservation ID, you'll need to identify the restaurant, as with the ReadReservations methods. If you didn't identify the restaurant in that method, you'd get all reservations in the requested range, from all tenants. That's not what you want. Therefore, you must supply the restaurantId to limit the query.

The interface is inconsistent, but also allows the underlying implementation to leak through.

Implied implementation detail #

If I told you that the implementation of IReservationsRepository is based on a relational database, can you imagine the design? You may want to stop reading and see if you can predict what the database looks like.

The interface strongly implies a design like this:

CREATE TABLE [dbo].[Reservations] (
    [Id]           INT              IDENTITY (1, 1) NOT NULL,
    [At]           DATETIME2 (7)    NOT NULL,
    [Name]         NVARCHAR (50)    NOT NULL,
    [Email]        NVARCHAR (50)    NOT NULL,
    [Quantity]     INT              NOT NULL,
    [PublicId]     UNIQUEIDENTIFIER NOT NULL,
    [RestaurantId] INT              NOT NULL,
    PRIMARY KEY CLUSTERED ([Id] ASC),
    CONSTRAINT [AK_PublicId] UNIQUE NONCLUSTERED ([PublicId] ASC)
);

What I wrote above is even clearer now. You can't create a row in that table without supplying a RestaurantId, since that column has a NOT NULL constraint.

The PublicId column has a UNIQUE constraint, which means that you can uniquely read and manipulate a single row when you have an ID.

Since all reservations are in a single table, any query not based on PublicId should also filter on RestaurantId. If it doesn't, the result set could include reservations from all restaurants.
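For example, a date-range query against that table might look like this sketch (parameter names are illustrative):

```sql
-- Any query not keyed on PublicId must also filter on RestaurantId;
-- otherwise it would return reservations from all tenants.
SELECT [At], [Name], [Email], [Quantity], [PublicId]
FROM [dbo].[Reservations]
WHERE [RestaurantId] = @restaurantId
  AND @min <= [At] AND [At] <= @max;
```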

Other interpretations #

Is the above relational database design the only possible implementation? Perhaps not. You could implement the interface based on a document database as well. It'd be natural to store each reservation as a separate document with a unique ID. Again, once you have the ID, you can directly retrieve and manipulate the document.

Other implementations become harder, though. Imagine, for example, that you want to shard the database design: Each restaurant gets a separate database. Or perhaps, more realistically, you distribute tenants over a handful of databases, perhaps partitioned on physical location, or some other criterion.

With such a design, the ReadReservation, Update, and Delete methods become more inefficient. While you should be able to identify the correct shard if you have a restaurant ID, you don't have that information. Instead, you'll have to attempt the operation on all databases, thereby eliminating most sharding benefits.

In other words, the absence of the restaurantId parameter from some of the methods suggests certain implementation details.

Leak by omission #

I admit that I rarely run into this sort of problem. Usually, a leaky abstraction manifests by a language construct that contains too much information. This is typically an interface or base class that exposes implementation details by either requiring too specific inputs, or by returning data that reveals implementation details.

For a data access abstraction like the above 'repository', this most frequently happens when people design such an interface around an object-relational mapper (ORM). A class like Reservation would then typically carry ORM details around. Perhaps it inherits from an ORM base class, or perhaps (this is very common) it has a parameterless constructor or getters and setters that model the relationships of the database (these are often called navigation properties).

Another common example of a leaky abstraction is the presence of Connect and Disconnect methods. The Connect method may even take a connectionString parameter, clearly leaking that some sort of database is involved.

Yet another example is CQS-violating designs where a Create method returns a database ID.

All such leaky abstractions are leaky because they expose or require too much information.

The example in this article, on the contrary, is leaky because of a lack of detail.

Dependency Inversion Principle #

Ironically, I originally arrived at the above design because I followed the Dependency Inversion Principle (DIP). The clients of IReservationsRepository are ASP.NET Controller actions, like this Delete method:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Guid.TryParse(id, out var rid))
    {
        var r = await Repository.ReadReservation(rid)
            .ConfigureAwait(false);
        await Repository.Delete(rid).ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(restaurantId, r)
                .ConfigureAwait(false);
    }
}

As Robert C. Martin explains about the Dependency Inversion Principle:

"clients [...] own the abstract interfaces"

Robert C. Martin, APPP, chapter 11

From that principle, it follows that the Delete method decides what IReservationsRepository.Delete looks like. It seems that the Controller action doesn't need to tell the Repository about the restaurantId when calling its Delete method. Supplying the reservation ID (rid) is enough.

There are, however, various problems with the above code. If the DIP suggests that the restaurantId is redundant when calling Repository.Delete, then why is it required when calling PostOffice.EmailReservationDeleted? This seems inconsistent.

Indeed it is.

As I often do, I arrived at the above Delete method via outside-in TDD, but as I observed a decade ago, TDD alone doesn't guarantee good design. Even when following the red-green-refactor checklist, I often fail to spot problems right away.

That's okay. TDD doesn't guarantee perfection, but done well it should set you up so that you can easily make changes.

Possible remedies #

I can think of two ways to address the problem. The simplest solution is to make the interface consistent by adding a restaurantId parameter to all methods:

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(int restaurantId, Guid id);
 
    Task Update(int restaurantId, Reservation reservation);
 
    Task Delete(int restaurantId, Guid id);
}

This is the simplest solution, and the one that I prefer. In a future article, I'll show how it enabled me to significantly simplify the code base.

For good measure, though, I should also mention the opposite solution. Completely drain the interface of restaurantId parameters:

public interface IReservationsRepository
{
    Task Create(Reservation reservation);
 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        DateTime min, DateTime max);
 
    Task<Reservation?> ReadReservation(Guid id);
 
    Task Update(Reservation reservation);
 
    Task Delete(Guid id);
}

How can that work in practice? After all, an implementation must have a restaurant ID in order to create a new row in the database.

It's possible to solve that problem by making the restaurantId an implementation detail. You could make it a constructor parameter for the concrete class, but this gives you another problem. Your Composition Root doesn't know the restaurant ID - after all, it's a run-time argument.

In a method like the above Delete Controller action, you'd have to translate the restaurantId run-time argument to an IReservationsRepository instance. There are various ways around that kind of problem, but they typically involve some kind of factory. That'd be yet another interface:

public interface IReservationsRepositoryFactory
{
    IReservationsRepository Create(int restaurantId);
}

That just makes the API more complicated. Factories give Dependency Injection a bad reputation. For that reason, I don't like this second alternative.

Conclusion #

Leaky abstractions usually express themselves as APIs that expose too many details; the implementation details leak through.

In this example, however, a leaky abstraction manifested as a lack of consistency. Some methods require a restaurantId argument, while others don't - because one particular implementation doesn't need that information.

It turned out, though, that when I was trying to simplify the overall code, this API design held me back. Consistently adding restaurantId parameters to all repository methods solved the problem. A future article tells that tale.

The code shown here is part of the sample code base that accompanies my book Code That Fits in Your Head.


Comments

Thank you for the article Mark.

I was wondering whether another solution would be including restaurantId as a member of Reservation? That way it’s not needed by the Create method.

That just leaves ReadReservations as the last method that requires a restaurant ID, but one could argue a specialized read method such as this one doesn’t belong on a repository anyway. I personally tend to interpret these kinds of methods on a repository as a code smell on projects of a certain size.

I might just be missing the point of your article, but I would love to hear your thoughts. :)

2021-05-01 8:59 UTC

Tobias, thank you for writing. You raise a good point, and it might be an appropriate way to model the problem. While the thought had already crossed my mind, I must admit that I hadn't given it much thought.

For the individual CRUD operations, I admit that it might be an appropriate design option. You do, however, also point to the ReadReservations method as the odd man out. I applaud the intellectual honesty you exhibit by bringing this up yourself, and I don't intend to misuse it by shooting down your idea. The fact that this method is somehow different might be an indication that it doesn't belong as a member of the same interface as the other four methods.

If that's the case, though, then where does it belong? One option would be to define all interfaces with only a single method:

public interface IReservationsDateRangeQuery
{ 
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
}

How should we deal with the restaurantId parameter in such an interface? Should it be included, as is the case here, or should we exclude it from the interface definition, like the following?

Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max);

If we choose to exclude the restaurantId parameter from the interface, it'd be consistent with the CRUD interface that you imply. On the other hand, wouldn't it require some sort of factory, as I outlined above?

Conversely, if we decide to keep the restaurantId parameter as part of the interface definition, it seems inconsistent with the design your suggestion implies.

I'm not writing this to shoot down your suggestion. I find it a real conundrum.

I do think, though, that this might be an indication that there's some kind of abstraction that I've failed to make explicit. Some kind of Restaurant or Tenant type seems most likely.

My problem is that I actually do have a Restaurant class in my code base. That one, however, is a Value Object, so I'm loath to add impure methods to it.

For what it's worth, it's deliberation like this that makes software design interesting. You need to balance opposing forces. A good design is one that does so in a stable way. I'm not claiming that the code shown here does that. You've clearly put your finger on a sore spot, which suggests to me that there's more work to be done. Thank you for the inspiring input!

2021-05-02 11:04 UTC
Thibaut #

Hi Mark, thank you for another great article!

I have worked on several small multi-tenant systems and I faced the same problem as you with the repository interface methods and the "tenant id" being mixed.

After several attempts and API iteration, my final design was to use what Steven van Deursen calls The Ambient Composition Model.

The idea is to inject an ITenantContext into the IReservationsRepository implementation and use it as needed:

public class ReservationsRepository : IReservationsRepository
{
    private readonly ITenantContext _tenantContext;

    public ReservationsRepository(ITenantContext tenantContext)
    {
        _tenantContext = tenantContext;
    }

    public Task Create(Reservation reservation)
    {
        var restaurantId = _tenantContext.RestaurantId;
        // ...
    }
}
				

In my case the implementation of the ITenantContext was retrieving the tenant from the route of the method. I think it could be the same for resolving the restaurantId.

This solution seems similar to your Factory solution, but I'm not totally sure. In any case, like the Factory solution, this solution is heavier than the first one you proposed.

Nonetheless I find some elegance in this solution with the tenant being injected by request in the implementation. What do you think? Did you have the same idea with the Factory solution?

2021-05-02 19:20 UTC

Thibaut, thank you for writing. Yes, that's another option. I've done something similar to that in the past.

In a sense, the concept of a tenant seems almost like a cross-cutting concern, so it makes sense to let it fade into the background, so to speak.

The reason I'm not too keen on that any longer is that it seems a bit too 'clever' in most settings. Consider the Delete Controller action shown above. Imagine that we inject restaurantId into all services - not only IReservationsRepository, but also into IPostOffice. The Delete method might look like this, then:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Guid.TryParse(id, out var rid))
    {
        var r = await Repository.ReadReservation(rid)
            .ConfigureAwait(false);
        await Repository.Delete(rid).ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(r)
                .ConfigureAwait(false);
    }
}

The restaurantId parameter still has to be present, even though it's unused. This is likely to be puzzling to any developer not intimately familiar with the code base.

It's possible that you can pull some trick with the ASP.NET framework so that the parameter doesn't have to be there, but it'll still be present in the URL, and again, I'm concerned that most developers would be confused about this.

There's also another thing that bothers me about design like this: You can pull the restaurant ID out of the method's routing data, but this implies that you can do the same with the reservation ID. What makes the restaurant ID special, that it ought to be an injected dependency, while the reservation ID isn't?

I'm concerned that the answer to that question might be 'hand-wavy'. And if we can't defend making one ID a dependency and the other not, then we might take this to the logical conclusion and inject all routing values into services. If we do that, the Delete method might now look like this:

[HttpDelete("restaurants/{restaurantId}/reservations/{id}")]
public async Task Delete(int restaurantId, string id)
{
    if (Repository.IsIdValid)
    {
        var r = await Repository.ReadReservation()
            .ConfigureAwait(false);
        await Repository.Delete().ConfigureAwait(false);
        if (r is { })
            await PostOffice.EmailReservationDeleted(r)
                .ConfigureAwait(false);
    }
}

(I haven't tried to compile this, so there may be syntax errors.)

This seems really odd, although it's possible that we can salvage it by calling the dependency something else than Repository. It's not really a Unit of Work, but seems closer to that sort of design.

I agree that a tenant feels like something that ought to be 'automatically handled', but I wonder whether it's possible to do that without making the code 'too clever'.

2021-05-04 8:26 UTC

How would YAGNI come into play here? For instance, imagine your "client" code wasn't the Delete endpoint, but it was another app or endpoint that only had a "Guid reservationId", but not an "int restaurantId". In such case, wouldn't you be forced to add the restaurantId to the client code? What if this client code doesn't have an easy way to obtain such restaurantId? The reservation id is a global identifier, thus it makes sense that some application (be it a service, console, etc) would just get hold of the guid and nothing else, it's universally identifiable after all, it should be able to identify the reservation uniquely. This may require more roundtrips to the database, or forcing another client one-level above to provide the restaurantId (and this may even require politics and management to get in).

Wouldn't YAGNI say that you shouldn't add the restaurantId to the API, since you ain't gonna need it? I.e, you likely won't change your data access implementation or shard the database in a way that would require that additional restaurantId, and even if you did, perhaps the development effort to add the restaurantId would be the same in that future timeline as it would be right now, so it would be the same to make this change now or afterwards (and in such case, wouldn't it make much sense to make the change afterwards, when you actually need it?).

2021-05-09 23:54 UTC

Gonzalo, thank you for writing. The short answer is that I only 'discovered' the leaky abstraction because I did, in fact, need the restaurant ID. As part of creating, modifying, or deleting reservations, I also wanted to send email updates. For example, when updating a reservation, the system should send an email with a subject line like "Your reservation for Nono changed."

This meant that I had to figure out which name to put in the subject line. Given the restaurant ID, this is trivial, but without it, the system would first have to make a 'reverse lookup' to find the restaurant associated with the reservation ID. While it's technically possible, it seems overly complicated given that the restaurantId was already available at the entry point.

It's true that since the reservation ID is a GUID, it's globally unique. This, however, is an implementation detail. The system doesn't allow external clients to refer to a reservation exclusively by a GUID. Rather, from the perspective of an external client, the ID of a reservation looks like https://api.example.com/restaurants/90125/reservations/667fd3ca5d6943b993860809adc99ad5?sig=aqwnNXj28J547upBELNf5I6yPIrQ%2BG9DG2QIAlOpgOE%3D. While both restaurant and reservation IDs are visible within that string, a client can't use those IDs. The external reservation ID is the full (conceptually opaque) string.

I agree, though, that YAGNI is relevant in this context, too. If it's any consolation, I didn't change the code 'just in case' - I did, in fact, change it because I realised that I needed the modified design. But I admit that I didn't explicitly state that in this article.

You may also find the above discussion relevant.

2021-05-11 6:56 UTC

Consider including identity in URLs

Monday, 19 April 2021 06:29:00 UTC

Automatically enable act-on-behalf-of capabilities to REST APIs.

In 2013 I published a series of API design tips called REST lessons learned. Eight years have passed, but why not add another entry?

This one I've known about for years, but never written down. I often use it when I consult teams, and each time I'm reminded that since this seems like a recurring piece of advice, I ought to write it down.

Nutshell #

The problem, in a nutshell, relates to secured resources in a REST API. This could be any resource where the client must be authenticated before being able to access it. This design tip, however, seems to be mostly applicable when the resource in question itself represents an 'identity'.

To scope the problem, API designers rarely falter when modelling resources that seem unrelated to security or identity. For example, if you're modelling a product catalogue and you want to enable some clients to edit the catalogue, it's clear to most people that a product is unrelated to the identity of the client. Thus, people naturally design URL schemes like products/1234, and that's fine. You can make a PUT request against products/1234 to edit the resource, but you must supply credentials in order to do so.

What if, however, you want to edit your own profile information? There might be a REST resource that exposes your user name, address, bio, avatar, etc. You want to make profile information editable. How do you design the API?

API designers often design such an API based on a URL like profile, without any identifier in the URL. After all, a client must be authenticated in order to edit the resource, so the user ID will somehow be in the HTTP header (e.g. as a JSON Web Token (JWT)).

Consider, nonetheless, to include the identity in the URL.

A profile resource, then, would follow a scheme like profiles/1234. Consider identifying tenant IDs in a multi-tenant system in the same way: tenants/2345. Do this even when other IDs follow: tenants/2345/products/9876.

Typical approach, not recommended #

As outlined above, a typical design is to design an 'identity' resource without including the identification in the URL. If, for example, a client wants to change the avatar via a REST API, it might have to do it like this:

PUT /users HTTP/1.1
Content-Type: application/json
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5c[...]
{
  "bio":  "Danish software design",
  "avatar""ploeh.png"
}

The server-side code can extract the user ID and other authentication information from the Bearer token in the HTTP header. It can use this information to find the user profile and update its database. Technically, this gets the job done.

I'll outline some potential problems with such a design in a moment, but first I'll show a second example. This one is more subtle.

Imagine an online restaurant reservation system. The system enables guests to make reservations, edit them, and so on. When a potential guest attempts to make a reservation, the API should check if it can accept it. See The Maître d' kata for various conditions that may cause the restaurant to reject the reservation. One case might be that the reservation attempt is outside of the restaurant's opening hours.

Perhaps the API should expose a management API that enables the restaurant's maître d'hôtel to change the opening hours. Perhaps you decide to design the API to look like this:

PUT /restaurant HTTP/1.1
Content-Type: application/json
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5c[...]
{
  "opensAt""18:00",
  "lastSeating""21:00",
  "seatingDuration""6:00"
}

Again, the Bearer token is supposed to contain enough information about the user to enable authentication and authorisation. This also gets the job done, but might paint you into a corner.

Separation of concerns #

The problem with the above approach is that it fails to separate concerns. When modelling identity, it's easy to conflate the identity of the resource with the identity of the client interacting with it. Those are two separate concerns.

What happens, for example, if you have so much success with the above restaurant reservation system that you decide to offer it as a multi-tenant service?

I often see a 'solution' to such a requirement where API designers now require clients to supply a 'tenant ID' in the HTTP header. To make it secure, you should probably make it a claim in the JWT supplied via the Authorization header, or something to that effect.

What's wrong with that? It conflates the identity of the client with the identity of the resource. This means that you can't easily enable capabilities where a client can act on behalf of someone else.

Imagine, for example, that you have three restaurants, each a tenant: Hipgnosta, Nono, and The Vatican Cellar. It turns out, however, that Hipgnosta and Nono have the same owners, and share a single administrative employee. These restaurants wish to let that employee manage both restaurants.

With the design outlined above, the employee would have to authenticate twice in order to make changes to both restaurants. That may not be a big deal for occasional edits to two restaurants, but imagine an employee who has to manage hundreds of franchises, and the situation becomes untenable.

You should enable act-on-behalf-of capabilities. This may sound like speculative generality, but it's such a low-hanging fruit that I think you should enable it even if you don't need it right now. Just put the resource identity in the URL: restaurants/456 and users/1234.

Even for user profiles, putting the user ID in the URL enables one client to view (if not edit) other user profiles, which may or may not be desirable.

The API should still demand that clients authenticate, but now you can distinguish the resource from the client making the request. This makes it possible for a client to act on behalf of others, given the right credentials.

Restaurant schedule example #

I'll show you a slightly different example. Instead of editing a restaurant's opening or closing hours, I'll show you how the maître d' can view the schedule for a day. A previous article already suggested that such a resource might exist in a code base I've recently written. A request and its response might look like this:

GET /restaurants/1/schedule/2022/8/21 HTTP/1.1
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJyZXN0YXVyYW5[...]

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{
  "name""Hipgnosta",
  "year": 2022,
  "month": 8,
  "day": 21,
  "days": [
    {
      "date""2022-08-21",
      "entries": [
        {
          "time""19:45:00",
          "reservations": [
            {
              "id""0cced578fa21489bb0e3b5eb6be6825a",
              "at""2022-08-21T19:45:00.0000000",
              "email""annekoicchamber@example.com",
              "name""Anne Kowics Chambers",
              "quantity": 5
            }
          ]
        }
      ]
    }
  ]
}

I've simplified the example response by removing all links to make it more readable. After all, the shape of the response is irrelevant for this discussion. The point is the interaction between the request URL and the JWT.

The request is against a URL that identifies the restaurant in question. The 1 after restaurants in /restaurants/1/schedule/2022/8/21 identifies the restaurant as Hipgnosta to the API. (In reality, clients are expected to follow links. URLs are signed with HMACs, but I've trimmed those off as well to simplify the example.)

In this multi-tenant API, each restaurant is a separate tenant. Thus, the restaurant ID is really a tenant ID. The resource is fully identified via the URL.

What about the client identity? It's supplied via the JWT, which decoded contains these claims:

{
  "restaurant": [
    "1",
    "2112"
  ],
  "role""MaitreD",
  "nbf": 1618301674,
  "exp": 1618906474,
  "iat": 1618301674
}

Notice that the restaurant array contains a list of IDs that identify the tenants that the JWT can access. This particular JWT can access both restaurants 1 and 2112, which correspond to Hipgnosta and Nono. This represents the shared employee who can act on behalf of both restaurants.

Access control #

The API checks that the incoming JWT has a restaurant claim that matches the incoming restaurant ID. Only if that's the case will it let the request through.

[HttpGet("restaurants/{restaurantId}/schedule/{year}/{month}/{day}")]
public async Task<ActionResult> Get(int restaurantId, int year, int month, int day)
{
    if (!AccessControlList.Authorize(restaurantId))
        return new ForbidResult();
 
    // Do the real work here...

The above code fragment is a copy from another article where I already shared some of the server-side authorisation code. Here I'll show some of the code that I didn't show in the other article. The code is part of the sample code base that accompanies my book Code That Fits in Your Head.

In the other article, you can see how the AccessControlList is populated from HttpContext.User, but I didn't show the implementation of the FromUser function. Here it is:

internal static AccessControlList FromUser(ClaimsPrincipal user)
{
    var restaurantIds = user
        .FindAll("restaurant")
        .SelectMany(c => ClaimToRestaurantId(c))
        .ToList();
    return new AccessControlList(restaurantIds);
}
 
private static int[] ClaimToRestaurantId(Claim claim)
{
    if (int.TryParse(claim.Value, out var i))
        return new[] { i };
    return Array.Empty<int>();
}

What you need to notice is just that the FromUser function finds and parses all the "restaurant" claims it can find. The Authorize method, subsequently, just looks for the incoming restaurantId among them:

internal bool Authorize(int restaurantId)
{
    return restaurantIds.Contains(restaurantId);
}

Thus, the identity of the resource is decoupled from the identity of the client. In this example, the client acts on behalf of two tenants, but since an array can hold an arbitrary number of values, there's no hard limit to how many tenants a single client could act on behalf of.
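The essence of that claim-parsing and lookup logic is small enough to restate as a language-neutral sketch. Here it is in Python, with function names merely mirroring the C# code above; this is only an illustration, not part of the code base:

```python
def claim_to_restaurant_id(value):
    # Mirrors ClaimToRestaurantId: parse one "restaurant" claim,
    # silently ignoring malformed values.
    try:
        return [int(value)]
    except ValueError:
        return []

def from_user(restaurant_claims):
    # Mirrors FromUser: flatten all parseable claims into an access control list.
    return [i for c in restaurant_claims for i in claim_to_restaurant_id(c)]

def authorize(acl, restaurant_id):
    # Mirrors Authorize: is the incoming ID among the permitted tenants?
    return restaurant_id in acl

# The JWT from the example above grants access to restaurants 1 and 2112.
acl = from_user(["1", "2112"])
# authorize(acl, 2112) is True; authorize(acl, 3) is False.
```

The point survives translation: the ACL is just a list, so one client can act on behalf of any number of tenants.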

Conclusion #

You don't always need act-on-behalf-of security features, but you never know if such a need might emerge in the future. You're going to need to check client credentials anyway, so the only extra step to avoid painting yourself into a corner is to put the resource identity in the URL - even if you believe that the resource identity and the client identity are the same. Such assumptions have a tendency to be proven wrong over time.

I'm not usually a proponent of speculative generality, but I also think that it's prudent to consider overall return of investment. The cost of adding the resource identity to the URL is low, while having to change URL schemes later may carry a higher cost (even if you force clients to follow links).

This fits one view on software architecture: make reactive changes to the system as easy as possible, but identify the areas where change will be hard, and make good ex-ante decisions about those.

Finally, I think that there's something fundamentally correct and consistent in putting user or tenant IDs in the URLs. After all, you put all other resource IDs (such as product IDs or customer IDs) in URLs.

Notice, in the above schedule example, how the restaurant ID isn't the only ID. The URL also carries information about year, month, and date. These further identify the schedule resource.

Putting user or tenant IDs in the URL effectively separates concerns. It enables you to discern the tenant or user from the client making the request.


Threading context through a catamorphism

Monday, 12 April 2021 11:09:00 UTC

A problem solved after 1½ years.

You've probably noticed that it's easier to learn something new if it looks or sounds like something you already know. As a native Dane, I've found it easier to learn English and German than Russian and Japanese. If you originally were a Java or C# developer, you probably find JavaScript more approachable than Clojure or APL.

I believe that this extends to design patterns and universal abstractions as well. If code new to you follows well-known abstractions, it may be easier to learn than if it's structured in an entirely ad-hoc manner. This is my motivation for learning such universal abstractions as monoids, functors, and catamorphisms.

I particularly enjoy when it's possible to apply such abstractions to a proper problem. This occasionally happens. One example is my small article series on a functional file system.

A fly in the ointment #

In those articles, I described how you could base most of the code on the rose tree catamorphism. There was just one snag. There was one function, calculateMoves, that I was unable to implement with the catamorphism. In the article, I acknowledged my failure:

"Earlier, I wrote that you can implement desired Tree functionality with the foldTree function, but that was a simplification. If you can implement the functionality of calculateMoves with foldTree, I don't know how."
This was true for both the Haskell proof of concept as well as the F# port.

Tyson Williams and I discussed this wart without getting closer to a solution.

As the idiom goes, perfect is the enemy of good, so I decided to move on, although it nagged me.

Problem, condensed #

The problem with the calculateMoves function was that it needed to thread a 'context' recursively through the entire data structure. In this case, the context was a file path.

When calculateMoves runs over the input tree, it needs to thread a relative path through the function, building it up as it traverses the data structure.

For example, if a leaf node named 1 is in a directory named b, which itself is a subdirectory of a, the relative path should be a/b/1. This example is straight from the test cases shown in both articles. You can also find the tests in the GitHub repository.

Each time calculateMoves visits a Node or Leaf it needs to know the parent path to calculate the destination path. As the articles show, this isn't too hard to do with regular pattern matching and recursion.

I couldn't figure out, however, how to thread the path through the function when I tried to implement it with the catamorphism.

Breakthrough #

While I'm ready to walk away from problems when I'm stuck, I tend to remember them. Sometimes, I run into a solution much later.

This happened to me yesterday. I was trying to answer a Stack Overflow question which was explicitly about the application of universal abstractions. Once more, I was stuck by being unable to thread a 'context' through a catamorphism. This time, instead of a path, the context was an indentation depth. Basically, the question was how to render a tree with proper indentation.

Again, this isn't hard if you resort to explicit pattern matching and recursion, but I couldn't figure out how to do it via the data structure's catamorphism.

Fortunately, the user danidiaz posted an awesome answer while I was struggling. The answer uses a trick that I hadn't noticed before: it threads the indentation depth through the data structure by using the catamorphism with a function as the carrier type. Specifically, danidiaz defines the algebra Todo' (Int -> String) -> Int -> String to reduce a Todo' (Int -> String) to an Int -> String function. This function then gets initialised with the depth 0.

While I've been doing functional programming for years, I sometimes still tend to forget that functions are first-class values...

This trick, though, seems to be universally applicable. If you need to thread a context through a catamorphism, define the algebra to work on functions that take the context as an argument.
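To see the trick outside Haskell, here's a minimal Python sketch of the indentation problem from the Stack Overflow question (all names are mine, for illustration only): a rose-tree fold whose carrier type is a function taking the depth as an argument.

```python
# A minimal rose tree and its catamorphism (fold). Names are illustrative.
class Tree:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def fold_tree(f_node, tree):
    # f_node receives the node's value and the already-folded children.
    return f_node(tree.value, [fold_tree(f_node, c) for c in tree.children])

def render(tree):
    # Carrier type: a function from depth (int) to str. The algebra builds
    # functions; only the final application supplies the initial context.
    def f_node(value, fs):
        return lambda depth: "\n".join(
            ["  " * depth + value] + [f(depth + 1) for f in fs])
    return fold_tree(f_node, tree)(0)  # initialise the context with depth 0

example = Tree("root", [Tree("a", [Tree("1")]), Tree("b")])
# render(example) indents each level by two spaces.
```

Notice that fold_tree itself knows nothing about depths; the context only appears in the algebra's carrier type.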

If this is a universally applicable trick, it also ought to work with the calculateMoves function.

Haskell re-implementation #

In my Haskell proof of concept, the calculateMoves function originally looked like this:

calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move
calculateMoves = imp ""
  where imp path    (Leaf x) = Leaf $ Move x $ replaceDirectory x path
        imp path (Node x xs) = Node (path </> x) $ imp (path </> x) <$> xs

It uses an imp (for implementation) function to explicitly recurse over a Tree FilePath FilePath. Until yesterday, I couldn't come up with a better solution to thread the path through the data structure.

The new trick suggests that it'd be possible to implement the function on foldTree (the catamorphism) by using a function as the carrier type. Since the context to be threaded through the catamorphism is a String (the path), the catamorphism should produce a function that takes a String as argument. In other words, the carrier type of the Tree should be String -> Tree FilePath Move.

Let's expand on this: The type of foldTree is foldTree :: (a -> [c] -> c) -> (b -> c) -> Tree a b -> c. Usually, I tend to think of the type parameter c as the type of some value, but since it's unconstrained, it can also be a function. That's what we need here: c should be String -> Tree FilePath Move.

That's not too hard to do, because of currying. Just write functions that take an extra String argument and pass them to foldTree:

calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move
calculateMoves t = foldTree fNode fLeaf t ""
  where
    fLeaf :: FilePath -> String -> Tree FilePath Move
    fLeaf x    path = Leaf $ Move x $ replaceDirectory x path
    fNode :: FilePath -> [String -> Tree FilePath Move] -> String -> Tree FilePath Move
    fNode x fs path = Node (path </> x) $ ($ path </> x) <$> fs

Here I've used type annotations for the local functions, but that's entirely optional:

calculateMoves :: Tree FilePath FilePath -> Tree FilePath Move
calculateMoves t = foldTree fNode fLeaf t ""
  where
    fLeaf x    path = Leaf $ Move x $ replaceDirectory x path
    fNode x fs path = Node (path </> x) $ ($ path </> x) <$> fs

I included the type annotations to make it a little clearer what's going on. Recall that the type of foldTree is foldTree :: (a -> [c] -> c) -> (b -> c) -> Tree a b -> c. First consider the second of the two function arguments, the one I call fLeaf in the above code. It's the simplest of the two, so it makes sense to start with that one.

The generic type of fLeaf is b -> c. How does that map to the type of fLeaf, which is FilePath -> String -> Tree FilePath Move?

Well, the Tree that the catamorphism runs on is a Tree FilePath FilePath. Mapped to the parametrically polymorphic type of foldTree that's Tree a b. In other words, b maps to FilePath. Thus, in order to fit the type of b -> c, the type corresponding to b in fLeaf must be FilePath. What's left? String -> Tree FilePath Move is what's left. The function takes a FilePath as input and returns a String -> Tree FilePath Move. In other words, c ~ String -> Tree FilePath Move.

How does that fit with fNode?

Generically, this function must have the type a -> [c] -> c. We've already established that c must be String -> Tree FilePath Move. Since the catamorphism runs on a Tree FilePath FilePath, we also know that a must be FilePath. Thus, plugging in all the types, fNode must have the type FilePath -> [String -> Tree FilePath Move] -> String -> Tree FilePath Move. Note, particularly, that the second argument is a list of functions. That's why I decided to name the parameter fs, for functions.

The entire expression foldTree fNode fLeaf t, then, has the type String -> Tree FilePath Move, since c is String -> Tree FilePath Move and the return type of foldTree is c.

The final trick is to apply this function to the initial relative path "", which returns a Tree FilePath Move.

This compiles and passes all tests. calculateMoves is now implemented using the Tree catamorphism, a goal that eluded me for more than one and a half years.
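For readers without a Haskell toolchain, the same shape can be sketched in Python. This is a loose analogue of the articles' code, not a port: Move is reduced to a (source, destination) pair, and all names are mine.

```python
import os.path

# A rose tree with distinct node and leaf payloads, loosely mirroring
# the Haskell Tree a b; foldTree becomes fold_tree.
class Leaf:
    def __init__(self, value):
        self.value = value

class Node:
    def __init__(self, value, children):
        self.value = value
        self.children = children

def fold_tree(f_node, f_leaf, tree):
    if isinstance(tree, Leaf):
        return f_leaf(tree.value)
    return f_node(tree.value,
                  [fold_tree(f_node, f_leaf, c) for c in tree.children])

def calculate_moves(tree):
    # The carrier type is a function taking the parent path as its context.
    def f_leaf(x):
        return lambda path: Leaf((x, os.path.join(path, os.path.basename(x))))
    def f_node(x, fs):
        return lambda path: Node(os.path.join(path, x),
                                 [f(os.path.join(path, x)) for f in fs])
    return fold_tree(f_node, f_leaf, tree)("")  # initial relative path

t = Node("a", [Node("b", [Leaf("1")])])
moves = calculate_moves(t)
# On POSIX-style paths the leaf now carries ("1", "a/b/1"):
# source and destination, matching the articles' test case.
```

As in the Haskell version, the fold produces a function, and applying it to the empty initial path yields the final tree of moves.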

F# re-implementation #

With the Haskell proof of concept in place, it's fairly trivial to port the new implementation to the F# code base.

The calculateMoves function originally looked like this:

// Tree<string,FileInfo> -> Tree<string,Move>
let calculateMoves =
    let replaceDirectory (f : FileInfo) d =
        FileInfo (Path.Combine (d, f.Name))
    let rec imp path = function
        | Leaf x ->
            Leaf { Source = x; Destination = replaceDirectory x path }
        | Node (x, xs) ->
            let newNPath = Path.Combine (path, x)
            Tree.node newNPath (List.map (imp newNPath) xs)
    imp ""

In the F# code base, the catamorphism is called Tree.cata, but otherwise looks like the Haskell foldTree function. The refactoring is also similar:

// Tree<string, FileInfo> -> Tree<string, Move>
let calculateMoves t =
    // FileInfo -> string -> FileInfo
    let replaceDirectory (f : FileInfo) d = FileInfo (Path.Combine (d, f.Name))
    // FileInfo -> string -> Tree<'a, Move>
    let fLeaf x path = Leaf { Source = x; Destination = replaceDirectory x path }
    // string -> (string -> Tree<string, 'a>) list -> string -> Tree<string, 'a>
    let fNode x fs path =
        let newNPath = Path.Combine (path, x)
        Tree.node newNPath (List.map (fun f -> f newNPath) fs)
    Tree.cata fNode fLeaf t ""

Again, the expression Tree.cata fNode fLeaf t has the type string -> Tree<string, Move>, so applying it to "" produces a Tree<string, Move> return value.

Conclusion #

I don't recall where I read the following, but I was under the impression that a data structure's catamorphism was its 'universal API', upon which you could implement any other functionality. I'd love it if it was true, but after my 2019 failure to implement calculateMoves via the Tree catamorphism, I wasn't sure if such a conjecture would hold.

I still don't know if that assertion holds universally, but at least one reason to doubt it has now been removed.


Comments

Excellent work Mark! I too had not forgotten about this, and it nagged me as well.

To some extent, I feel like your explanation of how to implement calculateMoves via Tree.cata is top-down. By top-down, I mean it might depend on discovering the key idea of having Tree.cata return a function and then figuring out the correct type for that function. A good thing about such top-down approaches is being immediately aware that a better solution likely exists even if it takes some time and effort to find it.

I was curious if a bottom-up approach would work. By bottom-up, I mean applying small refactorings to the code that are motivated by the principles, conventions, or style of functional programming. I do think I found such an approach. Of course it is a bit contradictory of me to only be able to find this approach after I read your presentation of the top-down approach. However, I am thinking of it like a kata. I now know such a bottom-up approach should be possible, and I want to find it.

My bottom-up approach is in this branch. Here is a brief summary of how I want myself to think of making those commits in that order.

Each case of the discriminated union could be extracted to its own function. This is easy to do in the Leaf case (so do it now), but it is not as easy to do in the Node case because of recursion, so delay that change for a bit. If we did extract both functions though, both functions would include the argument that I called pathToParent. Since it is passed in everywhere, it should be passed in nowhere (by eta reducing). To do that, we need it to be the last parameter to imp. After switching this order, we now deal with the recursion by doing it as soon as possible. Then the remaining code in that case can be extracted, and imp is essentially Tree.cata.

In this approach, I never thought about the possibility of Tree.cata returning a function. It just sort of fell out as a consequence of my other changes.

2021-04-12 17:49 UTC

Very nice!

In Haskell there is a library called recursion-schemes that showcases these types of recursion with catamorphisms, but also with many other recursion schemes. You can check it out and see if it gives you any new ideas.

Regarding this use of catamorphisms, I believe the library itself shows a very similar example here, using the Reader type (which is isomorphic to the function you used in your example):

>>> :{
let pprint2 :: Tree Int -> String
    pprint2 = flip runReader 0 . cataA go
      where
        go :: TreeF Int (Reader Int String)
           -> Reader Int String
        go (NodeF i rss) = do
          -- rss :: [Reader Int String]
          -- ss  :: [String]
          ss <- local (+ 2) $ sequence rss
          indent <- ask
          let s = replicate indent ' ' ++ "* " ++ show i
          pure $ intercalate "\n" (s : ss)
:}

>>> putStrLn $ pprint2 myTree
* 0
  * 1
  * 2
  * 3
    * 31
      * 311
        * 3111
        * 3112
2021-04-14 02:27 UTC

Gonzalo, thank you for reminding me of the recursion-schemes library. It's one of those tomes of knowledge of which I'm aware, but never really have gotten around to look at...

2021-04-16 6:29 UTC

Mazes on Voronoi tessellations

Monday, 05 April 2021 09:03:00 UTC

Recursive backtracker maze generation on a Voronoi diagram.

Today's blog post appears on Observable. It's an interactive environment where you can play with and fork the code. Go there to read it.

Recursive backtracker algorithm running on a Voronoi tessellation.

Observable is a really neat platform which has managed to do what I thought was nigh-impossible: make me return to JavaScript. The site's been around for some years, and I hope it'll be around for even more years.

ploeh blog, on the other hand, has been around since 2009, and I intend to keep it around for much longer. Who knows if Observable will outlive the blog. Enjoy the article while it's there.


Table-driven tennis scoring

Monday, 29 March 2021 06:15:00 UTC

Probably the most boring implementation of the tennis kata I've ever written.

Regular readers of this blog will know that I keep coming back to the tennis kata. It's an interesting little problem to attack from various angles.

The tennis scoring rules essentially describe a finite state machine, and while I was thinking about the state transitions involved, I came across an article by Michael McCandless about scoring tennis using finite-state automata.

This isn't the first time I've thought about simply enumerating all possible states in the state machine, but I decided to spend half an hour on actually doing it. While Michael McCandless shows that an optimisation is possible, his minimised version doesn't enable us to report all intermediary states with the correct labels. For example, he optimises away thirty-all by replacing it with deuce. The end result is still the same, in the sense that the minimised state machine will arrive at the same winner for the same sequence of balls, but it can't correctly report the score while the game is in progress.

For that reason, I decided to use his non-optimised state machine as a launch pad.

States #

I used F# to enumerate all twenty states:

type Score =
    | LoveAll
    | FifteenLove
    | LoveFifteen
    | ThirtyLove
    | FifteenAll
    | LoveThirty
    | FortyLove
    | ThirtyFifteen
    | FifteenThirty
    | LoveForty
    | FortyFifteen
    | ThirtyAll
    | FifteenForty
    | GamePlayerOne
    | FortyThirty
    | ThirtyForty
    | GamePlayerTwo
    | AdvantagePlayerOne
    | Deuce
    | AdvantagePlayerTwo

Utterly boring, yes, but perhaps boring code might be good code.

Table-driven methods #

Code Complete describes a programming technique called table-driven methods. The idea is to replace branching instructions such as if, else, and switch with a lookup table. The book assumes that the table exists in memory, but in this case, we can implement the table lookup with pattern matching:

// Score -> Score
let ballOne = function
    | LoveAll            -> FifteenLove
    | FifteenLove        -> ThirtyLove
    | LoveFifteen        -> FifteenAll
    | ThirtyLove         -> FortyLove
    | FifteenAll         -> ThirtyFifteen
    | LoveThirty         -> FifteenThirty
    | FortyLove          -> GamePlayerOne
    | ThirtyFifteen      -> FortyFifteen
    | FifteenThirty      -> ThirtyAll
    | LoveForty          -> FifteenForty
    | FortyFifteen       -> GamePlayerOne
    | ThirtyAll          -> FortyThirty
    | FifteenForty       -> ThirtyForty
    | GamePlayerOne      -> GamePlayerOne
    | FortyThirty        -> GamePlayerOne
    | ThirtyForty        -> Deuce
    | GamePlayerTwo      -> GamePlayerTwo
    | AdvantagePlayerOne -> GamePlayerOne
    | Deuce              -> AdvantagePlayerOne
    | AdvantagePlayerTwo -> Deuce

The ballOne function returns the new score when player one wins a ball. It takes the old score as input.
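Code Complete presents the technique with the table living in memory as data. If you want a literal rendition of that, the same transitions can be kept in a Map. The following is a sketch for comparison, not code from the article:

```fsharp
// The same transition table as in-memory data, which is how Code
// Complete presents table-driven methods. A sketch for comparison;
// the article itself uses pattern matching instead.
let ballOneTable =
    Map [
        LoveAll,            FifteenLove
        FifteenLove,        ThirtyLove
        LoveFifteen,        FifteenAll
        ThirtyLove,         FortyLove
        FifteenAll,         ThirtyFifteen
        LoveThirty,         FifteenThirty
        FortyLove,          GamePlayerOne
        ThirtyFifteen,      FortyFifteen
        FifteenThirty,      ThirtyAll
        LoveForty,          FifteenForty
        FortyFifteen,       GamePlayerOne
        ThirtyAll,          FortyThirty
        FifteenForty,       ThirtyForty
        GamePlayerOne,      GamePlayerOne
        FortyThirty,        GamePlayerOne
        ThirtyForty,        Deuce
        GamePlayerTwo,      GamePlayerTwo
        AdvantagePlayerOne, GamePlayerOne
        Deuce,              AdvantagePlayerOne
        AdvantagePlayerTwo, Deuce ]

// Total only because the Map covers all twenty cases; unlike a match
// expression, the compiler can't check that for you.
let ballOne' oldScore = Map.find oldScore ballOneTable
```

One advantage of the pattern-matching version is exactly that compiler check: leave out a case and you get a warning, whereas an incomplete Map only fails at run time.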

I'm going to leave ballTwo as an exercise to the reader.

Smoke test #

Does it work, then? Here's a few interactions with the API in F# Interactive:

> ballOne LoveAll;;
val it : Score = FifteenLove

> LoveAll |> ballOne |> ballTwo;;
val it : Score = FifteenAll

> LoveAll |> ballOne |> ballTwo |> ballTwo;;
val it : Score = FifteenThirty

> LoveAll |> ballOne |> ballTwo |> ballTwo |> ballTwo;;
val it : Score = FifteenForty

> LoveAll |> ballOne |> ballTwo |> ballTwo |> ballTwo |> ballOne;;
val it : Score = ThirtyForty

> LoveAll |> ballOne |> ballTwo |> ballTwo |> ballTwo |> ballOne |> ballTwo;;
val it : Score = GamePlayerTwo

It looks like it's working.

Automated tests #

Should I be writing unit tests for this implementation?

I don't see how a test would be anything but a duplication of the two 'transition tables'. Given that the score is thirty-love, when player one wins the ball, then the new score should be forty-love. Indeed, the ballOne function already states that.
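To make the duplication concrete, such a test might look like this. It's a hypothetical xUnit.net sketch, not a test that exists in any code base:

```fsharp
open Xunit

// Hypothetical test: it merely restates a single row of the transition
// table that the ballOne function already encodes.
[<Fact>]
let ``thirty-love, then player one wins the ball`` () =
    let actual = ballOne ThirtyLove
    Assert.Equal (FortyLove, actual)
```

Twenty such tests would be a line-by-line mirror of the match expression.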

We trust tests because they are simple. When the implementation is as simple as the test that would exercise it, then what's the benefit of the test?

To be clear, there are still compelling reasons to write tests for some simple implementations, but that's another discussion. I don't think those reasons apply here. I'll write no tests.

Code size #

While this code is utterly dull, it takes up some space. In all, it runs to 67 lines of code.

For comparison, the code base that evolves throughout my Types + Properties = Software article series is 65 lines of code, not counting the tests. When I also count the tests, that entire code base contains around 300 lines of code. That's more than four times as much code.

Preliminary research implies that bug count correlates linearly with line count. The more lines of code, the more bugs.

While I believe that this is probably a simplistic rule of thumb, there's much to like about smaller code bases. In total, this utterly dull implementation is actually smaller than a comparable implementation built from small functions.

Conclusion #

Many software problems can be modelled as finite state machines. I find that this is often the case in my own field of line-of-business software and web services.

It's not always possible to exhaustively enumerate all states, because each 'type' of state carries data that can't practically be enumerated. For example, polling consumers need to carry timing statistics. These statistics influence how the state machine transitions, but the range of possible values is so vast that it can't be enumerated as types.

It may not happen often that you can fully enumerate all states and transitions of a finite state machine, but I think it's worthwhile to be aware of such refactoring opportunities. It might make your code dully simple.


Comments

Hi Mark, I have had a similar experience whilst coding a Shut the box game, when trying to detect if it was game over or not.
Originally it was a complex set of loops to calculate all the discrete summands for each roll of the dice, then checking if the remaining flaps were in that set. This was done along with a suite of tests for every possible combination set of summands up to 12 (for 2 dice).
Then whilst explaining the pain in writing this to a friend, they simply said: there's only a finite list, why not hard-code them? And that's what I went with: a dictionary with each possible roll from 2 dice, and the possible values from the flaps that could be used to meet that roll. All the tests were removed; as you pointed out, they would just be a reimplementation of the table.

2021-04-07 13:30 UTC

Dave, thank you for writing. It's good to hear that you have a similar experience. I wonder if it's constrained to game simulation, or if 'real-world' examples exist.

2021-04-09 6:30 UTC
