A first stab at the Brainfuck kata

Monday, 11 September 2023 08:07:00 UTC

I almost gave up, but persevered and managed to produce something that works.

As I've previously mentioned, a customer hired me to swing by to demonstrate test-driven development and tactical Git. To make things interesting, we agreed that they'd give me a kata at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.

The demonstration session was supposed to be two hours in front of a participating audience. In order to make my preparation aligned to that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, so preferably proceeding in an order that would be explainable to an audience.

Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.

After a few other katas, I ran into the Brainfuck kata one Thursday. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is available on GitHub.

Understanding the problem #

I had heard about Brainfuck before, but never tried to write an interpreter (or a program, for that matter).

The kata description lacks examples, so I decided to search for them elsewhere. The wikipedia article comes with some examples of small programs (including Hello, World), so ultimately I used that for reference instead of the kata page.

I'm happy I wasn't making my first pass on this problem in front of an audience. I spent the first 45 minutes just trying to understand the examples.

You might find me slow, since the rules of the language aren't that complicated. I was, however, confused by the way the examples were presented.

As the wikipedia article explains, in order to add two numbers together, one can use this idiom:

[->+<]

The article then proceeds to list a small, complete program that adds two numbers. This program adds numbers this way:

[        Start your loops with your cell pointer on the loop counter (c1 in our case)
< +      Add 1 to c0
> -      Subtract 1 from c1
]        End your loops with the cell pointer on the loop counter

I couldn't understand why this annotated 'walkthrough' explained the idiom in reverse. Several times, I was on the verge of giving up, feeling that I made absolutely no progress. Finally, it dawned on me that the second example is not an explanation of the first example, but rather a separate example that makes use of the same idea, but expresses it in a different way.

Most programming languages have more than one way to do things, and this is also the case here. [->+<] adds two numbers together, but so does [<+>-].

Once you understand something, it can be difficult to recall why you ever found it confusing. Now that I get this, I'm having trouble explaining what I was originally thinking, and why it confused me.

This experience does, however, drive home a point for educators: When you introduce a concept and then provide examples, the first example should be a direct continuation of the introduction, and not some variation. Variations are fine, too, but should follow later and be clearly labelled.

After 45 minutes I had finally cracked the code and was ready to get programming.

Getting started #

The kata description suggests starting with the +, -, >, and < instructions to manage memory. I briefly considered that, but on the other hand, I wanted to have some test coverage. Usually, I take advantage of test-driven development, and I while I wasn't sure how to proceed, I wanted to have some tests.

If I were to exclusively start with memory management, I would need some way to inspect the memory in order to write assertions. This struck me as violating encapsulation.

Instead, I thought that I'd write the simplest program that would produce some output, because if I had output, I would have something to verify.

That, on the other hand, meant that I had to consider how to model input and output. The Wikipedia article describes these as

"two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding)."

Knowing that you can model the console's input and output streams as polymorphic objects, I decided to model the output as a TextWriter. The lowest-valued printable ASCII character is space, which has the byte value 32, so I wrote this test:

[Theory]
[InlineData("++++++++++++++++++++++++++++++++."" ")] // 32 increments; ASCII 32 is space
public void Run(string programstring expected)
{
    using var output = new StringWriter();
    var sut = new BrainfuckInterpreter(output);
 
    sut.Run(program);
    var actual = output.ToString();
 
    Assert.Equal(expected, actual);
}

As you can see, I wrote the test as a [Theory] (parametrised test) from the outset, since I predicted that I'd add more test cases soon. Strictly speaking, when following the red-green-refactor checklist, you shouldn't write more code than absolutely necessary. According to YAGNI, you should avoid speculative generality.

Sometimes, however, you've gone through a process so many times that you know, with near certainty, what happens next. I've done test-driven development for decades, so I occasionally allow my experience to trump the rules.

The Brainfuck program in the [InlineData] attribute increments the same data cell 32 times (you can count the plusses) and then outputs its value. The expected output is the space character, since it has the ASCII code 32.

What's the simplest thing that could possibly work? Something like this:

public sealed class BrainfuckInterpreter
{
    private readonly StringWriter output;
 
    public BrainfuckInterpreter(StringWriter output)
    {
        this.output = output;
    }
 
    public void Run(string program)
    {
        output.Write(' ');
    }
}

As is typical with test-driven development (TDD), the first few tests help you design the API, but not the implementation, which, here, is deliberately naive.

Since I felt pressed for time, having already spent 45 minutes of my two-hour time limit getting to grips with the problem, I suppose I lingered less on the refactoring phase than perhaps I should have. You'll notice, at least, that the BrainfuckInterpreter class depends on StringWriter rather than its abstract parent class TextWriter, which was the original plan.

It's not a disastrous mistake, so when I later discovered it, I easily rectified it.

Implementation outline #

To move on, I added another test case:

[Theory]
[InlineData("++++++++++++++++++++++++++++++++."" ")] // 32 increments; ASCII 32 is space
[InlineData("+++++++++++++++++++++++++++++++++.""!")] // 33 increments; ASCII 32 is !
public void Run(string programstring expected)
{
    using var output = new StringWriter();
    var sut = new BrainfuckInterpreter(output);
 
    sut.Run(program);
    var actual = output.ToString();
 
    Assert.Equal(expected, actual);
}

The only change is the addition of the second [InlineData] attribute, which supplies a slightly different Brainfuck program. This one has 33 increments, which corresponds to the ASCII character code for an exclamation mark.

Notice that I clearly copied and pasted the comment, but forgot to change the last 32 to 33.

In my eagerness to pass both tests, and because I felt the clock ticking, I made another classic TDD mistake: I took too big a step. At this point, it would have been enough to iterate over the program's characters, count the number of plusses, and convert that number to a character. What I did instead was this:

public sealed class BrainfuckInterpreter
{
    private readonly StringWriter output;
 
    public BrainfuckInterpreter(StringWriter output)
    {
        this.output = output;
    }
 
    public void Run(string program)
    {
        var imp = new InterpreterImp(program, output);
        imp.Run();
    }
 
    private sealed class InterpreterImp
    {
        private int programPointer;
        private readonly byte[] data;
        private readonly string program;
        private readonly StringWriter output;
 
        internal InterpreterImp(string program, StringWriter output)
        {
            data = new byte[30_000];
            this.program = program;
            this.output = output;
        }
 
        internal void Run()
        {
            while (!IsDone)
                InterpretInstruction();
        }
 
        private bool IsDone => program.Length <= programPointer;
 
        private void InterpretInstruction()
        {
            var instruction = program[programPointer];
            switch (instruction)
            {
                case '+':
                    data[0]++;
                    programPointer++;
                    break;
                case '.':
                    output.Write((char)data[0]);
                    programPointer++;
                    break;
                default:
                    programPointer++;
                    break;
            }
        }
    }

With only two test cases, all that code isn't warranted, but I was more focused on implementing an interpreter than on moving in small steps. Even with decades of TDD experience, discipline sometimes slips. Or maybe exactly because of it.

Once again, I was fortunate enough that this implementation structure turned out to work all the way, but the point of the TDD process is that you can't always know that.

You may wonder why I decided to delegate the work to an inner class. I did that because I expected to have to maintain a programPointer over the actual program, and having a class that interprets one program has better encapsulation. I'll remind the reader than when I use the word encapsulation, I don't necessarily mean information hiding. Usually, I think in terms of contracts: Invariants, pre-, and postconditions.

With this design, the program is guaranteed to be present as a class field, since it's readonly and assigned upon initialisation. No defensive coding is required.

Remaining memory-management instructions #

While I wasn't planning on making use of the Devil's advocate technique, I did leave one little deliberate mistake in the above implementation: I'd hardcoded the data pointer as 0.

This made it easy to choose the next test case, and the next one after that, and so on.

At the two-hour mark, I had these test cases:

[Theory]
[InlineData("++++++++++++++++++++++++++++++++."" ")] // 32 increments; ASCII 32 is space
[InlineData("+++++++++++++++++++++++++++++++++.""!")] // 33 increments; ASCII 32 is !
[InlineData("+>++++++++++++++++++++++++++++++++."" ")] // 32 increments after >; ASCII 32 is space
[InlineData("+++++++++++++++++++++++++++++++++-."" ")] // 33 increments and 1 decrement; ASCII 32
[InlineData(">+<++++++++++++++++++++++++++++++++."" ")] // 32 increments after movement; ASCII 32
public void Run(string programstring expected)
{
    using var output = new StringWriter();
    var sut = new BrainfuckInterpreter(output);
 
    sut.Run(program);
    var actual = output.ToString();
 
    Assert.Equal(expected, actual);
}

And this implementation:

private sealed class InterpreterImp
{
    private int programPointer;
    private int dataPointer;
    private readonly byte[] data;
    private readonly string program;
    private readonly StringWriter output;
 
    internal InterpreterImp(string program, StringWriter output)
    {
        data = new byte[30_000];
        this.program = program;
        this.output = output;
    }
 
    internal void Run()
    {
        while (!IsDone)
            InterpretInstruction();
    }
 
    private bool IsDone => program.Length <= programPointer;
 
    private void InterpretInstruction()
    {
        var instruction = program[programPointer];
        switch (instruction)
        {
            case '>':
                dataPointer++;
                programPointer++;
                break;
            case '<':
                dataPointer--;
                programPointer++;
                break;
            case '+':
                data[dataPointer]++;
                programPointer++;
                break;
            case '-':
                data[dataPointer]--;
                programPointer++;
                break;
            case '.':
                output.Write((char)data[dataPointer]);
                programPointer++;
                break;
            default:
                programPointer++;
                break;
        }
    }
}

I'm only showing the inner InterpreterImp class, since I didn't change the outer BrainfuckInterpreter class.

At this point, I had used my two hours, but I think that I managed to leave my imaginary audience with a sketch of a possible solution.

Jumps #

What remained was the jumping instructions [ and ], as well as input.

Perhaps I could have kept adding small [InlineData] test cases to my single test method, but I thought I was ready to take on some of the small example programs on the Wikipedia page. I started with the addition example in this manner:

    // Copied from https://en.wikipedia.org/wiki/Brainfuck
    const string addTwoProgram = @"
++       Cell c0 = 2
> +++++  Cell c1 = 5
 
[        Start your loops with your cell pointer on the loop counter (c1 in our case)
< +      Add 1 to c0
> -      Subtract 1 from c1
]        End your loops with the cell pointer on the loop counter
 
At this point our program has added 5 to 2 leaving 7 in c0 and 0 in c1
but we cannot output this value to the terminal since it is not ASCII encoded
 
To display the ASCII character ""7"" we must add 48 to the value 7
We use a loop to compute 48 = 6 * 8
 
++++ ++++  c1 = 8 and this will be our loop counter again
[
< +++ +++  Add 6 to c0
> -        Subtract 1 from c1
]
< .        Print out c0 which has the value 55 which translates to ""7""!";
 
    [Fact]
    public void AddTwoValues()
    {
        using var input = new StringReader("");
        using var output = new StringWriter();
        var sut = new BrainfuckInterpreter(input, output);
 
        sut.Run(addTwoProgram);
        var actual = output.ToString();
 
        Assert.Equal("7", actual);
    }

I got that test passing, added the next example, got that passing, and so on. My final implementation looks like this:

public sealed class BrainfuckInterpreter
{
    private readonly TextReader input;
    private readonly TextWriter output;
 
    public BrainfuckInterpreter(TextReader input, TextWriter output)
    {
        this.input = input;
        this.output = output;
    }
 
    public void Run(string program)
    {
        var imp = new InterpreterImp(program, input, output);
        imp.Run();
    }
 
    private sealed class InterpreterImp
    {
        private int instructionPointer;
        private int dataPointer;
        private readonly byte[] data;
        private readonly string program;
        private readonly TextReader input;
        private readonly TextWriter output;
 
        internal InterpreterImp(string program, TextReader input, TextWriter output)
        {
            data = new byte[30_000];
            this.program = program;
            this.input = input;
            this.output = output;
        }
 
        internal void Run()
        {
            while (!IsDone)
                InterpretInstruction();
        }
 
        private bool IsDone => program.Length <= instructionPointer;
 
        private void InterpretInstruction()
        {
            WrapDataPointer();
 
            var instruction = program[instructionPointer];
            switch (instruction)
            {
                case '>':
                    dataPointer++;
                    instructionPointer++;
                    break;
                case '<':
                    dataPointer--;
                    instructionPointer++;
                    break;
                case '+':
                    data[dataPointer]++;
                    instructionPointer++;
                    break;
                case '-':
                    data[dataPointer]--;
                    instructionPointer++;
                    break;
                case '.':
                    output.Write((char)data[dataPointer]);
                    instructionPointer++;
                    break;
                case ',':
                    data[dataPointer] = (byte)input.Read();
                    instructionPointer++;
                    break;
                case '[':
                    if (data[dataPointer] == 0)
                        MoveToMatchingClose();
                    else
                        instructionPointer++;
                    break;
                case ']':
                    if (data[dataPointer] != 0)
                        MoveToMatchingOpen();
                    else
                        instructionPointer++;
                    break;
                default:
                    instructionPointer++;
                    break;
            }
        }
 
        private void WrapDataPointer()
        {
            if (dataPointer == -1)
                dataPointer = data.Length - 1;
            if (dataPointer == data.Length)
                dataPointer = 0;
        }
 
        private void MoveToMatchingClose()
        {
            var nestingLevel = 1;
            while (0 < nestingLevel)
            {
                instructionPointer++;
                if (program[instructionPointer] == '[')
                    nestingLevel++;
                if (program[instructionPointer] == ']')
                    nestingLevel--;
            }
            instructionPointer++;
        }
 
        private void MoveToMatchingOpen()
        {
            var nestingLevel = 1;
            while (0 < nestingLevel)
            {
                instructionPointer--;
                if (program[instructionPointer] == ']')
                    nestingLevel++;
                if (program[instructionPointer] == '[')
                    nestingLevel--;
            }
            instructionPointer++;
        }
    }
}

As you can see, I finally discovered that I'd been too concrete when using StringWriter. Now, input is defined as a TextReader, and output as a TextWriter.

When TextReader.Read encounters the end of the input stream, it returns -1, and when you cast that to byte, it becomes 255. I admit that I haven't read through the Wikipedia article's ROT13 example code to a degree that I understand how it decides to stop processing, but the test passes.

I also realised that the Wikipedia article used the term instruction pointer, so I renamed programPointer to instructionPointer.

Assessment #

Due to the switch/case structure, the InterpretInstruction method has a cyclomatic complexity of 12, which is more than I recommend in my book Code That Fits in Your Head.

It's not uncommon that switch/case code has high cyclomatic complexity, and this is also a common criticism of the measure. When each case block is as simple as it is here, or delegates to helper methods such as MoveToMatchingClose, you could reasonably argue that the code is still maintainable.

Refactoring lists switch statements as a code smell and suggests better alternatives. Had I followed the kata description's additional constraints to the letter, I should also have made it easy to add new instructions, or rename existing ones. This might suggest that one of Martin Fowler's refactorings might be in order.

That is, however, an entirely different kind of exercise, and I thought that I'd already gotten what I wanted out of the kata.

Conclusion #

At first glance, the Brainfuck language isn't difficult to understand (but onerous to read). Even so, it took me so long time to understand the example code that I almost gave up more than once. Still, once I understood how it worked, the interpreter actually wasn't that hard to write.

In retrospect, perhaps I should have structured my code differently. Perhaps I should have used polymorphism instead of a switch statement. Perhaps I should have written the code in a more functional style. Regular readers will at least recognise that the code shown here is uncharacteristically imperative for me. I do, however, try to vary my approach to fit the problem at hand (use the right tool for the job, as the old saw goes), and the Brainfuck language is described in so imperative terms that imperative code seemed like the most fitting style.

Now that I understand how Brainfuck works, I might later try to do the kata with some other constraints. It might prove interesting.


Decomposing CTFiYH's sample code base

Monday, 04 September 2023 06:00:00 UTC

An experience report.

In my book Code That Fits in Your Head (CTFiYH) I write in the last chapter:

If you've looked at the book's sample code base, you may have noticed that it looks disconcertingly monolithic. If you consider the full code base that includes the integration tests, as [the following figure] illustrates, there are all of three packages[...]. Of those, only one is production code.

Three boxes labelled unit tests, integration tests, and REST API.

[Figure caption:] The packages that make up the sample code base. With only a single production package, it reeks of a monolith.

Later, after discussing dependency cycles, I conclude:

I've been writing F# and Haskell for enough years that I naturally follow the beneficial rules that they enforce. I'm confident that the sample code is nicely decoupled, even though it's packaged as a monolith. But unless you have a similar experience, I recommend that you separate your code base into multiple packages.

Usually, you can't write something that cocksure without it backfiring, but I was really, really confident that this was the case. Still, it's always nagged me, because I believe that I should walk the walk rather than talk the talk. And I do admit that this was one of the few claims in the book that I actually didn't have code to back up.

So I decided to spend part of a weekend to verify that what I wrote was true. You won't believe what happened next.

Decomposition #

Reader, I was right all along.

I stopped my experiment when my package graph looked like this:

Ports-and-adapters architecture diagram.

Does that look familiar? It should; it's a poster example of Ports and Adapters, or, if you will, Clean Architecture. Notice how all dependencies point inward, following the Dependency Inversion Principle.

The Domain Model has no dependencies, while the HTTP Model (Application Layer) only depends on the Domain Model. The outer layer contains the ports and adapters, as well as the Composition Root. The Web Host is a small web project that composes everything else. In order to do that, it must reference everything else, either directly (SMTP, SQL, HTTP Model) or transitively (Domain Model).

The Adapters, on the other hand, depend on the HTTP Model and (not shown) the SDKs that they adapt. The SqlReservationsRepository class, for example, is implemented in the SQL library, adapting the System.Data.SqlClient SDK to look like an IReservationsRepository, which is defined in the HTTP Model.

The SMTP library is similar. It contains a concrete implementation called SmtpPostOffice that adapts the System.Net.Mail API to look like an IPostOffice object. Once again, the IPostOffice interface is defined in the HTTP Model.

The above figure is not to scale. In reality, the outer ring is quite thin. The SQL library contains only SqlReservationsRepository and some supporting text files with SQL DDL definitions. The SMTP library contains only the SmtpPostOffice class. And the Web Host contains Program, Startup, and a few configuration file DTOs (options, in ASP.NET parlance).

Application layer #

The majority of code, at least if I use a rough proxy measure like number of files, is in the HTTP Model. I often think of this as the application layer, because it's all the logic that's specific to to the application, in contrast to the Domain Model, which ought to contain code that can be used in a variety of application contexts (REST API, web site, batch job, etc.).

In this particular, case the application is a REST API, and it turns out that while the Domain Model isn't trivial, more goes on making sure that the REST API behaves correctly: That it returns correctly formatted data, that it validates input, that it detects attempts at tampering with URLs, that it redirects legacy URLs, etc.

This layer also contains the interfaces for the application's real dependencies: IReservationsRepository, IPostOffice, IRestaurantDatabase, and IClock. This explains why the SQL and SMTP packages need to reference the HTTP Model.

If you have bought the book, you have access to its example code base, and while it's a Git repository, this particular experiment isn't included. After all, I just made it, two years after finishing the book. Thus, if you want to compare with the code that comes with the book, here's a list of all the files I moved to the HTTP Model package:

  • AccessControlList.cs
  • CalendarController.cs
  • CalendarDto.cs
  • Day.cs
  • DayDto.cs
  • DtoConversions.cs
  • EmailingReservationsRepository.cs
  • Grandfather.cs
  • HomeController.cs
  • HomeDto.cs
  • Hypertext.cs
  • IClock.cs
  • InMemoryRestaurantDatabase.cs
  • IPeriod.cs
  • IPeriodVisitor.cs
  • IPostOffice.cs
  • IReservationsRepository.cs
  • IRestaurantDatabase.cs
  • Iso8601.cs
  • LinkDto.cs
  • LinksFilter.cs
  • LoggingClock.cs
  • LoggingPostOffice.cs
  • LoggingReservationsRepository.cs
  • Month.cs
  • NullPostOffice.cs
  • Period.cs
  • ReservationDto.cs
  • ReservationsController.cs
  • ReservationsRepository.cs
  • RestaurantDto.cs
  • RestaurantsController.cs
  • ScheduleController.cs
  • SigningUrlHelper.cs
  • SigningUrlHelperFactory.cs
  • SystemClock.cs
  • TimeDto.cs
  • UrlBuilder.cs
  • UrlIntegrityFilter.cs
  • Year.cs

As you can see, this package contains the Controllers, the DTOs, the interfaces, and some REST- and HTTP-specific code such as SigningUrlHelper, UrlIntegrityFilter, LinksFilter, security, ISO 8601 formatters, etc.

Domain Model #

The Domain Model is small, but not insignificant. Perhaps the most striking quality of it is that (with a single, inconsequential exception) it contains no interfaces. There are no polymorphic types that model application dependencies such as databases, web services, messaging systems, or the system clock. Those are all the purview of the application layer.

As the book describes, the architecture is functional core, imperative shell, and since dependencies make everything impure, you can't have those in your functional core. While, with a language like C#, you can never be sure that a function truly is pure, I believe that the entire Domain Model is referentially transparent.

For those readers who have the book's sample code base, here's a list of the files I moved to the Domain Model:

  • Email.cs
  • ITableVisitor.cs
  • MaitreD.cs
  • Name.cs
  • Reservation.cs
  • ReservationsVisitor.cs
  • Restaurant.cs
  • Seating.cs
  • Table.cs
  • TimeOfDay.cs
  • TimeSlot.cs

If the entire Domain Model consists of immutable values and pure functions, and if impure dependencies make everything impure, what's the ITableVisitor interface doing there?

This interface doesn't model any external application dependency, but rather represents a sum type with the Visitor pattern. The interface looks like this:

public interface ITableVisitor<T>
{
    T VisitStandard(int seats, Reservation? reservation);
    T VisitCommunal(int seats, IReadOnlyCollection<Reservation> reservations);
}

Restaurant tables are modelled this way because the Domain Model distinguishes between two fundamentally different kinds of tables: Normal restaurant tables, and communal or shared tables. In F# or Haskell such a sum type would be a one-liner, but in C# you need to use either the Visitor pattern or a Church encoding. For the book, I chose the Visitor pattern in order to keep the code base as object-oriented as possible.

Circular dependencies #

In the book I wrote:

The passive prevention of cycles [that comes from separate packages] is worth the extra complexity. Unless team members have extensive experience with a language that prevents cycles, I recommend this style of architecture.

Such languages do exist, though. F# famously prevents cycles. In it, you can't use a piece of code unless it's already defined above. Newcomers to the language see this as a terrible flaw, but it's actually one of its best features.

Haskell takes a different approach, but ultimately, its explicit treatment of side effects at the type level steers you towards a ports-and-adapters-style architecture. Your code simply doesn't compile otherwise!

I've been writing F# and Haskell for enough years that I naturally follow the beneficial rules that they enforce. I'm confident that the sample code is nicely decoupled, even though it's packaged as a monolith. But unless you have a similar experience, I recommend that you separate your code base into multiple packages.

As so far demonstrated in this article, I'm now sufficiently conditioned to be aware of side effects and non-determinism that I know to avoid them and push them to be boundaries of the application. Even so, it turns out that it's insidiously easy to introduce small cycles when the language doesn't stop you.

This wasn't much of a problem in the Domain Model, but one small example may still illustrate how easy it is to let your guard down. In the Domain Model, I'd added a class called TimeOfDay (since this code base predates TimeOnly), but without thinking much of it, I'd added this method:

public string ToIso8601TimeString()
{
    return durationSinceMidnight.ToIso8601TimeString();
}

While this doesn't look like much, formatting a time value as an ISO 8601 value isn't a Domain concern. It's an application boundary concern, so belongs in the HTTP Model. And sure enough, once I moved the file that contained the ISO 8601 conversions to the HTTP Model, the TimeOfDay class no longer compiled.

In this case, the fix was easy. I removed the method from the TimeOfDay class, but added an extension method to the other ISO 8601 conversions:

public static string ToIso8601TimeString(this TimeOfDay timeOfDay)
{
    return ((TimeSpan)timeOfDay).ToIso8601TimeString();
}

While I had little trouble moving the files to the Domain Model one-by-one, once I started moving files to the HTTP Model, it turned out that this part of the code base contained more coupling.

Since I had made many classes and interfaces internal by default, once I started moving types to the HTTP Model, I was faced with either making them public, or move them en block. Ultimately, I decided to move eighteen files that were transitively linked to each other in one go. I could have moved them in smaller chunks, but that would have made the internal types invisible to the code that (temporarily) stayed behind. I decided to move them all at once. After all, while I prefer to move in small, controlled steps, even moving eighteen files isn't that big an operation.

In the end, I still had to make LinksFilter, UrlIntegrityFilter, SigningUrlHelperFactory, and AccessControlList.FromUser public, because I needed to reference them from the Web Host.

Test dependencies #

You may have noticed that in the above diagram, it doesn't look as though I separated the two test packages into more packages, and that is, indeed, the case. I've recently described how I think about distinguishing kinds of tests, and I don't really care too much whether an automated test exercises only a single function, or a whole bundle of objects. What I do care about is whether a test is simple or complex, fast or slow. That kind of thing.

The package labelled "Integration tests" on the diagram is really a small test library that exercises some SQL Server-specific behaviour. Some of the tests in it verify that certain race conditions don't occur. They do that by keep trying to make the race condition occur, until they time out. Since the timeout is 30 seconds per test, this test suite is slow. That's the reason it's a separate library, even though it contains only eight tests. The book contains more details.

The "Unit tests" package contains the bulk of the tests: 176 tests, some of which are FsCheck properties that each run a hundred test cases.

Some tests are self-hosted integration tests that rely on the Web Host, and some of them are more 'traditional' unit tests. Dependencies are transitive, so I drew an arrow from the "Unit tests" package to the Web Host. Some unit tests exercise objects in the HTTP Model, and some exercise the Domain Model.

You may have another question: If the Integration tests reference the SQL package, then why not the SMTP package? Why is it okay that the unit tests reference the SMTP package?

Again, I want to reiterate that the reason I distinguished between these two test packages were related to execution speed rather than architecture. The few SMTP tests are fast enough, so there's no reason to keep them in a separate package.

In fact, the SMTP tests don't exercise that the SmtpPostOffice can send email. Rather, I treat that class as a Humble Object. The few tests that I do have only verify that the system can parse configuration settings:

[Theory]
[InlineData("m.example.net", 587, "grault""garply""g@example.org")]
[InlineData("n.example.net", 465, "corge""waldo""c@example.org")]
public void ToPostOfficeReturnsSmtpOffice(
    string host,
    int port,
    string userName,
    string password,
    string from)
{
    var sut = Create.SmtpOptions(host, port, userName, password, from);
 
    var db = new InMemoryRestaurantDatabase();
    var actual = sut.ToPostOffice(db);
 
    var expected = new SmtpPostOffice(host, port, userName, password, from, db);
    Assert.Equal(expected, actual);
}

Notice, by the way, the use of structural equality on a service. Consider doing that more often.

In any case, the separation of automated tests into two packages may not be the final iteration. It's worked well so far, in this context, but it's possible that had things been different, I would have chosen to have more test packages.

Conclusion #

In the book, I made a bold claim: Although I had developed the example code as a monolith, I asserted that I'd been so careful that I could easily tease it apart into multiple packages should I chose to do so.

This sounded so much like hubris that I was trepidatious writing it. I wrote it anyway, because, while I hadn't tried, I was convinced that I was right.

Still, it nagged me ever since. What if, after all, I was wrong? I've been wrong before.

So I decided to finally make the experiment, and to my relief it turned out that I'd been right.

Don't try this at home, kids.


A first crack at the Args kata

Monday, 28 August 2023 07:28:00 UTC

Test-driven development in C#.

A customer hired me to swing by to demonstrate test-driven development and tactical Git. To make things interesting, we agreed that they'd give me a kata at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.

The demonstration session was supposed to be two hours in front of a participating audience. In order to make my preparation aligned to that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, so preferably proceeding in an order that would be explainable to an audience.

Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.

The first kata I decided to try was the Args kata. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is available on GitHub.

Boolean parser #

In short, the goal of the Args kata is to develop an API for parsing command-line arguments.

When you encounter a new problem, it's common to have a few false starts until you develop a promising plan. This happened to me as well, but after a few attempts that I quickly stashed, I realised that this is really a validation problem - as in parse, don't validate.

The first thing I did after that realisation was to copy verbatim the Validated code from An applicative reservation validation example in C#. I consider it fair game to reuse general-purpose code like this for a kata.

With that basic building block available, I decided to start with a parser that would handle Boolean flags. My reasoning was that this might be the simplest parser, since it doesn't have many failure modes. If the flag is present, the value should be interpreted to be true; otherwise, false.

Over a series of iterations, I developed this parametrised xUnit.net test:

[Theory]
[InlineData('l'"-l"true)]
[InlineData('l'" -l "true)]
[InlineData('l'"-l -p 8080 -d /usr/logs"true)]
[InlineData('l'"-p 8080 -l -d /usr/logs"true)]
[InlineData('l'"-p 8080 -d /usr/logs"false)]
[InlineData('l'"-l true"true)]
[InlineData('l'"-l false"false)]
[InlineData('l'"nonsense"false)]
[InlineData('f'"-f"true)]
[InlineData('f'"foo"false)]
[InlineData('f'""false)]
public void TryParseSuccess(char flagNamestring candidatebool expected)
{
    var sut = new BoolParser(flagName);
    var actual = sut.TryParse(candidate);
    Assert.Equal(Validated.Succeed<stringbool>(expected), actual);
}

To be clear, this test started as a [Fact] (single, non-parametrised test) that I subsequently converted to a parametrised test, and then added more and more test cases to.

The final implementation of BoolParser looks like this:

public sealed class BoolParser : IParser<bool>
{
    private readonly char flagName;
 
    public BoolParser(char flagName)
    {
        this.flagName = flagName;
    }
 
    public Validated<stringboolTryParse(string candidate)
    {
        var idx = candidate.IndexOf($"-{flagName}");
        if (idx < 0)
            return Validated.Succeed<stringbool>(false);
 
        var nextFlagIdx = candidate[(idx + 2)..].IndexOf('-');
        var bFlag = nextFlagIdx < 0
            ? candidate[(idx + 2)..]
            : candidate.Substring(idx + 2, nextFlagIdx);
        if (bool.TryParse(bFlag, out var b))
            return Validated.Succeed<stringbool>(b);
 
        return Validated.Succeed<stringbool>(true);
    }
}

This may not be the most elegant solution, but it passes all tests. Since I was under time pressure, I didn't want to spend too much time polishing the implementation details. As longs as I'm comfortable with the API design and the test cases, I can always refactor later. (I usually say that later is never, which also turned out to be true this time. On the other hand, it's not that the implementation code is awful in any way. It has a cyclomatic complexity of 4 and fits within a 80 x 20 box. It could be much worse.)

The IParser interface came afterwards. It wasn't driven by the above test, but by later developments.

Rough proof of concept #

Once I had a passable implementation of BoolParser, I developed a similar IntParser to a degree where it supported a happy path. With two parsers, I had enough building blocks to demonstrate how to combine them. At that point, I also had some 40 minutes left, so it was time to produce something that might look useful.

At first, I wanted to demonstrate that it's possible to combine the two parsers, so I wrote this test:

[Fact]
public void ParseBoolAndIntProofOfConceptRaw()
{
    var args = "-l -p 8080";
    var l = new BoolParser('l').TryParse(args).SelectFailure(s => new[] { s });
    var p = new IntParser('p').TryParse(args).SelectFailure(s => new[] { s });
    Func<boolint, (boolint)> createTuple = (bi) => (b, i);
    static string[] combineErrors(string[] s1string[] s2) => s1.Concat(s2).ToArray();
 
    var actual = createTuple.Apply(l, combineErrors).Apply(p, combineErrors);
 
    Assert.Equal(Validated.Succeed<string[], (boolint)>((true, 8080)), actual);
}

That's really not pretty, and I wouldn't expect an unsuspecting audience to understand what's going on. It doesn't help that C# is inadequate for applicative functors. While it's possible to implement applicative validation, the C# API is awkward. (There are ways to make it better than what's on display here, but keep in mind that I came into this exercise unprepared, and had to grab what was closest at hand.)

The main point of the above test was only to demonstrate that it's possible to combine two parsers into one. That took me roughly 15 minutes.

Armed with that knowledge, I then proceeded to define this base class:

public abstract class ArgsParser<T1T2T>
{
    private readonly IParser<T1> parser1;
    private readonly IParser<T2> parser2;
 
    public ArgsParser(IParser<T1> parser1, IParser<T2> parser2)
    {
        this.parser1 = parser1;
        this.parser2 = parser2;
    }
 
    public Validated<string[], T> TryParse(string candidate)
    {
        var l = parser1.TryParse(candidate).SelectFailure(s => new[] { s });
        var p = parser2.TryParse(candidate).SelectFailure(s => new[] { s });
 
        Func<T1, T2, T> create = Create;
        return create.Apply(l, CombineErrors).Apply(p, CombineErrors);
    }
 
    protected abstract T Create(T1 x1, T2 x2);
 
    private static string[] CombineErrors(string[] s1string[] s2)
    {
        return s1.Concat(s2).ToArray();
    }
}

While I'm not a fan of inheritance, this seemed the fasted way to expand on the proof of concept. The class encapsulates the ugly details of the ParseBoolAndIntProofOfConceptRaw test, while leaving just enough room for a derived class:

internal sealed class ProofOfConceptParser : ArgsParser<boolint, (boolint)>
{
    public ProofOfConceptParser() : base(new BoolParser('l'), new IntParser('p'))
    {
    }
 
    protected override (boolintCreate(bool x1int x2)
    {
        return (x1, x2);
    }
}

This class only defines which parsers to use and how to translate successful results to a single object. Here, because this is still a proof of concept, the resulting object is just a tuple.

The corresponding test looks like this:

[Fact]
public void ParseBoolAndIntProofOfConcept()
{
    var sut = new ProofOfConceptParser();
 
    var actual = sut.TryParse("-l -p 8080");
 
    Assert.Equal(Validated.Succeed<string[], (boolint)>((true, 8080)), actual);
}

At this point, I hit the two-hour mark, but I think I managed to produce enough code to convince a hypothetical audience that a complete solution is within grasp.

What remained was to

  • add proper error handling to IntParser
  • add a corresponding StringParser
  • improve the ArgsParser API
  • add better demo examples of the improved ArgsParser API

While I could leave this as an exercise to the reader, I couldn't just leave the code like that.

Finishing the kata #

For my own satisfaction, I decided to complete the kata, which I did in another hour.

Although I had started with an abstract base class, I know how to refactor it to a sealed class with an injected Strategy. I did that for the existing class, and also added one that supports three parsers instead of two:

public sealed class ArgsParser<T1T2T3T>
{
    private readonly IParser<T1> parser1;
    private readonly IParser<T2> parser2;
    private readonly IParser<T3> parser3;
    private readonly Func<T1, T2, T3, T> create;
 
    public ArgsParser(
        IParser<T1> parser1,
        IParser<T2> parser2,
        IParser<T3> parser3,
        Func<T1, T2, T3, T> create)
    {
        this.parser1 = parser1;
        this.parser2 = parser2;
        this.parser3 = parser3;
        this.create = create;
    }
 
    public Validated<string[], T> TryParse(string candidate)
    {
        var x1 = parser1.TryParse(candidate).SelectFailure(s => new[] { s });
        var x2 = parser2.TryParse(candidate).SelectFailure(s => new[] { s });
        var x3 = parser3.TryParse(candidate).SelectFailure(s => new[] { s });
        return create
            .Apply(x1, CombineErrors)
            .Apply(x2, CombineErrors)
            .Apply(x3, CombineErrors);
    }
 
    private static string[] CombineErrors(string[] s1string[] s2)
    {
        return s1.Concat(s2).ToArray();
    }
}

Granted, that's a bit of boilerplate, but if you imagine this as supplied by a reusable library, you only have to write this once.

I was now ready to parse the kata's central example, "-l -p 8080 -d /usr/logs", to a strongly typed value:

private sealed record TestConfig(bool DoLogint Portstring Directory);
 
[Theory]
[InlineData("-l -p 8080 -d /usr/logs")]
[InlineData("-p 8080 -l -d /usr/logs")]
[InlineData("-d /usr/logs -l -p 8080")]
[InlineData(" -d  /usr/logs  -l  -p 8080  ")]
public void ParseConfig(string args)
{
    var sut = new ArgsParser<boolintstring, TestConfig>(
        new BoolParser('l'),
        new IntParser('p'),
        new StringParser('d'),
        (bis) => new TestConfig(b, i, s));
 
    var actual = sut.TryParse(args);
 
    Assert.Equal(
        Validated.Succeed<string[], TestConfig>(
            new TestConfig(true, 8080, "/usr/logs")),
        actual);
}

This test parses some variations of the example input into an immutable record.

What happens if the input is malformed? Here's an example of that:

[Fact]
public void FailToParseConfig()
{
    var sut = new ArgsParser<boolintstring, TestConfig>(
        new BoolParser('l'),
        new IntParser('p'),
        new StringParser('d'),
        (bis) => new TestConfig(b, i, s));
 
    var actual = sut.TryParse("-p aityaity");
 
    Assert.True(actual.Match(
        onFailure: ss => ss.Contains("Expected integer for flag '-p', but got \"aityaity\"."),
        onSuccess: _ => false));
    Assert.True(actual.Match(
        onFailure: ss => ss.Contains("Missing value for flag '-d'."),
        onSuccess: _ => false));
}

Of particular interest is that, as promised by applicative validation, parsing failures don't short-circuit. The input value "-p aityaity" has two problems, and both are reported by TryParse.

At this point I was happy that I had sufficiently demonstrated the viability of the design. I decided to call it a day.

Conclusion #

As I did the Args kata, I found it interesting enough to warrant an article. Once I realised that I could use applicative parsing as the basis for the API, the rest followed.

There's room for improvement, but while doing katas is valuable, there are marginal returns in perfecting the code. Get the main functionality working, learn from it, and move on to another exercise.


Compile-time type-checked truth tables

Monday, 21 August 2023 08:07:00 UTC

With simple and easy-to-understand examples in F# and Haskell.

Eve Ragins recently published an article called Why you should use truth tables in your job. It's a good article. You should read it.

In it, she outlines how creating a Truth Table can help you smoke out edge cases or unclear requirements.

I agree, and it also beautifully explains why I find algebraic data types so useful.

With languages like F# or Haskell, this kind of modelling is part of the language, and you even get statically-typed compile-time checking that tells you whether you've handled all combinations.

Eve Ragins points out that there are other, socio-technical benefits from drawing up a truth table that you can, perhaps, print out, or otherwise share with non-technical stakeholders. Thus, the following is in no way meant as a full replacement, but rather as examples of how certain languages have affordances that enable you to think like this while programming.

F# #

I'm not going to go through Eve Ragins' blow-by-blow walkthrough, explaining how you construct a truth table. Rather, I'm just briefly going to show how simple it is to do the same in F#.

Most of the inputs in her example are Boolean values, which already exist in the language, but we need a type for the item status:

type ItemStatus = NotAvailable | Available | InUse

As is typical in F#, a type declaration is just a one-liner.

Now for something a little more interesting. In Eve Ragins' final table, there's a footnote that says that the dash/minus symbol indicates that the value is irrelevant. If you look a little closer, it turns out that the should_field_be_editable value is irrelevant whenever the should_field_show value is FALSE.

So instead of a bool * bool tuple, you really have a three-state type like this:

type FieldState = Hidden | ReadOnly | ReadWrite

It would probably have taken a few iterations to learn this if you'd jumped straight into pattern matching in F#, but since F# requires you to define types and functions before you can use them, I list the type now.

That's all you need to produce a truth table in F#:

let decide requiresApproval canUserApprove itemStatus =
    match requiresApproval, canUserApprove, itemStatus with
    |  true,  true, NotAvailable -> Hidden
    | false,  true, NotAvailable -> Hidden
    |  truefalse, NotAvailable -> Hidden
    | falsefalse, NotAvailable -> Hidden
    |  true,  true,    Available -> ReadWrite
    | false,  true,    Available -> Hidden
    |  truefalse,    Available -> ReadOnly
    | falsefalse,    Available -> Hidden
    |  true,  true,        InUse -> ReadOnly
    | false,  true,        InUse -> Hidden
    |  truefalse,        InUse -> ReadOnly
    | falsefalse,        InUse -> Hidden

I've called the function decide because it wasn't clear to me what else to call it.

What's so nice about F# pattern matching is that the compiler can tell if you've missed a combination. If you forget a combination, you get a helpful Incomplete pattern match compiler warning that points out the combination that you missed.

And as I argue in my book Code That Fits in Your Head, you should turn warnings into errors. This would also be helpful in a case like this, since you'd be prevented from forgetting an edge case.

Haskell #

You can do the same exercise in Haskell, and the result is strikingly similar:

data ItemStatus = NotAvailable | Available | InUse deriving (EqShow)
 
data FieldState = Hidden | ReadOnly | ReadWrite deriving (EqShow)
 
decide :: Bool -> Bool -> ItemStatus -> FieldState
decide  True  True NotAvailable = Hidden
decide False  True NotAvailable = Hidden
decide  True False NotAvailable = Hidden
decide False False NotAvailable = Hidden
decide  True  True    Available = ReadWrite
decide False  True    Available = Hidden
decide  True False    Available = ReadOnly
decide False False    Available = Hidden
decide  True  True        InUse = ReadOnly
decide False  True        InUse = Hidden
decide  True False        InUse = ReadOnly
decide False False        InUse = Hidden

Just like in F#, if you forget a combination, the compiler will tell you:

LibrarySystem.hs:8:1: warning: [-Wincomplete-patterns]
    Pattern match(es) are non-exhaustive
    In an equation for `decide':
        Patterns of type `Bool', `Bool', `ItemStatus' not matched:
            False False NotAvailable
  |
8 | decide  True  True NotAvailable = Hidden
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...

To be clear, that combination is not missing from the above code example. This compiler warning was one I subsequently caused by commenting out a line.

It's also possible to turn warnings into errors in Haskell.

Conclusion #

I love languages with algebraic data types because they don't just enable modelling like this, they encourage it. This makes it much easier to write code that handles various special cases that I'd easily overlook in other languages. In languages like F# and Haskell, the compiler will tell you if you forgot to deal with a combination.


Replacing Mock and Stub with a Fake

Monday, 14 August 2023 07:23:00 UTC

A simple C# example.

A reader recently wrote me about my 2013 article Mocks for Commands, Stubs for Queries, commenting that the 'final' code looks suspect. Since it looks like the following, that's hardly an overstatement.

public User GetUser(int userId)
{
    var u = this.userRepository.Read(userId);
    if (u.Id == 0)
        this.userRepository.Create(1234);
    return u;
}

Can you spot what's wrong?

Missing test cases #

You might point out that this example seems to violate Command Query Separation, and probably other design principles as well. I agree that the example is a bit odd, but that's not what I have in mind.

The problem with the above example is that while it correctly calls the Read method with the userId parameter, it calls Create with the hardcoded constant 1234. It really ought to call Create with userId.

Does this mean that the technique that I described in 2013 is wrong? I don't think so. Rather, I left the code in a rather unhelpful state. What I had in mind with that article was the technique I called data flow verification. As soon as I had delivered that message, I was, according to my own goals, done. I wrapped up the article, leaving the code as shown above.

As the reader remarked, it's noteworthy that an article about better unit testing leaves the System Under Test (SUT) in an obviously defect state.

The short response is that at least one test case is missing. Since this was only demo code to show an example, the entire test suite is this:

public class SomeControllerTests
{
    [Theory]
    [InlineData(1234)]
    [InlineData(9876)]
    public void GetUserReturnsCorrectValue(int userId)
    {
        var expected = new User();
        var td = new Mock<IUserRepository>();
        td.Setup(r => r.Read(userId)).Returns(expected);
        var sut = new SomeController(td.Object);
 
        var actual = sut.GetUser(userId);
 
        Assert.Equal(expected, actual);
    }
 
    [Fact]
    public void UserIsSavedIfItDoesNotExist()
    {
        var td = new Mock<IUserRepository>();
        td.Setup(r => r.Read(1234)).Returns(new User { Id = 0 });
        var sut = new SomeController(td.Object);
 
        sut.GetUser(1234);
 
        td.Verify(r => r.Create(1234));
    }
}

There are three test cases: Two for the parametrised GetUserReturnsCorrectValue method and one test case for the UserIsSavedIfItDoesNotExist test. Since the latter only verifies the hardcoded value 1234 the Devil's advocate can get by with using that hardcoded value as well.

Adding a test case #

The solution to that problem is simple enough. Add another test case by converting UserIsSavedIfItDoesNotExist to a parametrised test:

[Theory]
[InlineData(1234)]
[InlineData(9876)]
public void UserIsSavedIfItDoesNotExist(int userId)
{
    var td = new Mock<IUserRepository>();
    td.Setup(r => r.Read(userId)).Returns(new User { Id = 0 });
    var sut = new SomeController(td.Object);
 
    sut.GetUser(userId);
 
    td.Verify(r => r.Create(userId));
}

There's no reason to edit the other test method; this should be enough to elicit a change to the SUT:

public User GetUser(int userId)
{
    var u = this.userRepository.Read(userId);
    if (u.Id == 0)
        this.userRepository.Create(userId);
    return u;
}

When you use Mocks (or, rather, Spies) and Stubs the Data Flow Verification technique is useful.

On the other hand, I no longer use Spies or Stubs since they tend to break encapsulation.

Fake #

These days, I tend to only model real application dependencies as Test Doubles, and when I do, I use Fakes.

Dos Equis meme with the text: I don't always use Test Doubles, but when I do, I use Fakes.

While the article series From interaction-based to state-based testing goes into more details, I think that this small example is a good opportunity to demonstrate the technique.

The IUserRepository interface is defined like this:

public interface IUserRepository
{
    User Read(int userId);
 
    void Create(int userId);
}

A typical Fake is an in-memory collection:

public sealed class FakeUserRepository : Collection<User>, IUserRepository
{
    public void Create(int userId)
    {
        Add(new User { Id = userId });
    }
 
    public User Read(int userId)
    {
        var user = this.SingleOrDefault(u => u.Id == userId);
        if (user == null)
            return new User { Id = 0 };
        return user;
    }
}

In my experience, they're typically easy to implement by inheriting from a collection base class. Such an object exhibits typical traits of a Fake object: It fulfils the implied contract, but it lacks some of the 'ilities'.

The contract of a Repository is typically that if you add an Entity, you'd expect to be able to retrieve it later. If the Repository offers a Delete method (this one doesn't), you'd expect the deleted Entity to be gone, so that you can't retrieve it. And so on. The FakeUserRepository class fulfils such a contract.

On the other hand, you'd also expect a proper Repository implementation to support more than that:

  • You'd expect a proper implementation to persist data so that you can reboot or change computers without losing data.
  • You'd expect a proper implementation to correctly handle multiple threads.
  • You may expect a proper implementation to support ACID transactions.

The FakeUserRepository does none of that, but in the context of a unit test, it doesn't matter. The data exists as long as the object exists, and that's until it goes out of scope. As long as a test needs the Repository, it remains in scope, and the data is there.

Likewise, each test runs in a single thread. Even when tests run in parallel, each test has its own Fake object, so there's no shared state. Therefore, even though FakeUserRepository isn't thread-safe, it doesn't have to be.

Testing with the Fake #

You can now rewrite the tests to use FakeUserRepository:

[Theory]
[InlineData(1234)]
[InlineData(9876)]
public void GetUserReturnsCorrectValue(int userId)
{
    var expected = new User { Id = userId };
    var db = new FakeUserRepository { expected };
    var sut = new SomeController(db);
 
    var actual = sut.GetUser(userId);
 
    Assert.Equal(expected, actual);
}
 
[Theory]
[InlineData(1234)]
[InlineData(9876)]
public void UserIsSavedIfItDoesNotExist(int userId)
{
    var db = new FakeUserRepository();
    var sut = new SomeController(db);
 
    sut.GetUser(userId);
 
    Assert.Single(db, u => u.Id == userId);
}

Instead of asking a Spy whether or not a particular method was called (which is an implementation detail), the UserIsSavedIfItDoesNotExist test verifies the posterior state of the database.

Conclusion #

In my experience, using Fakes simplifies unit tests. While you may have to edit the Fake implementation from time to time, you edit that code in a single place. The alternative is to edit all affected tests, every time you change something about a dependency. This is also known as Shotgun Surgery and considered an antipattern.

The code base that accompanies my book Code That Fits in Your Head has more realistic examples of this technique, and much else.


Comments

Hi Mark,

Firstly, thank you for another insightful article.

I'm curious about using Fakes and testing exceptions. In scenarios where dynamic mocks (like Moq) are employed, we can mock a method to throw an exception, allowing us to test the expected behavior of the System Under Test (SUT). In your example, if we were using Moq, we could create a test to mock the UserRepository's Read method to throw a specific exception (e.g., SqlException). This way, we could ensure that the controller responds appropriately, perhaps with an internal server response. However, I'm unsure about how to achieve a similar test using Fakes. Is this type of test suitable for Fakes, or do such tests not align with the intended use of Fakes? Personally, I avoid using try-catch blocks in repositories or controllers and prefer handling exceptions in middleware (e.g., ErrorHandler). In such cases, I write separate unit tests for the middleware. Could this be a more fitting approach? Your guidance would be much appreciated.

(And yes, I remember your advice about framing questions —it's in your 'Code that Fits in Your Head' book! :D )

Thanks

2024-01-15 03:10 UTC

Thank you for writing. That's a question that warrants an article or two. I've now published an article titled Error categories and category errors. It's not a direct answer to your question, but I found it useful to first outline my thinking on errors in general.

I'll post an update here when I also have an answer to your specific question.

2024-01-30 7:13 UTC

AmirB, once again, thank you for writing. I've now published an article titled Testing exceptions that attempt to answer your question.

2024-02-26 6:57 UTC

NonEmpty catamorphism

Monday, 07 August 2023 11:40:00 UTC

The universal API for generic non-empty collections, with examples in C# and Haskell.

This article is part of an article series about catamorphisms. A catamorphism is a universal abstraction that describes how to digest a data structure into a potentially more compact value.

I was recently doing some work that required a data structure like a collection, but with the additional constraint that it should be guaranteed to have at least one element. I've known about Haskell's NonEmpty type, and how to port it to C# for years. This time I needed to implement it in a third language, and since I had a little extra time available, I thought it'd be interesting to pursue a conjecture of mine: It seems as though you can implement most (all?) of a generic data structure's API based on its catamorphism.

While I could make a guess as to how a catamorphism might look for a non-empty collection, I wasn't sure. A quick web search revealed nothing conclusive, so I decided to deduce it from first principles. As this article series demonstrates, you can derive the catamorphism from a type's isomorphic F-algebra.

The beginning of this article presents the catamorphism in C#, with an example. The rest of the article describes how to deduce the catamorphism. This part of the article presents my work in Haskell. Readers not comfortable with Haskell can just read the first part, and consider the rest of the article as an optional appendix.

C# catamorphism #

This article will use a custom C# class called NonEmptyCollection<T>, which is near-identical to the NotEmptyCollection<T> originally introduced in the article Semigroups accumulate.

I don't know why I originally chose to name the class NotEmptyCollection instead of NonEmptyCollection, but it's annoyed me ever since. I've finally decided to rectify that mistake, so from now on, the name is NonEmptyCollection.

The catamorphism for NonEmptyCollection is this instance method:

public TResult Aggregate<TResult>(Func<T, IReadOnlyCollection<T>, TResult> algebra)
{
    return algebra(Head, Tail);
}

Because the NonEmptyCollection class is really just a glorified tuple, the algebra is any function which produces a single value from the two constituent values.

It's easy to fall into the trap of thinking of the catamorphism as 'reducing' the data structure to a more compact form. While this is a common kind of operation, loss of data is not inevitable. You can, for example, return a new collection, essentially doing nothing:

var nec = new NonEmptyCollection<int>(42, 1337, 2112, 666);
var same = nec.Aggregate((x, xs) => new NonEmptyCollection<int>(x, xs.ToArray()));

This Aggregate method enables you to safely find a maximum value:

var nec = new NonEmptyCollection<int>(42, 1337, 2112, 666);
var max = nec.Aggregate((x, xs) => xs.Aggregate(x, Math.Max));

or to safely calculate an average:

var nec = new NonEmptyCollection<int>(42, 1337, 2112, 666);
var average = nec.Aggregate((x, xs) => xs.Aggregate(x, (a, b) => a + b) / (xs.Count + 1.0));

Both of these two last examples use the built-in Aggregate function to accumulate the xs. It uses the overload that takes a seed, for which it supplies x. This means that there's guaranteed to be at least that one value.

The catamorphism given here is not unique. You can create a trivial variation by swapping the two function arguments, so that x comes after xs.

NonEmpty F-algebra #

As in the previous article, I'll use Fix and cata as explained in Bartosz Milewski's excellent article on F-algebras.

As always, start with the underlying endofunctor:

data NonEmptyF a c = NonEmptyF { head :: a, tail :: ListFix a }
                     deriving (EqShowRead)
 
instance Functor (NonEmptyF a) where
  fmap _ (NonEmptyF x xs) = NonEmptyF x xs

Instead of using Haskell's standard list ([]) for the tail, I've used ListFix from the article on list catamorphism. This should, hopefully, demonstrate how you can build on already established definitions derived from first principles.

Since a non-empty collection is really just a glorified tuple of head and tail, there's no recursion, and thus, the carrier type c is not used. You could argue that going through all of these motions is overkill, but it still provides some insights. This is similar to the Boolean catamorphism and Maybe catamorphism.

The fmap function ignores the mapping argument (often called f), since the Functor instance maps NonEmptyF a c to NonEmptyF a c1, but the c or c1 type is not used.

As was the case when deducing the recent catamorphisms, Haskell isn't too happy about defining instances for a type like Fix (NonEmptyF a). To address that problem, you can introduce a newtype wrapper:

newtype NonEmptyFix a =
  NonEmptyFix { unNonEmptyFix :: Fix (NonEmptyF a) } deriving (EqShowRead)

You can define Functor, Applicative, Monad, etc. instances for this type without resorting to any funky GHC extensions. Keep in mind that, ultimately, the purpose of all this code is just to figure out what the catamorphism looks like. This code isn't intended for actual use.

A helper function makes it easier to define NonEmptyFix values:

createNonEmptyF :: a -> ListFix a -> NonEmptyFix a
createNonEmptyF x xs = NonEmptyFix $ Fix $ NonEmptyF x xs

Here's how to use it:

ghci> createNonEmptyF 42 $ consF 1337 $ consF 2112 nilF
NonEmptyFix {
  unNonEmptyFix = Fix (NonEmptyF 42 (ListFix (Fix (ConsF 1337 (Fix (ConsF 2112 (Fix NilF)))))))}

While this is quite verbose, keep in mind that the code shown here isn't meant to be used in practice. The goal is only to deduce catamorphisms from more basic universal abstractions, and you now have all you need to do that.

Haskell catamorphism #

At this point, you have two out of three elements of an F-Algebra. You have an endofunctor (NonEmptyF a), and an object c, but you still need to find a morphism NonEmptyF a c -> c. Notice that the algebra you have to find is the function that reduces the functor to its carrier type c, not the 'data type' a. This takes some time to get used to, but that's how catamorphisms work. This doesn't mean, however, that you get to ignore a, as you'll see.

As in the previous articles, start by writing a function that will become the catamorphism, based on cata:

nonEmptyF = cata alg . unNonEmptyFix
  where alg (NonEmptyF x xs) = undefined

While this compiles, with its undefined implementation of alg, it obviously doesn't do anything useful. I find, however, that it helps me think. How can you return a value of the type c from alg? You could pass a function argument to the nonEmptyF function and use it with x and xs:

nonEmptyF :: (a -> ListFix a -> c) -> NonEmptyFix a -> c
nonEmptyF f = cata alg . unNonEmptyFix
  where alg (NonEmptyF x xs) = f x xs

This works. Since cata has the type Functor f => (f a -> a) -> Fix f -> a, that means that alg has the type f a -> a. In the case of NonEmptyF, the compiler infers that the alg function has the type NonEmptyF a c -> c1, which fits the bill, since c may be the same type as c1.

This, then, is the catamorphism for a non-empty collection. This one is just a single function. It's still not the only possible catamorphism, since you could trivially flip the arguments to f.

I've chosen this representation because the arguments x and xs are defined in the same order as the order of head before tail. Notice how this is the same order as the above C# Aggregate method.

Basis #

You can implement most other useful functionality with nonEmptyF. Here's the Semigroup instance and a useful helper function:

toListFix :: NonEmptyFix a -> ListFix a
toListFix = nonEmptyF consF
 
instance Semigroup (NonEmptyFix a) where
  xs <> ys =
    nonEmptyF (\x xs' -> createNonEmptyF x $ xs' <> toListFix ys) xs

The implementation uses nonEmptyF to operate on xs. Inside the lambda expression, it converts ys to a list, and uses the ListFix Semigroup instance to concatenate xs with it.

Here's the Functor instance:

instance Functor NonEmptyFix where
  fmap f = nonEmptyF (\x xs -> createNonEmptyF (f x) $ fmap f xs)

Like the Semigroup instance, this fmap implementation uses fmap on xs, which is the ListFix Functor instance.

The Applicative instance is much harder to write from scratch (or, at least, I couldn't come up with a simpler way):

instance Applicative NonEmptyFix where
  pure x = createNonEmptyF x nilF
  liftA2 f xs ys =
    nonEmptyF
      (\x xs' ->
        nonEmptyF
          (\y ys' ->
            createNonEmptyF
              (f x y)
              (liftA2 f (consF x nilF) ys' <> liftA2 f xs' (consF y ys')))
          ys)
      xs

While that looks complicated, it's not that bad. It uses nonEmptyF to 'loop' over the xs, and then a nested call to nonEmptyF to 'loop' over the ys. The inner lambda expression uses f x y to calculate the head, but it also needs to calculate all other combinations of values in xs and ys.

Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x and y box are connected by an arrow labelled f.

First, it keeps x fixed and 'loops' through all the remaining ys'; that's the liftA2 f (consF x nilF) ys' part:

Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x and y1, y2, y3 boxes are connected by three arrows labelled with a single f.

Then it 'loops' over all the remaining xs' and all the ys; that is, liftA2 f xs' (consF y ys').

Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x1, x2, x3 boxes are connected to the y, y1, y2, y3 boxes by arrows labelled with a single f.

The two liftA2 functions apply to the ListFix Applicative instance.

You'll be happy to see, I think, that the Monad instance is simpler:

instance Monad NonEmptyFix where
  xs >>= f =
    nonEmptyF (\x xs' ->
      nonEmptyF
        (\y ys -> createNonEmptyF y $ ys <> (xs' >>= toListFix . f)) (f x)) xs

And fortunately, Foldable and Traversable are even simpler:

instance Foldable NonEmptyFix where
  foldr f seed = nonEmptyF (\x xs -> f x $ foldr f seed xs)
 
instance Traversable NonEmptyFix where
  traverse f = nonEmptyF (\x xs -> liftA2 createNonEmptyF (f x) (traverse f xs))

Finally, you can implement conversions to and from the NonEmpty type from Data.List.NonEmpty:

toNonEmpty :: NonEmptyFix a -> NonEmpty a
toNonEmpty = nonEmptyF (\x xs -> x :| toList xs)
 
fromNonEmpty :: NonEmpty a -> NonEmptyFix a
fromNonEmpty (x :| xs) = createNonEmptyF x $ fromList xs

This demonstrates that NonEmptyFix is isomorphic to NonEmpty.

Conclusion #

The catamorphism for a non-empty collection is a single function that produces a single value from the head and the tail of the collection. While it's possible to implement a 'standard fold' (foldr in Haskell), the non-empty catamorphism doesn't require a seed to get started. The data structure guarantees that there's always at least one value available, and this value can then be use to 'kick off' a fold.

In C# one can define the catamorphism as the above Aggregate method. You could then define all other instance functions based on Aggregate.

Next: Either catamorphism.


Test-driving the pyramid's top

Monday, 31 July 2023 07:00:00 UTC

Some thoughts on TDD related to integration and systems testing.

My recent article Works on most machines elicited some responses. Upon reflection, it seems that most of the responses relate to the top of the Test Pyramid.

While I don't have an one-shot solution that addresses all concerns, I hope that nonetheless I can suggest some ideas and hopefully inspire a reader or two. That's all. I intend nothing of the following to be prescriptive. I describe my own professional experience: What has worked for me. Perhaps it could also work for you. Use the ideas if they inspire you. Ignore them if you find them impractical.

The Test Pyramid #

The Test Pyramid is often depicted like this:

Standard Test Pyramid, which is really a triangle with three layers: Unit tests, integration tests, and UI tests.

This seems to indicate that while the majority of tests should be unit tests, you should also have a substantial number of integration tests, and quite a few UI tests.

Perhaps the following is obvious, but the Test Pyramid is an idea; it's a way to communicate a concept in a compelling way. What one should take away from it, I think, is only this: The number of tests in each category should form a total order, where the unit test category is the maximum. In other words, you should have more unit tests than you have tests in the next category, and so on.

No-one says that you can only have three levels, or that they have to have the same height. Finally, the above figure isn't even a pyramid, but rather a triangle.

I sometimes think of the Test Pyramid like this:

Test pyramid in perspective.

To be honest, it's not so much whether or not the pyramid is shown in perspective, but rather that the unit test base is significantly more voluminous than the other levels, and that the top is quite small.

Levels #

In order to keep the above discussion as recognisable as possible, I've used the labels unit tests, integration tests, and UI tests. It's easy to get caught up in a discussion about how these terms are defined. Exactly what is a unit test? How does it differ from an integration test?

There's no universally accepted definition of a unit test, so it tends to be counter-productive to spend too much time debating the finer points of what to call the tests in each layer.

Instead, I find the following criteria useful:

  1. In-process tests
  2. Tests that involve more than one process
  3. Tests that can only be performed in production

I'll describe each in a little more detail. Along the way, I'll address some of the reactions to Works on most machines.

In-process tests #

The in-process category corresponds roughly to the Test Pyramid's unit test level. It includes 'traditional' unit tests such as tests of stand-alone functions or methods on objects, but also Facade Tests. The latter may involve multiple modules or objects, perhaps even from multiple libraries. Many people may call them integration tests because they integrate more than one module.

As long as an automated test runs in a single process, in memory, it tends to be fast and leave no persistent state behind. This is almost exclusively the kind of test I tend to test-drive. I often follow an outside-in TDD process, an example of which is shown in my book Code That Fits in Your Head.

Consider an example from the source code that accompanies the book:

[Fact]
public async Task ReserveTableAtTheVaticanCellar()
{
    using var api = new SelfHostedApi();
    var client = api.CreateClient();
    var timeOfDayLaterThanLastSeatingAtTheOtherRestaurants =
        TimeSpan.FromHours(21.5);
 
    var at = DateTime.Today.AddDays(433).Add(
        timeOfDayLaterThanLastSeatingAtTheOtherRestaurants);
    var dto = Some.Reservation.WithDate(at).ToDto();
    var response =
        await client.PostReservation("The Vatican Cellar", dto);
 
    response.EnsureSuccessStatusCode();
}

I think of a test like this as an automated acceptance test. It uses an internal test-specific domain-specific language (test utilities) to exercise the REST service's API. It uses ASP.NET self-hosting to run both the service and the HTTP client in the same process.

Even though this may, at first glance, look like an integration test, it's an artefact of test-driven development. Since it does cut across both HTTP layer and domain model, some readers may think of it as an integration test. It uses a stateful in-memory data store, so it doesn't involve more than a single process.

Tests that span processes #

There are aspects of software that you can't easily drive with tests. I'll return to some really gnarly examples in the third category, but in between, we find concerns that are hard, but still possible to test. The reason that they are hard is often because they involve more than one process.

The most common example is data access. Many software systems save or retrieve data. With test-driven development, you're supposed to let the tests inform your API design decisions in such a way that everything that involves difficult, error-prone code is factored out of the data access layer, and into another part of the code that can be tested in process. This development technique ought to drain the hard-to-test components of logic, leaving behind a Humble Object.

One reaction to Works on most machines concerns exactly that idea:

"As a developer, you need to test HumbleObject's behavior."

It's almost tautologically part of the definition of a Humble Object that you're not supposed to test it. Still, realistically, ladeak has a point.

When I wrote the example code to Code That Fits in Your Head, I applied the Humble Object pattern to the data access component. For a good while, I had a SqlReservationsRepository class that was so simple, so drained of logic, that it couldn't possibly fail.

Until, of course, the inevitable happened: There was a bug in the SqlReservationsRepository code. Not to make a long story out of it, but even with a really low cyclomatic complexity, I'd accidentally swapped two database columns when reading from a table.

Whenever possible, when I discover a bug, I first write an automated test that exposes that bug, and only then do I fix the problem. This is congruent with my lean bias. If a defect can occur once, it can occur again in the future, so it's better to have a regression test.

The problem with this bug is that it was in a Humble Object. So, ladeak is right. Sooner or later, you'll have to test the Humble Object, too.

That's when I had to bite the bullet and add a test library that tests against the database.

One such test looks like this:

[Theory]
[InlineData(Grandfather.Id, "2022-06-29 12:00""e@example.gov""Enigma", 1)]
[InlineData(Grandfather.Id, "2022-07-27 11:40""c@example.com""Carlie", 2)]
[InlineData(2, "2021-09-03 14:32""bon@example.edu""Jovi", 4)]
public async Task CreateAndReadRoundTrip(
    int restaurantId,
    string at,
    string email,
    string name,
    int quantity)
{
    var expected = new Reservation(
        Guid.NewGuid(),
        DateTime.Parse(at, CultureInfo.InvariantCulture),
        new Email(email),
        new Name(name),
        quantity);
    var connectionString = ConnectionStrings.Reservations;
    var sut = new SqlReservationsRepository(connectionString);
 
    await sut.Create(restaurantId, expected);
    var actual = await sut.ReadReservation(restaurantId, expected.Id);
 
    Assert.Equal(expected, actual);
}

The entire test runs in a special context where a database is automatically created before the test runs, and torn down once the test has completed.

"When building such behavior, you can test against a shared instance of the service in your dev team or run that service on your dev machine in a container."

Yes, those are two options. A third, in the spirit of GOOS, is to strongly favour technologies that support automation. Believe it or not, you can automate SQL Server. You don't need a Docker container for it. That's what I did in the above test.

I can see how a Docker container with an external dependency can be useful too, so I'm not trying to dismiss that technology. The point is, however, that simpler alternatives may exist. I, and others, did test-driven development for more than a decade before Docker existed.

Tests that can only be performed in production #

The last category of tests are those that you can only perform on a production system. What might be examples of that?

I've run into a few over the years. One such test is what I call a Smoke Test: Metaphorically speaking, turn it on and see if it develops smoke. These kinds of tests are good at catching configuration errors. Does the web server have the right connection string to the database? A test can verify whether that's the case, but it makes no sense to run such a test on a development machine, or against a test system, or a staging environment. You want to verify that the production system is correctly configured. Only a test against the production system can do that.

For every configuration value, you may want to consider a Smoke Test.

There are other kinds of tests you can only perform in production. Sometimes, it's not technical concerns, but rather legal or financial constraints, that dictate circumstances.

A few years ago I worked with a software organisation that, among other things, integrated with the Danish personal identification number system (CPR). Things may have changed since, but back then, an organisation had to have a legal agreement with CPR before being granted access to its integration services. It's an old system (originally from 1968) with a proprietary data integration protocol.

We test-drove a parser of the data format, but that still left behind a Humble Object that would actually perform the data transfers. How do we test that Humble Object?

Back then, at least, there was no test system for the CPR service, and it was illegal to query the live system unless you had a business reason. And software testing did not constitute a legal reason.

The only legal option was to make the Humble Object as simple and foolproof as possible, and then observe how it worked in actual production situations. Containers wouldn't help in such a situation.

It's possible to write automated tests against production systems, but unless you're careful, they're difficult to write and maintain. At least, go easy on the assertions, since you can't assume much about the run-time data and behaviour of a live system. Smoke tests are mostly just 'pings', so can be written to be fairly maintenance-free, but you shouldn't need many of them.

Other kinds of tests against production are likely to be fragile, so it pays to minimise their number. That's the top of the pyramid.

User interfaces #

I no longer develop user interfaces, so take the following with a pinch of salt.

The 'original' Test Pyramid that I've depicted above has UI tests at the pyramid's top. That doesn't necessarily match the categories I've outlined here; don't assume parity.

A UI test may or may not involve more than one process, but they are often difficult to maintain for other reasons. Perhaps this is where the pyramid metaphor starts to break down. All models are wrong, but some are useful.

Back when I still programmed user interfaces, I'd usually test-drive them via a subcutaneous API, and rely on some variation of MVC to keep the rendered controls in sync. Still, once in a while, you need to verify that the user interface looks as it's supposed to. Often, the best tool for that job is the good old Mark I Eyeball.

This still means that you need to run the application from time to time.

"Docker is also very useful for enabling others to run your software on their machines. Recently, we've been exploring some apps that consisted of ~4 services (web servers) and a database. All of them written in different technologies (PHP, Java, C#). You don't have to setup environment variables. You don't need to have relevant SDKs to build projects etc. Just run docker command, and spin them instantly on your PC."

That sounds like a reasonable use case. I've never found myself in such circumstances, but I can imagine the utility that containers offer in a situation like that. Here's how I envision the scenario:

A box with arrows to three other boxes, which again have arrows to a database symbol.

The boxes with rounded corners symbolise containers.

Again, my original goal with the previous article wasn't to convince you that container technologies are unequivocally bad. Rather, it was to suggest that test-driven development (TDD) solves many of the problems that people seem to think can only be solved with containers. Since TDD has many other beneficial side effects, it's worth considering instead of mindlessly reaching for containers, which may represent only a local maximum.

How could TDD address qfilip's concern?

When I test-drive software, I favour real dependencies, and I favour Fake objects over Mocks and Stubs. Were I to return to user-interface programming today, I'd define its external dependencies as one or more interfaces, and implement a Fake Object for each.

Not only will this enable me to simulate the external dependencies with the Fakes. If I implement the Fakes as part of the production code, I'd even be able to spin up the system, using the Fakes instead of the real system.

App box with arrows pointing to itself.

A Fake is an implementation that 'almost works'. A common example is an in-memory collection instead of a database. It's neither persistent nor thread-safe, but it's internally consistent. What you add, you can retrieve, until you delete it again. For the purposes of starting the app in order to verify that the user interface looks correct, that should be good enough.

Another related example is NServiceBus, which comes with a file transport that is clearly labeled as not for production use. While it's called the Learning Transport, it's also useful for exploratory testing on a development machine. While this example clearly makes use of an external resource (the file system), it illustrates how a Fake implementation can alleviate the need for a container.

Uses for containers #

Ultimately, it's still useful to be able to stand up an entire system, as qfilip suggests, and if containers is a good way to do that, it doesn't bother me. At the risk of sounding like a broken record, I never intended to say that containers are useless.

When I worked as a Software Development Engineer in Microsoft, I had two computers: A laptop and a rather beefy tower PC. I've always done all programming on laptops, so I repurposed the tower as a virtual server with all my system's components on separate virtual machines (VM). The database in one VM, the application server in another, and so on. I no longer remember what all the components were, but I seem to recall that I had four VMs running on that one box.

While I didn't use it much, I found it valuable to occasionally verify that all components could talk to each other on a realistic network topology. This was in 2008, and Docker wasn't around then, but I could imagine it would have made that task easier.

I don't dispute that Docker and Kubernetes are useful, but the job of a software architect is to carefully identify the technologies on which a system should be based. The more technology dependencies you take on, the more rigid the design.

After a few decades of programming, my experience is that as a programmer and architect, I can find better alternatives than depending on container technologies. If testers and IT operators find containers useful to do their jobs, then that's fine by me. Since my code works on most machines, it works in containers, too.

Truly Humble Objects #

One last response, and I'll wrap this up.

"As a developer, you need to test HumbleObject's behavior. What if a DatabaseConnection or a TCP conn to a message queue is down?"

How should such situations be handled? There may always be special cases, but in general, I can think of two reactions:

  • Log the error
  • Retry the operation

Assuming that the Humble Object is a polymorphic type (i.e. inherits a base class or implements an interface), you should be able to extract each of these behaviours to general-purpose components.

In order to log errors, you can either use a Decorator or a global exception handler. Most frameworks provide a way to catch (otherwise) unhandled exceptions, exactly for this purpose, so you don't have to add such functionality to a Humble Object.

Retry logic can also be delegated to a third-party component. For .NET I'd start looking at Polly, but I'd be surprised if other platforms don't have similar libraries that implement the stability patterns from Release It.

Something more specialised, like a fail-over mechanism, sounds like a good reason to wheel out the Chain of Responsibility pattern.

All of these can be tested independently of any Humble Object.

Conclusion #

In a recent article I reflected on my experience with TDD and speculated that a side effect of that process is code flexible enough to work on most machines. Thus, I've never encountered the need for a containers.

Readers responded with comments that struck me as mostly related to the upper levels of the Test Pyramid. In this article, I've attempted to address some of those concerns. I still get by without containers.


Is software getting worse?

Monday, 24 July 2023 06:02:00 UTC

A rant, with some examples.

I've been a software user for thirty years.

My first PC was DOS-based. In my first job, I used OS/2, in the next, Windows 3.11, NT, and later incarnations of Windows.

I wrote my first web sites in Arachnophilia, and my first professional software in Visual Basic, Visual C++, and Visual InterDev.

I used Terminate with my first modem. If I recall correctly, it had a built-in email downloader and offline reader. Later, I switched to Outlook for email. I've used Netscape Navigator, Internet Explorer, Firefox, and Chrome to surf the web.

I've written theses, articles, reports, etc. in Word Perfect for DOS and MS Word for Windows. I wrote my new book Code That Fits In your Head in TexStudio. Yes, it was written entirely in LaTeX.

Updates #

For the first fifteen years, new software version were rare. You'd get a version of AutoCAD, Windows, or Visual C++, and you'd use it for years. After a few years, a new version would come out, and that would be a big deal.

Interim service releases were rare, too, since there was no network-based delivery mechanism. Software came on floppy disks, and later on CDs.

Even if a bug fix was easy to make, it was difficult for a software vendor to distribute it, so most software releases were well-tested. Granted, software had bugs back then, and some of them you learned to work around.

When a new version came out, the same forces were at work. The new version had to be as solid and stable as the previous one. Again, I grant that once in a while, even in those days, this wasn't always the case. Usually, a bad release spelled the demise of a company, because release times were so long that competitors could take advantage of a bad software release.

Usually, however, software updates were improvements, and you looked forward to them.

Decay #

I no longer look forward to updates. These days, software is delivered over the internet, and some applications update automatically.

From a security perspective it can be a good idea to stay up-to-date, and for years, I diligently did that. Lately, however, I've become more conservative. Particularly when it comes to Windows, I ignore all suggestions to update it until it literally forces the update on me.

Just like even-numbered Star Trek movies don't suck the same pattern seems to be true for Windows: Windows XP was good, Windows 7 was good, and Windows 10 wasn't bad either. I kept putting off Windows 11 for as long as possible, but now I use it, and I can't say that I'm surprised that I don't like it.

This article, however, isn't a rant about Windows in particular. This seems to be a general trend, and it's been noticeable for years.

Examples #

I think that the first time I noticed a particular application degrading was Vivino. It started out as a local company here in Copenhagen, and I was a fairly early adopter. Initially, it was great: If you like wine, but don't know that much about it, you could photograph a bottle's label, and it'd automatically recognise the wine and register it in your 'wine library'. I found it useful that I could look up my notes about a wine I'd had a year ago to remind me what I thought of it. As time went on, however, I started to notice errors in my wine library. It might be double entries, or wines that were silently changed to another vintage, etc. Eventually it got so bad that I lost trust in the application and uninstalled it.

Another example is Sublime Text, which I used for writing articles for this blog. I even bought a licence for it. Version 3 was great, but version 4 was weird from the outset. One thing was that they changed how they indicated which parts of a file I'd edited after opening it, and I never understood the idea behind the visuals. Worse was that auto-closing of HTML stopped working. Since I'm that weird dude who writes raw HTML, such a feature is quite important to me. If I write an HTML tag, I expect the editor to automatically add the closing tag, and place my cursor between the two. Sublime Text stopped doing that consistently, and eventually it became annoying enough that I though: Why bother? Now I write in Visual Studio Code.

Microsoft is almost a chapter in itself, but to be fair, I don't consider Microsoft products worse than others. There's just so much of it, and since I've always been working in the Microsoft tech stack, I use a lot of it. Thus, selection bias clearly is at work here. Still, while I don't think Microsoft is worse than the competition, it seems to be part of the trend.

For years, my login screen was stuck on the same mountain lake, even though I tried every remedy suggested on the internet. Eventually, however, a new version of Windows fixed the issue. So, granted, sometimes new versions improve things.

Now, however, I have another problem with Windows Spotlight. It shows nice pictures, and there used to be an option to see where the picture was taken. Since I repaved my machine, this option is gone. Again, I've scoured the internet for resolutions to this problem, but neither rebooting, regedit changes, etc. has so far solved the problem.

That sounds like small problems, so let's consider something more serious. Half a year ago, Outlook used to be able to detect whether I was writing an email in English or Danish. It could even handle the hybrid scenario where parts of an email was in English, and parts in Danish. Since I repaved my machine, this feature no longer works. Outlook doesn't recognise Danish when I write it. One thing are the red squiggly lines under most words, but that's not even the worst. The worst part of this is that even though I'm writing in Danish, outlook thinks I'm writing in English, so it silently auto-corrects Danish words to whatever looks adjacent in English.

Screen shot of Outlook language bug.

This became so annoying that I contacted Microsoft support about it, but while they had me try a number of things, nothing worked. They eventually had to give up and suggested that I reinstalled my machine - which, at that point, I'd done two weeks before.

This used to work, but now it doesn't.

It's not all bad #

I could go on with other examples, but I think that this suffices. After all, I don't think it makes for a compelling read.

Of course, not everything is bad. While it looks as though I'm particularly harping on Microsoft, I rarely detect problems with Visual Studio or Code, and I usually install updates as soon as they are available. The same is true for much other software I use. Paint.NET is awesome, MusicBee is solid, and even the Sonos Windows app, while horrific, is at least consistently so.

Conclusion #

It seems to me that some software is actually getting worse, and that this is a more recent trend.

The point isn't that some software is bad. This has always been the case. What seems new to me is that software that used to be good deteriorates. While this wasn't entirely unheard of in the nineties (I'm looking at you, WordPerfect), this is becoming much more noticeable.

Perhaps it's just frequency illusion, or perhaps it's because I use software much more than I did in the nineties. Still, I can't shake the feeling that some software is deteriorating.

Why does this happen? I don't know, but my own bias suggests that it's because there's less focus on regression testing. Many of the problems I see look like regression bugs to me. A good engineering team could have caught them with automated regression tests, but these days, it seems as though many teams rely on releasing often and then letting users do the testing.

The problem with that approach, however, is that if you don't have good automated tests, fixing one regression may resurrect another.


Comments

I don't think your perception that software is getting worse is wrong, Mark. I've been an Evernote user since 2011. And, for a good portion of those years, I've been a paid customer.

I'm certainly not alone in my perception that the application is becoming worse with each new release, to the point of, at times, becoming unusable. Syncing problems with the mobile version, weird changes in the UI that don't accomplish anything, general sluggishness, and, above all, not listening to the users regarding long-needed features.

2023-07-29 18:05 UTC

Works on most machines

Monday, 17 July 2023 08:01:00 UTC

TDD encourages deployment flexibility. Functional programming also helps.

Recently several of the podcasts I subscribe to have had episodes about various container technologies, of which Kubernetes dominates. I tune out of such content, since it has nothing to do with me.

I've never found containerisation relevant. I remember being fascinated when I first heard of Docker, and for a while, I awaited a reason to use it. It never materialised.

I'd test-drive whatever system I was working on, and deploy it to production. Usually, it'd just work.

Since my process already produced good results, why make it more complicated?

Occasionally, I would become briefly aware of the lack of containers in my life, but then I'd forget about it again. Until now, I haven't thought much about it, and it's probably only the random coincidence of a few podcast episodes back-to-back that made me think more about it.

Be liberal with what system you run on #

When I was a beginner programmer a few years ago, things were different. I'd write code that worked on my machine, but not always on the test server.

As I gained experience, this tended to happen less often. This doubtlessly have multiple causes, and increased experience is likely one of them, but I also think that my interest in loose coupling and test-driven development plays a role.

Increasingly I developed an ethos of writing software that would work on most machines, instead of only my own. It seems reminiscent of Postel's law: Be liberal with what system you run on.

Test-driven development helps in that regard, because you write code that must be able to execute in at least two contexts: The test context, and the actual system context. These two contexts both exist on your machine.

A colleague once taught me: The most difficult generalisation step is going from one to two. Once you've generalised to two cases, it's much easier to generalise to three, four, or n cases.

It seems to me that such from-one-to-two-cases generalisation is an inadvertent by-product of test-driven development. Once your code already matches two different contexts, making it even more flexible isn't that much extra work. It's not even speculative generality because you also need to make it work on the production system and (one hopes) on a build server or continuous delivery pipeline. That's 3-4 contexts. Odds are that software that runs successfully in four separate contexts runs successfully on many more systems.

General-purpose modules #

In A Philosophy of Software Design John Ousterhout argues that one should aim for designing general-purpose objects or modules, rather than specialised APIs. He calls them deep modules and their counterparts shallow modules. On the surface, this seems to go against the grain of YAGNI, but the way I understand the book, the point is rather that general-purpose solutions also solve special cases, and, when done right, the code doesn't have to be more complicated than the one that handles the special case.

As I write in my review of the book, I think that there's a connection with test-driven development. General-purpose code is code that works in more than one situation, including automated testing environments. This is almost tautological. If it doesn't work in an automated test, an argument could be made that it's insufficiently general.

Likewise, general-purpose software should be able to work when deployed to more than one machine. It should even work on machines where other versions of that software already exist.

When you have general-purpose software, though, do you really need containers?

Isolation #

While I've routinely made use of test-driven development since 2003, I started my shift towards functional programming around ten years later. I think that this has amplified my code's flexibility.

As Jessica Kerr pointed out years ago, a corollary of referential transparency is that pure functions are isolated from their environment. Only input arguments affect the output of a pure function.

Ultimately, you may need to query the environment about various things, but in functional programming, querying the environment is impure, so you push it to the boundary of the system. Functional programming encourages you to explicitly consider and separate impure actions from pure functions. This implies that the environment-specific code is small, cohesive, and easy to review.

Conclusion #

For a while, when Docker was new, I expected it to be a technology that I'd eventually pick up and make part of my tool belt. As the years went by, that never happened. As a programmer, I've never had the need.

I think that a major contributor to that is that since I mostly develop software with test-driven development, the resulting software is already robust or flexible enough to run in multiple environments. Adding functional programming to the mix helps to achieve isolation from the run-time environment.

All of this seems to collaborate to enable code to work not just on my machine, but on most machines. Including containers.

Perhaps there are other reasons to use containers and Kubernetes. In a devops context, I could imagine that it makes deployment and operations easier. I don't know much about that, but I also don't mind. If someone wants to take the code I've written and run it in a container, that's fine. It's going to run there too.


Comments

qfilip #

Commenting for the first time. I hope I made these changes in proper manner. Anyway...

Kubernetes usually also means the usage of cloud infrastructure, and as such, it can be automated (and change-tracked) in various interesting ways. Is it worth it? Well, that depends as always... Docker isn't the only container technology supported by k8s, but since it's the most popular one... they go hand in hand.

Docker is also very useful for enabling others to run your software on their machines. Recently, we've been exploring some apps that consisted of ~4 services (web servers) and a database. All of them written in different technologies (PHP, Java, C#). You don't have to setup environment variables. You don't need to have relevant SDKs to build projects etc. Just run docker command, and spin them instantly on your PC.

So there's that...

Unrelated to the topic above, I'd like to ask you, if you could write an article on the specific subject. Or, if the answer is short, comment me back. As an F# enthusiast, I find yours and Scott's blog very valuable. One thing I've failed to find here is why you don't like ORMs. I think the words were they solve a problem that we shouldn't have in the first place. Since F# doesn't play too well with Entity Framework, and I pretty much can't live without it... I'm curious if I'm missing something. A different approach, way of thinking. I can work with raw SQL ofcourse... but the mapping... oh the mapping...

2023-07-18 22:30 UTC

I'm contemplating turning my response into a new article, but it may take some time before I get to it. I'll post here once I have a more thorough response.

2023-07-23 13:56 UTC

qfilip, thank you for writing. I've now published the article that, among many other things, respond to your comment about containers.

I'll get back to your question about ORMs as soon as possible.

2023-07-31 07:01 UTC

I'm still considering how to best address the question about ORMs, but in the meanwhile, I'd like to point interested readers to Ted Neward's famous article The Vietnam of Computer Science.

2023-08-14 20:01 UTC

Finally, I'm happy to announce that I've written an article trying to explain my position: Do ORMs reduce the need for mapping?.

2023-09-18 14:50 UTC

AI for doc comments

Monday, 10 July 2023 06:02:00 UTC

A solution in search of a problem?

I was recently listening to a podcast episode where the guest (among other things) enthused about how advances in large language models mean that you can now get these systems to write XML doc comments.

You know, these things:

/// <summary>
/// Scorbles a dybliad.
/// </summary>
/// <param name="dybliad">The dybliad to scorble.</param>
/// <param name="flag">
/// A flag that controls wether scorbling is done pre- or postvotraid.
/// </param>
/// <returns>The scorbled dybliad.</returns>
public string Scorble(string dybliad, bool flag)

And it struck me how that's not the first time I've encountered that notion. Finally, you no longer need to write those tedious documentation comments in your code. Instead, you can get Github Copilot or ChatGPT to write them for you.

When was the last time you wrote such comments?

I'm sure that there are readers who wrote some just yesterday, but generally, I rarely encounter them in the wild.

As a rule, I only write them when my modelling skills fail me so badly that I need to apologise in code. Whenever I run into such a situation, I may as well take advantage of the format already in place for such things, but it's not taking up a big chunk of my time.

It's been a decade since I ran into a code base where doc comments were mandatory. When I had to write comments, I'd use GhostDoc, which used heuristics to produce 'documentation' on par with modern AI tools.

Whether you use GhostDoc, Github Copilot, or write the comments yourself, most of them tend to be equally inane and vacuous. Good design only amplifies this quality. The better names you use, and the more you leverage the type system to make illegal states unrepresentable, the less you need the kind of documentation furnished by doc comments.

I find it striking that more than one person wax poetic about AI's ability to produce doc comments.

Is that, ultimately, the only thing we'll entrust to large language models?

I know that that they can do more than that, but are we going to let them? Or is automatic doc comments a solution in search of a problem?


Page 5 of 74

"Our team wholeheartedly endorses Mark. His expert service provides tremendous value."
Hire me!