ploeh blog
2023-11-27T08:44:29+00:00
Mark Seemann
danish software design
https://blog.ploeh.dk
Synchronizing concurrent teams
https://blog.ploeh.dk/2023/11/27/synchronizing-concurrent-teams
2023-11-27T08:43:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Or, rather: Try not to.</em>
</p>
<p>
A few months ago I visited a customer and as the day was winding down we got to talk more informally. One of the architects mentioned, in an almost off-hand manner, "we've embarked on a <a href="https://en.wikipedia.org/wiki/Scaled_agile_framework">SAFe</a> journey..."
</p>
<p>
"Yes..?" I responded, hoping that my inflection would sound enough like a question that he'd elaborate.
</p>
<p>
Unfortunately, I'm apparently sometimes too subtle when dealing with people face-to-face, so I never got to hear just how that 'SAFe journey' was going. Instead, the conversation shifted to the adjacent topic of how to coordinate independent teams.
</p>
<p>
I told them that, in my opinion, the best way to coordinate independent teams is to <em>not</em> coordinate them. I don't remember exactly how I proceeded from there, but I probably said something along the lines that I consider coordination meetings between teams to be an 'architecture smell'. That the need to talk to other teams was a symptom that teams were too tightly coupled.
</p>
<p>
I don't remember if I said exactly that, but it would have been in character.
</p>
<p>
The architect responded: "I don't like silos."
</p>
<p>
How do you respond to that?
</p>
<h3 id="c5fdd5e722994d8fb75bb3f32f1cf86b">
Autonomous teams <a href="#c5fdd5e722994d8fb75bb3f32f1cf86b">#</a>
</h3>
<p>
I couldn't very well respond that <em>silos are great</em>. First, it doesn't sound very convincing. Second, it'd be an argument suitable only in a kindergarten. <em>Are not! -Are too! -Not! -Too!</em> etc.
</p>
<p>
After feeling momentarily <a href="https://en.wikipedia.org/wiki/Check_(chess)">checked</a>, for once I managed to think on my feet, so I replied, "I don't suggest that your teams should be isolated from each other. I do encourage people to talk to each other, but I don't think that teams should <em>coordinate</em> much. Rather, think of each team as an organism on the savannah. They interact, and what they do impacts others, but in the end they're autonomous life forms. I believe an architect's job is like a ranger's. You can't control the plants or animals, but you can nurture the ecosystem, herding it in a beneficial direction."
</p>
<p>
<img src="/content/binary/samburu.jpg" alt="Gazelles and warthogs in Samburu National Reserve, Kenya.">
</p>
<p>
That ranger metaphor is an old pet peeve of mine, originating from what I consider one of my most under-appreciated articles: <a href="/2012/12/18/ZookeepersmustbecomeRangers">Zookeepers must become Rangers</a>. It's closely related to the more popular metaphor of software architecture as gardening, but I like the wildlife variation because it emphasizes an even more hands-off approach. It removes the illusion that you can control a fundamentally unpredictable process, but replaces it with the hopeful emphasis on stewardship.
</p>
<p>
How do ecosystems thrive? A software architect (or ranger) should nurture resilience in each subsystem, just like evolution has promoted plants' and animals' ability to survive a variety of unforeseen circumstances: Flood, drought, fire, predators, lack of prey, disease, etc.
</p>
<p>
You want teams to work independently. This doesn't mean that they work in isolation, but rather that they are free to act according to their abilities and understanding of the situation. An architect can help them understand the wider ecosystem and predict tomorrow's weather, so to speak, but the team should remain autonomous.
</p>
<h3 id="d0637b2597964cfb9bf6ffd3f0559fd0">
Concurrent work <a href="#d0637b2597964cfb9bf6ffd3f0559fd0">#</a>
</h3>
<p>
I'm assuming that an organisation has multiple teams because they're supposed to work concurrently. While team A is off doing one thing, team B is doing something else. You can attempt to herd them in the same general direction, but beware of tight coordination.
</p>
<p>
What's the problem with coordination? Isn't it a kind of collaboration? Don't we consider that beneficial?
</p>
<p>
I'm not arguing that teams should be antagonistic. As with all metaphors, we should be careful not to take the savannah metaphor too far. I'm not imagining that one team consists of lions, apex predators, killing and devouring other teams.
</p>
<p>
Rather, the reason I'm wary of coordination is that it seems synonymous with <em>synchronisation</em>.
</p>
<p>
In <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I've already discussed how good practices for Continuous Integration are similar to earlier lessons about <a href="https://en.wikipedia.org/wiki/Optimistic_concurrency_control">optimistic concurrency</a>. It recently struck me that we can draw a similar parallel between concurrent team work and parallel computing.
</p>
<p>
For decades we've known that the less synchronization, the faster parallel code is. Synchronization is costly.
</p>
<p>
In team work, coordination is like thread synchronization. Instead of doing work, you stop in order to coordinate. This implies that one thread or team has to wait for the other to catch up.
</p>
<p>
<img src="/content/binary/sync-wait.png" alt="Two horizontal bars presenting two processes, A and B. A is shorter than B, indicating that it finishes first.">
</p>
<p>
Unless work is perfectly evenly divided, team A may finish before team B. In order to coordinate, team A must sit idle for a while, waiting for B to catch up. (In development organizations, <em>idleness</em> is rarely allowed, so in practice, team A embarks on some other work, with <a href="/2012/12/18/ZookeepersmustbecomeRangers">consequences that I've already outlined</a>.)
</p>
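<p>
As a back-of-the-envelope sketch, the idle time can be calculated like this. The durations are made up for illustration, and the model assumes that every team simply waits at the synchronization point until the slowest one arrives:
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical durations, in days, until each team reaches the sync point.
var durations = new Dictionary<string, int>
{
    ["A"] = 6,
    ["B"] = 10,
};

// The sync point can't happen before the slowest team arrives.
var syncPoint = durations.Values.Max();

// Idle time per team: the gap between finishing and the sync point.
var idle = durations.ToDictionary(kv => kv.Key, kv => syncPoint - kv.Value);

foreach (var (team, days) in idle)
    Console.WriteLine($"Team {team} waits {days} days.");

Console.WriteLine($"Total idle time: {idle.Values.Sum()} days.");</pre>
</p>
<p>
With these numbers, team A waits four days while team B waits none. Adding more entries to the dictionary never lowers anyone's wait; it can only push the sync point further out.
</p>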
<p>
If you have more than two teams, this phenomenon only becomes worse. You'll have more idle time. This reminds me of <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl's law</a>, which, briefly put, states that there's a limit to how much of a speed improvement you can get from concurrent work. The limit is related to the percentage of the work that can <em>not</em> be parallelized. The greater the need to synchronize work, the lower the ceiling. Conversely, the more you can let concurrent processes run without coordination, the more you gain from parallelization.
</p>
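<p>
To get a feel for the numbers, here's a small sketch of the usual formulation of Amdahl's law, where <code>p</code> is the fraction of the work that can run in parallel and <code>n</code> is the number of workers. The function names and sample values are mine, chosen only for illustration:
</p>
<p>
<pre>using System;

// Amdahl's law: p is the fraction of the work that can run in parallel,
// n is the number of parallel workers (threads or, by analogy, teams).
double Speedup(double p, int n) => 1.0 / ((1.0 - p) + p / n);

// Upper bound as n grows without limit: only the serial part remains.
double Ceiling(double p) => 1.0 / (1.0 - p);

foreach (var p in new[] { 0.5, 0.9, 0.99 })
    Console.WriteLine(
        $"p = {p}: 4 workers -> {Speedup(p, 4):F2}x, " +
        $"16 workers -> {Speedup(p, 16):F2}x, ceiling {Ceiling(p):F0}x");</pre>
</p>
<p>
If half of the work requires synchronization, no number of workers can make the whole more than twice as fast.
</p>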
<p>
It seems to me that there's a direct counterpart in team organization. The more teams need to coordinate, the less is gained from having multiple teams.
</p>
<p>
But really, <a href="/ref/mythical-man-month">Fred Brooks could have told you so in 1975</a>.
</p>
<h3 id="11f48be968f94f67b42d0934a4d08501">
Versioning <a href="#11f48be968f94f67b42d0934a4d08501">#</a>
</h3>
<p>
A small development team may organize work informally. Work may be divided along 'natural' lines, each developer taking on tasks best suited to his or her abilities. If working in a code base with shared ownership, one developer doesn't <em>have</em> to wait on the work done by another developer. Instead, a programmer may complete the required work individually, or working together with a colleague. Coordination happens, but is both informal and frequent.
</p>
<p>
As development organizations grow, teams are formed. Separate teams are supposed to work independently, but may in practice often depend on each other. Team A may need team B to make a change before they can proceed with their own work. The (felt) need to coordinate team activities arises.
</p>
<p>
In my experience, this happens for a number of reasons. One is that teams may be divided along wrong lines; this is a socio-technical problem. Another, more technical, reason is that <a href="/2012/12/18/RangersandZookeepers">zookeepers</a> rarely think explicitly about versioning or avoiding breaking changes. Imagine that team A needs team B to develop a new capability. This new capability <em>implies</em> a breaking change, so the teams will now need to coordinate.
</p>
<p>
Instead, team B should develop the new feature in such a way that it doesn't break existing clients. If all else fails, the new feature must exist side-by-side with the old way of doing things. With <a href="https://en.wikipedia.org/wiki/Continuous_deployment">Continuous Deployment</a> the new feature becomes available when it's ready. Team A still has to <em>wait</em> for the feature to become available, but no <em>synchronization</em> is required.
</p>
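<p>
A minimal sketch of such an additive change, with a dictionary standing in for whatever message format the two teams exchange. The field names, the room name, and the fallback value are all hypothetical:
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Team A's existing client: reads only the field it knows about.
string ReadRoomName(IReadOnlyDictionary<string, object> response) =>
    (string)response["name"];

// A client that wants the new capability adopts it when it's ready,
// and falls back (here to a hypothetical default of 1) until then.
int ReadCapacity(IReadOnlyDictionary<string, object> response) =>
    response.TryGetValue("capacity", out var c) && c is int n ? n : 1;

// Team B's original response.
var before = new Dictionary<string, object> { ["name"] = "Garden suite" };

// Team B's new response: the new field sits next to the old one
// instead of replacing or renaming it.
var after = new Dictionary<string, object>
{
    ["name"] = "Garden suite",
    ["capacity"] = 2,
};

// The old client works against both versions, so neither team has to
// time its deployment to the other's.
Console.WriteLine(ReadRoomName(before));
Console.WriteLine(ReadRoomName(after));
Console.WriteLine(ReadCapacity(after));</pre>
</p>
<p>
Had team B renamed <code>name</code> instead, the old client would break on deployment day, and the two teams would be back to coordinating releases.
</p>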
<h3 id="def545737be9487da7c4b01dcc3eb106">
Conclusion <a href="#def545737be9487da7c4b01dcc3eb106">#</a>
</h3>
<p>
Yet another lesson about thread-safety and concurrent transactions seems to apply to people and processes. Parallel processes should be autonomous, with as little synchronization as possible. The more you coordinate development teams, the more you limit the speed of overall work. This seems to suggest that something akin to Amdahl's law also applies to development organizations.
</p>
<p>
Instead of coordinating teams, encourage them to exist as autonomous entities, but set things up so that <em>not breaking compatibility</em> is a major goal for each team.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Trimming a Fake Object
https://blog.ploeh.dk/2023/11/20/trimming-a-fake-object
2023-11-20T06:44:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A refactoring example.</em>
</p>
<p>
When I introduce the <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Object</a> testing pattern to people, a common concern is the maintenance burden of it. The point of the pattern is that you write some 'working' code only for test purposes. At a glance, it seems as though it'd be more work than using a dynamic mock library like <a href="https://www.devlooped.com/moq/">Moq</a> or <a href="https://site.mockito.org/">Mockito</a>.
</p>
<p>
This article isn't really about that, but the benefit of a Fake Object is that it has a <em>lower</em> maintenance footprint because it gives you a single class to maintain when you change interfaces or base classes. Dynamic mock objects, on the contrary, lead to <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun surgery</a> because every time you change an interface or base class, you have to revisit multiple tests.
</p>
<p>
In a <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">recent article</a> I presented a Fake Object that may have looked bigger than most people would find comfortable for test code. In this article I discuss how to trim it via a set of refactorings.
</p>
<h3 id="22e6648934bb4ae59aa1a181940172ae">
Original Fake read registry <a href="#22e6648934bb4ae59aa1a181940172ae">#</a>
</h3>
<p>
The article presented this <code>FakeReadRegistry</code>, repeated here for your convenience:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeReadRegistry</span> : IReadRegistry
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">FakeReadRegistry</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
    {
        <span style="color:blue;">this</span>.rooms = rooms;
        views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
    }
    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
            .Select(GetView)
            .Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
            .ToList();
    }
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
            views[d] = newView;
        }
    }
    <span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
        {
            <span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
            d = d.AddDays(1);
        }
    }
    <span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> view;
        <span style="font-weight:bold;color:#8f08c4;">else</span>
            <span style="font-weight:bold;color:#8f08c4;">return</span> rooms;
    }
}</pre>
</p>
<p>
This is 47 lines of code, spread over five members (including the constructor). Three of the methods have a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> (CC) of <em>2</em>, which is the maximum for this class. The remaining two have a CC of <em>1</em>.
</p>
<p>
While you <em>can</em> play some <a href="/2023/11/14/cc-golf">CC golf</a> with those CC-2 methods, that tends to pull the code in a direction of being less <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a>. For that reason, I chose to present the code as above. Perhaps more importantly, it doesn't save that many lines of code.
</p>
<p>
Had this been a piece of production code, no-one would bat an eye at size or complexity, but this is test code. To add spite to injury, those 47 lines of code implement this two-method interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IReadRegistry</span>
{
    IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>);
    <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>);
}</pre>
</p>
<p>
Can we improve the situation?
</p>
<h3 id="49f78ebabf3d4875b469e935151c064a">
Root cause analysis <a href="#49f78ebabf3d4875b469e935151c064a">#</a>
</h3>
<p>
Before you rush to 'improve' code, it pays to understand why it looks the way it looks.
</p>
<p>
Code is a wonderfully malleable medium, so you should regard nothing as set in stone. On the other hand, there's often a reason it looks like it does. It <em>may</em> be that the previous programmers were incompetent ogres for hire, but often there's a better explanation.
</p>
<p>
I've outlined my thinking process in <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">the previous article</a>, and I'm not going to repeat it all here. To summarise, though, I've applied the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>.
</p>
<blockquote>
<p>
"clients [...] own the abstract interfaces"
</p>
<footer><cite>Robert C. Martin, <a href="/ref/appp">APPP</a>, chapter 11</cite></footer>
</blockquote>
<p>
In other words, I let the needs of the clients guide the design of the <code>IReadRegistry</code> interface, and then the implementation (<code>FakeReadRegistry</code>) had to conform.
</p>
<p>
But that's not the whole truth.
</p>
<p>
I was doing a programming exercise - the <a href="https://codingdojo.org/kata/CQRS_Booking/">CQRS booking</a> kata - and I was following the instructions given in the description. They quite explicitly outline the two dependencies and their methods.
</p>
<p>
When trying a new exercise, it's a good idea to follow instructions closely, so that's what I did. Once you get a sense of a kata, though, there's no law saying that you have to stick to the original rules. After all, the purpose of an exercise is to train, and in programming, <a href="/2020/01/13/on-doing-katas">trying new things is training</a>.
</p>
<h3 id="740cb249aff74666af7af4784cc166b8">
Test code that wants to be production code <a href="#740cb249aff74666af7af4784cc166b8">#</a>
</h3>
<p>
A major benefit of test-driven development (TDD) is that it provides feedback. It pays to be tuned in to that channel. The above <code>FakeReadRegistry</code> seems to be trying to tell us something.
</p>
<p>
Consider the <code>GetFreeRooms</code> method. I'll repeat the single-expression body here for your convenience:
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
    .Select(GetView)
    .Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
    .ToList();</pre>
</p>
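<p>
To see what that expression computes, here's a small stand-alone sketch that runs the same pipeline over made-up data. It's not the code from the exercise: room names are plain strings instead of <code>Room</code> objects, and the dates and views are invented for the occasion:
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the full set of rooms; strings replace the Room type.
string[] rooms = { "A", "B", "C" };

// Hypothetical per-date views of the rooms that are still free.
var views = new Dictionary<DateOnly, IReadOnlyCollection<string>>
{
    [new DateOnly(2023, 11, 21)] = new[] { "A", "C" }, // B is booked
    [new DateOnly(2023, 11, 22)] = new[] { "A", "B" }, // C is booked
};

// Same loop as in the Fake: the departure date itself is excluded.
IEnumerable<DateOnly> EnumerateDates(DateOnly arrival, DateOnly departure)
{
    var d = arrival;
    while (d < departure)
    {
        yield return d;
        d = d.AddDays(1);
    }
}

// A date without a view has no bookings, so all rooms are free.
IReadOnlyCollection<string> GetView(DateOnly date) =>
    views.TryGetValue(date, out var view) ? view : rooms;

// A room is free for the stay only if it's free on every night of it.
var free = EnumerateDates(new DateOnly(2023, 11, 20), new DateOnly(2023, 11, 23))
    .Select(GetView)
    .Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
    .ToList();

Console.WriteLine(string.Join(", ", free)); // prints "A"</pre>
</p>
<p>
The stay covers the nights of the 20th, 21st, and 22nd. Intersecting the three views leaves only room A, the one room that's free on all three nights.
</p>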
<p>
Why is that the implementation? Why does it need to first enumerate the dates in the requested interval? Why does it need to call <code>GetView</code> for each date?
</p>
<p>
Why don't I just do the following and be done with it?
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeStorage</span> : Collection<Booking>, IWriteRegistry, IReadRegistry
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">FakeStorage</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
    {
        <span style="color:blue;">this</span>.rooms = rooms;
    }
    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">booked</span> = <span style="color:blue;">this</span>.Where(<span style="font-weight:bold;color:#1f377f;">b</span> => b.Overlaps(arrival, departure)).ToList();
        <span style="font-weight:bold;color:#8f08c4;">return</span> rooms
            .Where(<span style="font-weight:bold;color:#1f377f;">r</span> => !booked.Any(<span style="font-weight:bold;color:#1f377f;">b</span> => b.RoomName == r.Name))
            .ToList();
    }
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Save</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        Add(booking);
    }
}</pre>
</p>
<p>
To be honest, that's what I did <em>first</em>.
</p>
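<p>
The <code>Overlaps</code> method isn't shown here. As a rough sketch of what such a check could look like, assuming that a stay covers the nights from arrival up to, but not including, departure (the same convention as <code>EnumerateDates</code>), and with a tuple standing in for <code>Booking</code>:
</p>
<p>
<pre>using System;

// Assumption: a stay is a half-open interval of dates, so a guest
// leaving on a given date doesn't collide with one arriving that day.
bool Overlaps(
    (DateOnly Arrival, DateOnly Departure) booking,
    DateOnly arrival,
    DateOnly departure) =>
    booking.Arrival < departure && arrival < booking.Departure;

// Hypothetical existing booking: three nights from 20 November.
var existing = (
    Arrival: new DateOnly(2023, 11, 20),
    Departure: new DateOnly(2023, 11, 23));

// Back-to-back stays don't overlap.
Console.WriteLine(
    Overlaps(existing, new DateOnly(2023, 11, 23), new DateOnly(2023, 11, 25))); // False

// A stay that starts on the last booked night does.
Console.WriteLine(
    Overlaps(existing, new DateOnly(2023, 11, 22), new DateOnly(2023, 11, 24))); // True</pre>
</p>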
<p>
While there are two interfaces, there's only one Fake Object implementing both. That's often an easy way to address the <a href="https://en.wikipedia.org/wiki/Interface_segregation_principle">Interface Segregation Principle</a> and still keep the Fake Object simple.
</p>
<p>
This is much simpler than <code>FakeReadRegistry</code>, so why didn't I just keep that?
</p>
<p>
I didn't feel it was an honest attempt at CQRS. In CQRS you typically write the data changes to one system, and then you have another logical process that propagates the information about the data modification to the <em>read</em> subsystem. There's none of that here. Instead of being based on one or more 'materialised views', the query is just that: A query.
</p>
<p>
That was what I attempted to address with <code>FakeReadRegistry</code>, and I think it's a much more faithful CQRS implementation. It's also more complex, as CQRS tends to be.
</p>
<p>
In both cases, however, it seems that there's some production logic trapped in the test code. Shouldn't <code>EnumerateDates</code> be production code? And how about the general 'algorithm' of <code>RoomBooked</code>:
</p>
<ul>
<li>Enumerate the relevant dates</li>
<li>Get the 'materialised' view for each date</li>
<li>Calculate the new view for that date</li>
<li>Update the collection of views for that date</li>
</ul>
<p>
That seems like just enough code to warrant moving it to the production code.
</p>
<p>
A word of caution before we proceed. When deciding to pull some of that test code into the production code, I'm making a decision about architecture.
</p>
<p>
Until now, I'd been following the Dependency Inversion Principle closely. The interfaces exist because the client code needs them. Those interfaces could be implemented in various ways: You could use a relational database, a document database, files, blobs, etc.
</p>
<p>
Once I decide to pull the above algorithm into the production code, I'm choosing a particular persistent data structure. This now locks the data storage system into a design where there's a persistent view per date, and another database of bookings.
</p>
<p>
Now that I'd learned some more about the exercise, I felt confident making that decision.
</p>
<h3 id="d1515fe093394daf8cf9e4a9ec687770">
Template Method <a href="#d1515fe093394daf8cf9e4a9ec687770">#</a>
</h3>
<p>
The first move I made was to create a superclass so that I could employ the <a href="https://en.wikipedia.org/wiki/Template_method_pattern">Template Method</a> pattern:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReadRegistry</span> : IReadRegistry
{
    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
            .Select(GetView)
            .Aggregate(Rooms.AsEnumerable(), Enumerable.Intersect)
            .ToList();
    }
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
            UpdateView(d, newView);
        }
    }
    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
    <span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
        {
            <span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
            d = d.AddDays(1);
        }
    }
    <span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (TryGetView(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> view;
        <span style="font-weight:bold;color:#8f08c4;">else</span>
            <span style="font-weight:bold;color:#8f08c4;">return</span> Rooms;
    }
}</pre>
</p>
<p>
This looks similar to <code>FakeReadRegistry</code>, so how is this an improvement?
</p>
<p>
The new <code>ReadRegistry</code> class is production code. It can, and should, be tested. (Due to the history of how we got here, <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">it's already covered by tests</a>, so I'm not going to repeat that effort here.)
</p>
<p>
True to the <a href="https://en.wikipedia.org/wiki/Template_method_pattern">Template Method</a> pattern, three <code>abstract</code> members await a child class' implementation. These are the <code>UpdateView</code> and <code>TryGetView</code> methods, as well as the <code>Rooms</code> read-only property (glorified getter method).
</p>
<p>
Imagine that in the production code, these are implemented based on file/document/blob storage - one per date. <code>TryGetView</code> would attempt to read the document from storage, <code>UpdateView</code> would create or modify the document, while <code>Rooms</code> returns a default set of rooms.
</p>
<p>
A Test Double, however, can still use an in-memory dictionary:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeReadRegistry</span> : ReadRegistry
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;
    <span style="color:blue;">protected</span> <span style="color:blue;">override</span> IReadOnlyCollection<Room> Rooms => rooms;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">FakeReadRegistry</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
    {
        <span style="color:blue;">this</span>.rooms = rooms;
        views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
    }
    <span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
    {
        views[date] = view;
    }
    <span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> views.TryGetValue(date, <span style="color:blue;">out</span> view);
    }
}</pre>
</p>
<p>
Each <code>override</code> is a one-liner with cyclomatic complexity <em>1</em>.
</p>
<h3 id="91153011891b4b7791acfe0edc65f997">
First round of clean-up <a href="#91153011891b4b7791acfe0edc65f997">#</a>
</h3>
<p>
An abstract class is already a polymorphic object, so we no longer need the <code>IReadRegistry</code> interface. Delete that, and update all code accordingly. Particularly, the <code>QueryService</code> now depends on <code>ReadRegistry</code> rather than <code>IReadRegistry</code>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> ReadRegistry readRegistry;
<span style="color:blue;">public</span> <span style="color:#2b91af;">QueryService</span>(ReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
{
    <span style="color:blue;">this</span>.readRegistry = readRegistry;
}</pre>
</p>
<p>
Now move the <code>Reserve</code> function from <code>QueryService</code> to <code>ReadRegistry</code>. Once this is done, the <code>QueryService</code> looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">QueryService</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> ReadRegistry readRegistry;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">QueryService</span>(ReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
    {
        <span style="color:blue;">this</span>.readRegistry = readRegistry;
    }
    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> readRegistry.GetFreeRooms(arrival, departure);
    }
}</pre>
</p>
<p>
That class is only passing method calls along, so it clearly no longer serves any purpose. Delete it.
</p>
<p>
This is not uncommon in CQRS. One might even argue that if CQRS is done right, there's almost no code on the query side, since all the data view updates happen as events propagate.
</p>
<h3 id="640e19ed6f904d71b925672028aeee45">
From abstract class to Dependency Injection <a href="#640e19ed6f904d71b925672028aeee45">#</a>
</h3>
<p>
While the current state of the code is based on an abstract base class, the overall architecture of the system doesn't hinge on inheritance. From <a href="/2018/02/19/abstract-class-isomorphism">Abstract class isomorphism</a> we know that it's possible to refactor an abstract class to Constructor Injection. Let's do that.
</p>
<p>
First add an <code>IViewStorage</code> interface that mirrors the three <code>abstract</code> members defined by <code>ReadRegistry</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IViewStorage</span>
{
    IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
    <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
    <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
}</pre>
</p>
<p>
Then implement it with a Fake Object:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeViewStorage</span> : IViewStorage
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;
<span style="color:blue;">public</span> IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeViewStorage</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
{
Rooms = rooms;
views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
views[date] = view;
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> views.TryGetValue(date, <span style="color:blue;">out</span> view);
}
}</pre>
</p>
<p>
Notice the similarity to <code>FakeReadRegistry</code>, which we'll get rid of shortly.
</p>
<p>
Now inject <code>IViewStorage</code> into <code>ReadRegistry</code>, and make <code>ReadRegistry</code> a regular (<code>sealed</code>) class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReadRegistry</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IViewStorage viewStorage;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ReadRegistry</span>(IViewStorage <span style="font-weight:bold;color:#1f377f;">viewStorage</span>)
{
<span style="color:blue;">this</span>.viewStorage = viewStorage;
}
<span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
.Select(GetView)
.Aggregate(viewStorage.Rooms.AsEnumerable(), Enumerable.Intersect)
.ToList();
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = Reserve(booking, view);
viewStorage.UpdateView(d, newView);
}
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<Room> <span style="color:#74531f;">Reserve</span>(
Booking <span style="font-weight:bold;color:#1f377f;">booking</span>,
IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">existingView</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> existingView
.Where(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name != booking.RoomName)
.ToList();
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
<span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
{
<span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
d = d.AddDays(1);
}
}
<span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (viewStorage.TryGetView(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> view;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> viewStorage.Rooms;
}
}</pre>
</p>
<p>
You can now delete the <code>FakeReadRegistry</code> Test Double, since <code>FakeViewStorage</code> has taken its place.
</p>
<p>
Finally, we may consider if we can make <code>FakeViewStorage</code> even slimmer. While I usually favour composition over inheritance, I've found that deriving Fake Objects from collection base classes is often an efficient way to get a lot of mileage out of a few lines of code. <code>FakeReadRegistry</code>, however, had to inherit from <code>ReadRegistry</code>, so it couldn't derive from any other class.
</p>
<p>
<code>FakeViewStorage</code> isn't constrained in that way, so it's free to inherit from <code>Dictionary<DateOnly, IReadOnlyCollection<Room>></code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeViewStorage</span> : Dictionary<DateOnly, IReadOnlyCollection<Room>>, IViewStorage
{
<span style="color:blue;">public</span> IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeViewStorage</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
{
Rooms = rooms;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="color:blue;">this</span>[date] = view;
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> TryGetValue(date, <span style="color:blue;">out</span> view);
}
}</pre>
</p>
<p>
This last move isn't strictly necessary, but I found it worth at least mentioning.
</p>
<p>
I hope you'll agree that this is a Fake Object that looks maintainable.
</p>
<h3 id="cd86ac335816431aa39a6538fd9ce95c">
Conclusion <a href="#cd86ac335816431aa39a6538fd9ce95c">#</a>
</h3>
<p>
Test-driven development is a feedback mechanism. If something is difficult to test, it tells you something about your System Under Test (SUT). If your test code looks bloated, that tells you something too. Perhaps part of the test code really belongs in the production code.
</p>
<p>
In this article, we started with a Fake Object that looked like it contained too much production code. Via a series of refactorings I moved the relevant parts to the production code, leaving me with a more idiomatic and conforming implementation.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
CC golf
https://blog.ploeh.dk/2023/11/14/cc-golf
2023-11-14T14:44:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Noun. Game in which the goal is to minimise cyclomatic complexity.</em>
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">Cyclomatic complexity</a> (CC) is a rare code metric since it <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">can actually be useful</a>. In general, it's a good idea to minimise it as much as possible.
</p>
<p>
In short, CC measures looping and branching in code, and this is often where bugs lurk. While it's only a rough measure, I nonetheless find the metric useful as a general guideline. Lower is better.
</p>
<h3 id="0a1e55fd6ebf422aac1f6441e4e34f99">
Golf <a href="#0a1e55fd6ebf422aac1f6441e4e34f99">#</a>
</h3>
<p>
I'd like to propose the term "CC golf" for the activity of minimising cyclomatic complexity in an area of code. The name derives from <a href="https://en.wikipedia.org/wiki/Code_golf">code golf</a>, in which you have to implement some behaviour (typically an algorithm) in the fewest possible characters.
</p>
<p>
Such games can be useful because they enable you to explore different ways to express yourself in code. It's always a good <a href="/2020/01/13/on-doing-katas">kata constraint</a>. The <a href="/2011/05/16/TennisKatawithimmutabletypesandacyclomaticcomplexityof1">first time I tried that was in 2011</a>, and when looking back on that code today, I'm not that impressed. Still, it taught me a valuable lesson about the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor pattern</a> that I never forgot, and that later enabled me to <a href="/2018/06/25/visitor-as-a-sum-type">connect some important dots</a>.
</p>
<p>
But don't limit CC golf to katas and the like. Try it in your production code too. Most production code I've seen could benefit from some CC golf, and if you <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">use Git tactically</a> you can always stash the changes if they're no good.
</p>
<h3 id="f7e54cb8b9954fbd9ed8022ad09f5d7f">
Idiomatic tension <a href="#f7e54cb8b9954fbd9ed8022ad09f5d7f">#</a>
</h3>
<p>
Alternative expressions with lower cyclomatic complexity may not always be idiomatic. Let's look at a few examples. In my <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">previous article</a>, I listed some test code where some helper methods had a CC of <em>2</em>. Here's one of them:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
<span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
{
<span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
d = d.AddDays(1);
}
}</pre>
</p>
<p>
Can you express this functionality with a CC of <em>1?</em> In <a href="https://www.haskell.org/">Haskell</a> it's essentially built in as <code>(. pred) . enumFromTo</code>, and in <a href="https://fsharp.org/">F#</a> it's also idiomatic, although more verbose:
</p>
<p>
<pre><span style="color:blue;">let</span> enumerateDates (arrival : DateOnly) departure =
Seq.initInfinite id |> Seq.map arrival.AddDays |> Seq.takeWhile (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span> d < departure)</pre>
</p>
<p>
Can we do the same in C#?
</p>
<p>
If there's a general API in .NET that corresponds to the F#-specific <code>Seq.initInfinite</code> I haven't found it, but we can do something like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">const</span> <span style="color:blue;">int</span> infinity = <span style="color:blue;">int</span>.MaxValue; <span style="color:green;">// As close as int gets, at least</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> Enumerable.Range(0, infinity).Select(arrival.AddDays).TakeWhile(<span style="font-weight:bold;color:#1f377f;">d</span> => d < departure);
}</pre>
</p>
<p>
In C# infinite sequences are generally unusual, but <em>if</em> you were to create one, a combination of <code>while (true)</code> and <code>yield return</code> would be the most idiomatic. The problem with that, though, is that such a construct has a cyclomatic complexity of <em>2</em>.
</p>
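<p>
For comparison, a minimal sketch of that idiomatic construct might look like the following. The class name <code>Dates</code> and the method name <code>EnumerateFrom</code> are made up for this illustration; they aren't part of the kata code base.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

public static class Dates
{
    // Hypothetical helper: an unbounded sequence of consecutive dates.
    // The while loop gives this method a cyclomatic complexity of 2.
    public static IEnumerable<DateOnly> EnumerateFrom(DateOnly start)
    {
        var d = start;
        while (true)
        {
            yield return d;
            d = d.AddDays(1);
        }
    }
}</pre>
</p>
<p>
Since the sequence is lazy, a caller would bound it with something like <code>Dates.EnumerateFrom(arrival).TakeWhile(d => d < departure)</code>.
</p>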
<p>
The above suggestion gets around that problem by pretending that <code>int.MaxValue</code> is infinity. Practically, at least, a 32-bit signed integer can't get larger than that anyway. I haven't tried to let F#'s <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-collections-seqmodule.html#initInfinite">Seq.initInfinite</a> run out, but by its type it seems <code>int</code>-bound as well, so in practice it, too, probably isn't infinite. (Or, if it is, the index that it supplies will have to overflow and wrap around to a negative value.)
</p>
<p>
Is this alternative C# code better than the first? You be the judge of that. It has a lower cyclomatic complexity, but is less idiomatic. This isn't uncommon. In languages with a procedural background, there's often tension between lower cyclomatic complexity and how 'things are usually done'.
</p>
<h3 id="d24ea9b2a10f482693071d7dbe1c6604">
Checking for null <a href="#d24ea9b2a10f482693071d7dbe1c6604">#</a>
</h3>
<p>
Is there a way to reduce the cyclomatic complexity of the <code>GetView</code> helper method?
</p>
<p>
<pre><span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> view;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> rooms;
}</pre>
</p>
<p>
This is an example of the built-in API being in the way. In F#, you naturally write the same behaviour with a CC of <em>1:</em>
</p>
<p>
<pre><span style="color:blue;">let</span> getView (date : DateOnly) =
views |> Map.tryFind date |> Option.defaultValue rooms |> Set.ofSeq</pre>
</p>
<p>
That <code>TryGet</code> idiom is in the way of further CC reduction, it seems. It <em>is</em> possible to reach a CC of <em>1</em>, but it's neither pretty nor idiomatic:
</p>
<p>
<pre><span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>[] { view, rooms }.Where(<span style="font-weight:bold;color:#1f377f;">x</span> => x <span style="color:blue;">is</span> { }).First()!;
}</pre>
</p>
<p>
Perhaps there's a better way, but if so, it escapes me. Here, I use my knowledge that <code>view</code> is going to remain <code>null</code> if <code>TryGetValue</code> doesn't find the dictionary entry. Thus, I can put it in front of an array where I put the fallback value <code>rooms</code> as the second element. Then I filter the array by only keeping the elements that are <em>not</em> <code>null</code> (that's what the <code>x is { }</code> pun means; I usually read it as <em>x is something</em>). Finally, I return the first of these elements.
</p>
<p>
I know that <code>rooms</code> is never <code>null</code>, but apparently the compiler can't tell. Thus, I have to suppress its anxiety with the <code>!</code> operator, telling it that this <em>will</em> result in a non-null value.
</p>
<p>
I would never use such a code construct in a professional C# code base.
</p>
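<p>
One candidate that may read as more idiomatic is the <code>GetValueOrDefault</code> extension method defined in <code>System.Collections.Generic.CollectionExtensions</code>, which accepts a fallback value. The following is only a sketch. It assumes that <code>views</code> is a concrete <code>Dictionary</code> (the extension method targets <code>IReadOnlyDictionary</code>, so it isn't available on a field typed as <code>IDictionary</code>), and both the <code>Room</code> record and the <code>ViewCache</code> class are stand-ins invented for the example.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Stand-in for the kata's Room type, assumed here for illustration.
public sealed record Room(string Name);

// Hypothetical container; only GetView mirrors the method discussed above.
public sealed class ViewCache
{
    private readonly IReadOnlyCollection<Room> rooms;
    private readonly Dictionary<DateOnly, IReadOnlyCollection<Room>> views = new();

    public ViewCache(params Room[] rooms)
    {
        this.rooms = rooms;
    }

    public void UpdateView(DateOnly date, IReadOnlyCollection<Room> view)
    {
        views[date] = view;
    }

    // No if/else here: the lookup falls back to rooms when the date is missing.
    public IReadOnlyCollection<Room> GetView(DateOnly date)
    {
        return views.GetValueOrDefault(date, rooms);
    }
}</pre>
</p>
<p>
The branch hasn't disappeared. It has only moved into the base class library, so whether this counts as a genuine reduction is a matter of where you draw the boundary.
</p>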
<h3 id="b1c8693ac29a43a1812e4b9ba9f86e6e">
Side effects <a href="#b1c8693ac29a43a1812e4b9ba9f86e6e">#</a>
</h3>
<p>
The third helper method suggests another kind of problem that you may run into:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
views[d] = newView;
}
}</pre>
</p>
<p>
Here the higher-than-one CC stems from the need to loop through dates in order to produce a side effect for each. Even in F# I do that:
</p>
<p>
<pre><span style="color:blue;">member</span> this.RoomBooked booking =
<span style="color:blue;">for</span> d <span style="color:blue;">in</span> enumerateDates booking.Arrival booking.Departure <span style="color:blue;">do</span>
<span style="color:blue;">let</span> newView = getView d |> QueryService.reserve booking |> Seq.toList
views <span style="color:blue;"><-</span> Map.add d newView views</pre>
</p>
<p>
This also has a cyclomatic complexity of <em>2</em>. You could do something like this:
</p>
<p>
<pre><span style="color:blue;">member</span> this.RoomBooked booking =
enumerateDates booking.Arrival booking.Departure
|> Seq.iter (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span>
<span style="color:blue;">let</span> newView = getView d |> QueryService.reserve booking |> Seq.toList <span style="color:blue;">in</span>
views <span style="color:blue;"><-</span> Map.add d newView views)</pre>
</p>
<p>
but while that nominally has a CC of <em>1</em>, it has the same level of indentation as the previous attempt. This seems to indicate, at least, that it doesn't <em>really</em> address any complexity issue.
</p>
<p>
You could also try something like this:
</p>
<p>
<pre><span style="color:blue;">member</span> this.RoomBooked booking =
enumerateDates booking.Arrival booking.Departure
|> Seq.map (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span> d, getView d |> QueryService.reserve booking |> Seq.toList)
|> Seq.iter (<span style="color:blue;">fun</span> (d, newView) <span style="color:blue;">-></span> views <span style="color:blue;"><-</span> Map.add d newView views)</pre>
</p>
<p>
which, again, may be nominally better, but forced me to wrap the <code>map</code> output in a tuple so that both <code>d</code> and <code>newView</code> are available to <code>Seq.iter</code>. I tend to regard that as a code smell.
</p>
<p>
This latter version is, however, fairly easily translated to C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
EnumerateDates(booking.Arrival, booking.Departure)
.Select(<span style="font-weight:bold;color:#1f377f;">d</span> => (d, view: QueryService.Reserve(booking, GetView(d))))
.ToList()
.ForEach(<span style="font-weight:bold;color:#1f377f;">x</span> => views[x.d] = x.view);
}</pre>
</p>
<p>
The standard .NET API doesn't have something equivalent to <code>Seq.iter</code> (although you could trivially write such an action), but <a href="https://stackoverflow.com/a/1509450/126014">you can convert any sequence to a <code>List<T></code> and use its <code>ForEach</code> method</a>.
</p>
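<p>
As a sketch of how trivial such an action could be, here's one possible shape. The names <code>EnumerableExtensions</code> and <code>Iter</code> are hypothetical; nothing by that name ships with .NET.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Hypothetical counterpart to F#'s Seq.iter: run an action for each element.
    // The loop still exists; it just lives in one reusable place.
    public static void Iter<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var x in source)
            action(x);
    }
}</pre>
</p>
<p>
With that in place, the call to <code>ToList</code> becomes unnecessary, and the last step could read <code>.Iter(x => views[x.d] = x.view)</code>. The helper itself has a CC of <em>2</em>, but you only pay for it once.
</p>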
<p>
In practice, though, I tend to <a href="https://ericlippert.com/2009/05/18/foreach-vs-foreach/">agree with Eric Lippert</a>. There's already an idiomatic way to iterate over each item in a collection, and <a href="https://peps.python.org/pep-0020/">being explicit</a> is generally helpful to the reader.
</p>
<h3 id="5f060ce8557043dfb0f374ef254cc922">
Church encoding <a href="#5f060ce8557043dfb0f374ef254cc922">#</a>
</h3>
<p>
There's a general solution to most of CC golf: Whenever you need to make a decision and branch between two or more pathways, you can model that with a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>. In C# you can mechanically model that with <a href="/2018/05/22/church-encoding">Church encoding</a> or <a href="/2018/06/25/visitor-as-a-sum-type">the Visitor pattern</a>. If you haven't tried that, I recommend it for the exercise, but once you've done it enough times, you realise that it requires little creativity.
</p>
<p>
As an example, in 2021 I <a href="/2021/08/03/the-tennis-kata-revisited">revisited the Tennis kata</a> with the explicit purpose of translating <a href="/2016/02/10/types-properties-software">my usual F# approach to the exercise</a> to C# using Church encoding and the Visitor pattern.
</p>
<p>
Once you've got a sense for how Church encoding enables you to simulate pattern matching in C#, there are few surprises. You may also rightfully question what is gained from such an exercise:
</p>
<p>
<pre><span style="color:blue;">public</span> IScore <span style="font-weight:bold;color:#74531f;">VisitPoints</span>(IPoint <span style="font-weight:bold;color:#1f377f;">playerOnePoint</span>, IPoint <span style="font-weight:bold;color:#1f377f;">playerTwoPoint</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> playerWhoWinsBall.Match(
playerOne: playerOnePoint.Match<IScore>(
love: <span style="color:blue;">new</span> Points(<span style="color:blue;">new</span> Fifteen(), playerTwoPoint),
fifteen: <span style="color:blue;">new</span> Points(<span style="color:blue;">new</span> Thirty(), playerTwoPoint),
thirty: <span style="color:blue;">new</span> Forty(playerWhoWinsBall, playerTwoPoint)),
playerTwo: playerTwoPoint.Match<IScore>(
love: <span style="color:blue;">new</span> Points(playerOnePoint, <span style="color:blue;">new</span> Fifteen()),
fifteen: <span style="color:blue;">new</span> Points(playerOnePoint, <span style="color:blue;">new</span> Thirty()),
thirty: <span style="color:blue;">new</span> Forty(playerWhoWinsBall, playerOnePoint)));
}</pre>
</p>
<p>
Believe it or not, but that method has a CC of <em>1</em> despite the double indentation strongly suggesting that there's some branching going on. To a degree, this also highlights the limitations of the cyclomatic complexity metric. Conversely, <a href="/2021/03/29/table-driven-tennis-scoring">stupidly simple code may have a high CC rating</a>.
</p>
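<p>
To see why no branching instruction is required, consider how a two-case <code>Match</code> method like the one on <code>playerWhoWinsBall</code> could be defined. This is a minimal sketch under my own naming assumptions (<code>IPlayer</code>, <code>PlayerOne</code>, <code>PlayerTwo</code>, <code>PlayerDemo</code>), not the code from the linked Tennis article:
</p>
<p>
<pre>using System;

// Sketch of a Church-encoded two-case sum type; names are assumptions.
public interface IPlayer
{
    T Match<T>(T playerOne, T playerTwo);
}

public sealed class PlayerOne : IPlayer
{
    // Each case returns 'its' argument, so no if or switch is needed.
    public T Match<T>(T playerOne, T playerTwo)
    {
        return playerOne;
    }
}

public sealed class PlayerTwo : IPlayer
{
    public T Match<T>(T playerOne, T playerTwo)
    {
        return playerTwo;
    }
}

public static class PlayerDemo
{
    // CC 1: polymorphic dispatch on the IPlayer object makes the decision.
    public static string Describe(IPlayer player)
    {
        return player.Match(playerOne: "Player one", playerTwo: "Player two");
    }
}</pre>
</p>
<p>
Notice that in this shape both arguments are evaluated before <code>Match</code> picks one of them. If that matters, the parameters could be <code>Func<T></code> delegates instead.
</p>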
<p>
Most of the examples in this article border on the pathological, and I don't recommend that you write code <em>like</em> that. I recommend that you do the exercise. In less pathological scenarios, there are real benefits to be reaped.
</p>
<h3 id="931c0946572041449fcce50da5f5219b">
Idioms <a href="#931c0946572041449fcce50da5f5219b">#</a>
</h3>
<p>
In 2015 I published an article titled <a href="/2015/08/03/idiomatic-or-idiosyncratic">Idiomatic or idiosyncratic?</a> In it, I tried to explore the idea that the notion of idiomatic code can sometimes hold you back. I revisited that idea in 2021 in an article called <a href="/2021/05/17/against-consistency">Against consistency</a>. The point in both cases is that just because something looks unfamiliar, it doesn't mean that it's bad.
</p>
<p>
Coding idioms somehow arose. If you believe that there's a portion of natural selection involved in the development of coding idioms, you may assume by default that idioms represent good ways of doing things.
</p>
<p>
To a degree I believe this to be true. Many idioms represent the best way of doing things at the time they settled into the shape in which we now know them. Languages and contexts change, however. Just look at <a href="/2019/07/15/tester-doer-isomorphisms">the many approaches to data lookups</a> there have been over the years. For many years now, C# has settled into the so-called <em>TryParse</em> idiom to solve that problem. In my opinion this represents a local maximum.
</p>
<p>
Languages that provide <a href="/2018/06/04/church-encoded-maybe">Maybe</a> (AKA <code>option</code>) and <a href="/2018/06/11/church-encoded-either">Either</a> (AKA <code>Result</code>) types offer a superior alternative. These types naturally compose into <em>CC 1</em> pipelines, whereas <em>TryParse</em> requires you to stop what you're doing in order to check a return value. How very <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>-like.
</p>
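<p>
A minimal sketch may illustrate the difference. It assumes a Church-encoded Maybe loosely in the style of the linked article; the <code>TryParseInt</code> adapter and the <code>Pipeline</code> class are hypothetical names introduced only for this example.
</p>
<p>
<pre>using System;

// Sketch of a Church-encoded Maybe; an illustration, not a library API.
public interface IMaybe<T>
{
    TResult Match<TResult>(TResult nothing, Func<T, TResult> just);
}

public sealed class Nothing<T> : IMaybe<T>
{
    public TResult Match<TResult>(TResult nothing, Func<T, TResult> just) => nothing;
}

public sealed class Just<T> : IMaybe<T>
{
    private readonly T value;
    public Just(T value) { this.value = value; }
    public TResult Match<TResult>(TResult nothing, Func<T, TResult> just) => just(value);
}

public static class Maybe
{
    // Functor map: transforms the value if there is one.
    public static IMaybe<TResult> Select<T, TResult>(
        this IMaybe<T> source,
        Func<T, TResult> selector)
    {
        return source.Match<IMaybe<TResult>>(
            nothing: new Nothing<TResult>(),
            just: x => new Just<TResult>(selector(x)));
    }

    // Hypothetical adapter; the TryParse check lives here, once (CC 2).
    public static IMaybe<int> TryParseInt(string candidate)
    {
        return int.TryParse(candidate, out var i)
            ? (IMaybe<int>)new Just<int>(i)
            : new Nothing<int>();
    }
}

public static class Pipeline
{
    // CC 1: parse, transform, and fall back without an explicit branch.
    public static string DoubleOrDefault(string candidate)
    {
        return Maybe.TryParseInt(candidate)
            .Select(i => i * 2)
            .Match(nothing: "n/a", just: i => i.ToString());
    }
}</pre>
</p>
<p>
The single conditional is confined to the adapter. Every method that consumes the result, like <code>DoubleOrDefault</code>, can remain a straight-line expression.
</p>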
<p>
All that said, I still think you should write idiomatic code by default, but don't be a slave to what's considered idiomatic, just as you shouldn't be a slave to consistency. If there's a better way of doing things, choose the better way.
</p>
<h3 id="8bf3cb23fa5a4f3aa532a40b01dbefb1">
Conclusion <a href="#8bf3cb23fa5a4f3aa532a40b01dbefb1">#</a>
</h3>
<p>
While cyclomatic complexity is a rough measure, it's one of the few useful programming metrics I know of. It should be as low as possible.
</p>
<p>
Most professional code I encounter implements decisions almost exclusively with language primitives: <code>if</code>, <code>for</code>, <code>switch</code>, <code>while</code>, etc. Once, an organisation hired me to give a one-day <em>anti-if</em> workshop. There are other ways to make decisions in code. Most of those alternatives reduce cyclomatic complexity.
</p>
<p>
That's not really a goal by itself, but reducing cyclomatic complexity tends to produce the beneficial side effect of structuring the code in a more sustainable way. It becomes easier to understand and change.
</p>
<p>
As the cliché goes: <em>Choose the right tool for the job.</em> You can't, however, do that if you have nothing to choose from. If you only know of one way to do a thing, you have no choice.
</p>
<p>
Play a little CC golf with your code from time to time. It may improve the code, or it may not. If it doesn't, just <a href="https://git-scm.com/docs/git-stash">stash</a> those changes. Either way, you've probably <em>learned</em> something.
</p>
</div><hr>
Fakes are Test Doubles with contracts
https://blog.ploeh.dk/2023/11/13/fakes-are-test-doubles-with-contracts
2023-11-13T17:11:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Contracts of Fake Objects can be described by properties.</em>
</p>
<p>
The first time I tried my hand with the <a href="https://codingdojo.org/kata/CQRS_Booking/">CQRS Booking kata</a>, I abandoned it after 45 minutes because I found that I had little to learn from it. After all, I've already done umpteen variations of (restaurant) booking code examples, in several programming languages. The code example that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> is only the largest and most complete of those.
</p>
<p>
I also wrote <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2011/april/azure-development-cqrs-on-microsoft-azure">an MSDN Magazine article</a> in 2011 about <a href="https://en.wikipedia.org/wiki/Command_Query_Responsibility_Segregation">CQRS</a>, so I think I have that angle covered as well.
</p>
<p>
Still, while at first glance the kata seemed to have little to offer me, I've found myself coming back to it a few times. It does enable me to focus on something other than the 'production code'. In fact, it turns out that even if (or perhaps particularly <em>when</em>) you use test-driven development (TDD), there's precious little production code. Let's get that out of the way first.
</p>
<h3 id="b1192f76c2ef4f31b6bddcbc944664c7">
Production code <a href="#b1192f76c2ef4f31b6bddcbc944664c7">#</a>
</h3>
<p>
The few times I've now done the kata, there's been almost no 'production code'. The implied <code>CommandService</code> has two lines of effective code:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">CommandService</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IWriteRegistry writeRegistry;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadRegistry readRegistry;
<span style="color:blue;">public</span> <span style="color:#2b91af;">CommandService</span>(IWriteRegistry <span style="font-weight:bold;color:#1f377f;">writeRegistry</span>, IReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
{
<span style="color:blue;">this</span>.writeRegistry = writeRegistry;
<span style="color:blue;">this</span>.readRegistry = readRegistry;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BookARoom</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
writeRegistry.Save(booking);
readRegistry.RoomBooked(booking);
}
}</pre>
</p>
<p>
The <code>QueryService</code> class isn't much more exciting:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">QueryService</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadRegistry readRegistry;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">QueryService</span>(IReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
    {
        <span style="color:blue;">this</span>.readRegistry = readRegistry;
    }

    <span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<Room> <span style="color:#74531f;">Reserve</span>(
        Booking <span style="font-weight:bold;color:#1f377f;">booking</span>,
        IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">existingView</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> existingView.Where(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name != booking.RoomName).ToList();
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> readRegistry.GetFreeRooms(arrival, departure);
    }
}</pre>
</p>
<p>
The kata only suggests the <code>GetFreeRooms</code> method, which is only a single line. The only reason the <code>Reserve</code> function also exists is to pull a bit of testable logic back from the below <a href="http://xunitpatterns.com/Fake%20Object.html">Fake object</a>. I'll return to that shortly.
</p>
<p>
I've also done the exercise in <a href="https://fsharp.org/">F#</a>, essentially porting the C# implementation, which only highlights how simple it all is:
</p>
<p>
<pre><span style="color:blue;">module</span> CommandService =
    <span style="color:blue;">let</span> bookARoom (writeRegistry : IWriteRegistry) (readRegistry : IReadRegistry) booking =
        writeRegistry.Save booking
        readRegistry.RoomBooked booking

<span style="color:blue;">module</span> QueryService =
    <span style="color:blue;">let</span> reserve booking existingView =
        existingView |> Seq.filter (<span style="color:blue;">fun</span> r <span style="color:blue;">-></span> r.Name <> booking.RoomName)

    <span style="color:blue;">let</span> getFreeRooms (readRegistry : IReadRegistry) arrival departure =
        readRegistry.GetFreeRooms arrival departure</pre>
</p>
<p>
That's <em>both</em> the Command side and the Query side!
</p>
<p>
This represents my honest interpretation of the kata. Really, there's nothing to it.
</p>
<p>
The reason I still find the exercise interesting is that it explores other aspects of TDD than most katas. The most common katas require you to write a little algorithm: <a href="https://codingdojo.org/kata/Bowling/">Bowling</a>, <a href="https://codingdojo.org/kata/WordWrap/">Word wrap</a>, <a href="https://codingdojo.org/kata/RomanNumerals/">Roman Numerals</a>, <a href="https://codingdojo.org/kata/Diamond/">Diamond</a>, <a href="https://codingdojo.org/kata/Tennis/">Tennis</a>, etc.
</p>
<p>
The CQRS Booking kata suggests no interesting algorithm, but rather teaches some important lessons about software architecture, separation of concerns, and, if you approach it with TDD, real-world test automation. In contrast to all those algorithmic exercises, this one strongly suggests the use of <a href="http://xunitpatterns.com/Test%20Double.html">Test Doubles</a>.
</p>
<h3 id="6d8d7717cfef428e91418b319c4fe971">
Fakes <a href="#6d8d7717cfef428e91418b319c4fe971">#</a>
</h3>
<p>
You could attempt the kata with a dynamic 'mocking' library such as <a href="https://devlooped.com/moq">Moq</a> or <a href="https://site.mockito.org/">Mockito</a>, but I haven't tried. Since <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break encapsulation</a> I favour Fake Objects instead.
</p>
<p>
Creating a Fake write registry is trivial:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeWriteRegistry</span> : Collection<Booking>, IWriteRegistry
{
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Save</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        Add(booking);
    }
}</pre>
</p>
<p>
Its counterpart, the Fake read registry, turns out to be much more involved:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeReadRegistry</span> : IReadRegistry
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">FakeReadRegistry</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
    {
        <span style="color:blue;">this</span>.rooms = rooms;
        views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
            .Select(GetView)
            .Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
            .ToList();
    }

    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
            views[d] = newView;
        }
    }

    <span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
        {
            <span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
            d = d.AddDays(1);
        }
    }

    <span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> view;
        <span style="font-weight:bold;color:#8f08c4;">else</span>
            <span style="font-weight:bold;color:#8f08c4;">return</span> rooms;
    }
}</pre>
</p>
<p>
I think I can predict the most common reaction: <em>That's much more code than the System Under Test!</em> Indeed. For this particular exercise, this may indicate that a 'dynamic mock' library may have been a better choice. I do, however, also think that it's an artefact of the kata description's lack of requirements.
</p>
<p>
As is evident from the restaurant sample code that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, once you add <a href="/2020/01/27/the-maitre-d-kata">realistic business rules</a> the production code grows, and the ratio of test code to production code becomes better balanced.
</p>
<p>
The size of the <code>FakeReadRegistry</code> class also stems from the way the .NET base class library API is designed. The <code>GetView</code> helper method demonstrates that it requires four lines of code to look up an entry in a dictionary but return a default value if the entry isn't found. That's a one-liner in F#:
</p>
<p>
<pre><span style="color:blue;">let</span> getView (date : DateOnly) = views |> Map.tryFind date |> Option.defaultValue rooms |> Set.ofSeq</pre>
</p>
<p>
I'll show the entire F# Fake later, but you could also play some <a href="/2023/11/14/cc-golf">CC golf</a> with the C# code. That's a bit beside the point, though.
</p>
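<p>
As a minimal sketch of such golfing, and not code from the kata itself: if the <code>views</code> field were typed as <code>IReadOnlyDictionary</code> (or as the concrete <code>Dictionary</code>) rather than <code>IDictionary</code>, the base class library's <code>GetValueOrDefault</code> extension method could perform the lookup with a fallback, without an explicit branch. As far as I know, that extension method is defined for <code>IReadOnlyDictionary</code> and not for <code>IDictionary</code>, which is why the field type would have to change. The <code>Room</code> record and the <code>ViewLookup</code> helper are assumptions, made only to keep the example self-contained:
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Assumed shape of Room, inferred from how the code above uses it.
public sealed record Room(string Name);

// Hypothetical helper; in FakeReadRegistry this would be the GetView method.
public static class ViewLookup
{
    public static IReadOnlyCollection<Room> GetView(
        IReadOnlyDictionary<DateOnly, IReadOnlyCollection<Room>> views,
        IReadOnlyCollection<Room> rooms,
        DateOnly date)
    {
        // Returns the stored view for the date, or all rooms if none exists.
        return views.GetValueOrDefault(date, rooms);
    }
}</pre>
</p>
<p>
This variation has no <code>if</code> statement, but it still doesn't change the overall design of the Fake.
</p>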
<h3 id="1f5b34534edd4b72947c8d6b4c8921bf">
Command service design <a href="#1f5b34534edd4b72947c8d6b4c8921bf">#</a>
</h3>
<p>
Why does <code>FakeReadRegistry</code> look like it does? It's a combination of the kata description and my prior experience with CQRS. When adopting an asynchronous message-based architecture, I would usually not implement the write side exactly like that. Notice how the <code>CommandService</code> class' <code>BookARoom</code> method seems to repeat itself:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BookARoom</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
    writeRegistry.Save(booking);
    readRegistry.RoomBooked(booking);
}</pre>
</p>
<p>
While semantically it seems to be making two different statements, structurally they're identical. If you rename the methods, you could wrap both method calls in a single <a href="https://en.wikipedia.org/wiki/Composite_pattern">Composite</a>. In a more typical CQRS architecture, you'd post a Command on a bus:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BookARoom</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
    bus.BookRoom(booking);
}</pre>
</p>
<p>
This makes that particular <code>BookARoom</code> method, and perhaps the entire <code>CommandService</code> class, look redundant. Why do we need it?
</p>
<p>
As presented here, we don't, but in a real application, the Command service would likely perform some pre- and post-processing. For example, if this were a web application, the Command service might instead be a Controller concerned with validating and translating HTTP- or Web-based input to a Domain Object before posting to the bus.
</p>
<p>
A realistic code base would also be asynchronous, which, on .NET, would imply the use of the <code>async</code> and <code>await</code> keywords, etc.
</p>
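<p>
To make the Composite idea mentioned above more concrete, here's a minimal sketch. It assumes that both registries could be adapted to a single, renamed method. The <code>IBookingHandler</code> interface, the <code>CompositeBookingHandler</code> class, and the shape of the <code>Booking</code> record are all hypothetical; none of them are part of the kata or of the code shown so far:
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Assumed shape of Booking, inferred from how the tests construct it.
public sealed record Booking(
    Guid ClientId, string RoomName, DateOnly Arrival, DateOnly Departure);

// Hypothetical common interface after renaming Save and RoomBooked.
public interface IBookingHandler
{
    void Handle(Booking booking);
}

// Composite: forwards each booking to every inner handler, in order.
public sealed class CompositeBookingHandler : IBookingHandler
{
    private readonly IReadOnlyCollection<IBookingHandler> handlers;

    public CompositeBookingHandler(params IBookingHandler[] handlers)
    {
        this.handlers = handlers;
    }

    public void Handle(Booking booking)
    {
        foreach (var handler in handlers)
            handler.Handle(booking);
    }
}</pre>
</p>
<p>
With such a design, <code>BookARoom</code> would need only a single dependency, much like the <code>bus</code> variation.
</p>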
<h3 id="4dd49e22fe4c479381285eb1b886457e">
Read registry design <a href="#4dd49e22fe4c479381285eb1b886457e">#</a>
</h3>
<p>
A central point of CQRS is that you can optimise the read side for the specific tasks that it needs to perform. Instead of performing a dynamic query every time a client requests a view, you can update and persist a view. Imagine having a JSON or HTML file that the system can serve upon request.
</p>
<p>
Part of handling a Command or Event is that background processes in the system update the persistent views once per event.
</p>
<p>
For the particular hotel booking system, I imagine that the read registry has a set of files, blobs, documents, or denormalised database rows. When it receives notification of a booking, it'll need to remove that room from the dates of the booking.
</p>
<p>
While a booking may stretch over several days, I found it simplest to think of the storage system as subdivided into single dates, instead of ranges. Indeed, the <code>GetFreeRooms</code> method is a ranged query, so if you really wanted to denormalise the views, you could create a persistent view per range. This would, however, require that you precalculate and persist a view for October 2 to October 4, and another one for October 2 to October 5, and so on. The combinatorial explosion suggests that this isn't a good idea, so instead I imagine keeping a persistent view per date, and then performing a bit of on-the-fly calculation per query.
</p>
<p>
That's what <code>FakeReadRegistry</code> does. It also falls back to a default collection of <code>rooms</code> for all the dates that are yet untouched by a booking. This is, again, because I imagine that I might implement a real system like that.
</p>
<p>
You may still protest that the <code>FakeReadRegistry</code> duplicates production code. True, perhaps, but if this really is a concern, you could <a href="/2023/11/20/trimming-a-fake-object">refactor it to the Template Method pattern</a>.
</p>
<p>
Still, it's not really that complicated; it only looks that way because C# and the Dictionary API are too heavy on <a href="/2019/12/16/zone-of-ceremony">ceremony</a>. The Fake looks much simpler in F#:
</p>
<p>
<pre><span style="color:blue;">type</span> FakeReadRegistry (rooms : IReadOnlyCollection<Room>) =
    <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> views = Map.empty

    <span style="color:blue;">let</span> enumerateDates (arrival : DateOnly) departure =
        Seq.initInfinite id
        |> Seq.map arrival.AddDays
        |> Seq.takeWhile (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span> d < departure)

    <span style="color:blue;">let</span> getView (date : DateOnly) =
        views |> Map.tryFind date |> Option.defaultValue rooms |> Set.ofSeq

    <span style="color:blue;">interface</span> IReadRegistry <span style="color:blue;">with</span>
        <span style="color:blue;">member</span> this.GetFreeRooms arrival departure =
            enumerateDates arrival departure
            |> Seq.map getView
            |> Seq.fold Set.intersect (Set.ofSeq rooms)
            |> Set.toList :> _

        <span style="color:blue;">member</span> this.RoomBooked booking =
            <span style="color:blue;">for</span> d <span style="color:blue;">in</span> enumerateDates booking.Arrival booking.Departure <span style="color:blue;">do</span>
                <span style="color:blue;">let</span> newView = getView d |> QueryService.reserve booking |> Seq.toList
                views <span style="color:blue;"><-</span> Map.add d newView views
</pre>
</p>
<p>
This isn't just denser than the corresponding C# code, as F# tends to be; it also has a lower <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>. Both the <code>EnumerateDates</code> and <code>GetView</code> C# methods have a cyclomatic complexity of <em>2</em>, while their F# counterparts rate only <em>1</em>.
</p>
<p>
For production code, a cyclomatic complexity of <em>2</em> is fine if the code is covered by automatic tests. In test code, however, we should be wary of any branching or looping, since there are (typically) no tests of the test code.
</p>
<p>
While I <em>am</em> going to show some tests of that code in what follows, I do that for a different reason.
</p>
<h3 id="da4d51ff041c4cf6b4007d53a67f2d76">
Contract <a href="#da4d51ff041c4cf6b4007d53a67f2d76">#</a>
</h3>
<p>
When explaining Fake Objects to people, I've begun to use a particular phrase:
</p>
<blockquote>
<p>
A Fake Object is a polymorphic implementation of a dependency that fulfils the contract, but lacks some of the <em>ilities</em>.
</p>
</blockquote>
<p>
It's funny how you can arrive at something that strikes you as profound, only to discover that it was part of the definition all along:
</p>
<blockquote>
<p>
"We acquire or build a very lightweight implementation of the same functionality as provided by a component on which the SUT [System Under Test] depends and instruct the SUT to use it instead of the real DOC [Depended-On Component]. This implementation need not have any of the "-ilities" that the real DOC needs to have"
</p>
<footer><cite>Gerard Meszaros, <a href="/ref/xunit-patterns">xUnit Test Patterns</a></cite></footer>
</blockquote>
<p>
A common example is a Fake Repository object that pretends to be a database, often by leveraging a built-in collection API. The above <code>FakeWriteRegistry</code> is as simple an example as you could have. A slightly more compelling example is <a href="/2023/08/14/replacing-mock-and-stub-with-a-fake">the FakeUserRepository shown in another article</a>. Such an 'in-memory database' fulfils the implied contract, because if you 'save' something in the 'database' you can later retrieve it again with a query. As long as the object remains in memory.
</p>
<p>
The <em>ilities</em> that such a Fake database lacks are
</p>
<ul>
<li>data persistence</li>
<li>thread safety</li>
<li>transaction support</li>
</ul>
<p>
and perhaps others. Such qualities are clearly required in a real production environment, but are in the way in an automated testing context. The implied contract, however, is satisfied: What you save you can later retrieve.
</p>
<p>
Now consider the <code>IReadRegistry</code> interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IReadRegistry</span>
{
    IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>);
    <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>);
}</pre>
</p>
<p>
Which contract does it imply, given what you know about the <em>CQRS Booking</em> kata?
</p>
<p>
I would suggest the following:
</p>
<ul>
<li><em>Precondition:</em> <code>arrival</code> should be less than (or equal to?) <code>departure</code>.</li>
<li><em>Postcondition:</em> <code>GetFreeRooms</code> should always return a result. Null isn't a valid return value.</li>
<li><em>Invariant:</em> After calling <code>RoomBooked</code>, <code>GetFreeRooms</code> should exclude that room when queried on overlapping dates.</li>
</ul>
<p>
There may be other parts of the contract than this, but I find the third one most interesting. This is exactly what you would expect from a real system: If you reserve a room, you'd be surprised to see <code>GetFreeRooms</code> indicating that this room is free if queried about dates that overlap the reservation.
</p>
<p>
This is the sort of implied interaction that <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break</a>, but that <code>FakeReadRegistry</code> guarantees.
</p>
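<p>
The first two parts of the contract listed above can also be made executable. The following is only a sketch of one way to do that, as a hypothetical <code>ContractCheckingReadRegistry</code> Decorator. It assumes that equal arrival and departure dates are allowed, which the kata leaves open, and the <code>Room</code> and <code>Booking</code> records are assumed shapes that only serve to make the example self-contained:
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Assumed shapes, inferred from how the article's code uses these types.
public sealed record Room(string Name);
public sealed record Booking(
    Guid ClientId, string RoomName, DateOnly Arrival, DateOnly Departure);

public interface IReadRegistry
{
    IReadOnlyCollection<Room> GetFreeRooms(DateOnly arrival, DateOnly departure);
    void RoomBooked(Booking booking);
}

// Hypothetical Decorator that checks the precondition and postcondition.
public sealed class ContractCheckingReadRegistry : IReadRegistry
{
    private readonly IReadRegistry inner;

    public ContractCheckingReadRegistry(IReadRegistry inner)
    {
        this.inner = inner;
    }

    public IReadOnlyCollection<Room> GetFreeRooms(DateOnly arrival, DateOnly departure)
    {
        // Precondition. Allowing equal dates is an assumption.
        if (departure < arrival)
            throw new ArgumentOutOfRangeException(
                nameof(departure),
                "Departure must not be before arrival.");

        // Postcondition: null isn't a valid return value.
        return inner.GetFreeRooms(arrival, departure)
            ?? throw new InvalidOperationException("GetFreeRooms returned null.");
    }

    public void RoomBooked(Booking booking)
    {
        inner.RoomBooked(booking);
    }
}</pre>
</p>
<p>
The invariant that relates <code>RoomBooked</code> to <code>GetFreeRooms</code> is harder to check with a guard like this, since it concerns behaviour across several method calls.
</p>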
<h3 id="6ab8206598ab4bb990d93e5472d36054">
Properties <a href="#6ab8206598ab4bb990d93e5472d36054">#</a>
</h3>
<p>
There's a close relationship between contracts and properties. Once you can list preconditions, invariants, and postconditions for an object, there's a good chance that you can write code that exercises those qualities. Indeed, why not use property-based testing to do so?
</p>
<p>
I don't wish to imply that you should (normally) write tests of your test code. The following rather serves as a concretisation of the notion that a Fake Object is a Test Double that implements the 'proper' behaviour. In the following, I'll subject the <code>FakeReadRegistry</code> class to that exercise. To do that, I'll use <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a> 2.14.1 with <a href="https://xunit.net/">xUnit.net</a> 2.5.3.
</p>
<p>
Before tackling the above invariant, there's a simpler invariant specific to the <code>FakeReadRegistry</code> class. A <code>FakeReadRegistry</code> object takes a collection of <code>rooms</code> via its constructor, so for this particular implementation, we may wish to establish the reasonable invariant that <code>GetFreeRooms</code> doesn't 'invent' rooms on its own:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Gen<Room> GenRoom =>
    <span style="color:blue;">from</span> name <span style="color:blue;">in</span> Gen.String
    <span style="color:blue;">select</span> <span style="color:blue;">new</span> Room(name);

[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>()
{
    (<span style="color:blue;">from</span> rooms <span style="color:blue;">in</span> GenRoom.ArrayUnique
     <span style="color:blue;">from</span> arrival <span style="color:blue;">in</span> Gen.Date.Select(DateOnly.FromDateTime)
     <span style="color:blue;">from</span> i <span style="color:blue;">in</span> Gen.Int[1, 1_000]
     <span style="color:blue;">let</span> departure = arrival.AddDays(i)
     <span style="color:blue;">select</span> (rooms, arrival, departure))
    .Sample((<span style="font-weight:bold;color:#1f377f;">rooms</span>, <span style="font-weight:bold;color:#1f377f;">arrival</span>, <span style="font-weight:bold;color:#1f377f;">departure</span>) =>
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> FakeReadRegistry(rooms);
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetFreeRooms(arrival, departure);
        Assert.Subset(<span style="color:blue;">new</span> HashSet<Room>(rooms), <span style="color:blue;">new</span> HashSet<Room>(actual));
    });
}</pre>
</p>
<p>
This property asserts that the <code>actual</code> value returned from <code>GetFreeRooms</code> is a subset of the <code>rooms</code> used to initialise the <code>sut</code>. Recall that the subset relation is <a href="https://en.wikipedia.org/wiki/Reflexive_relation">reflexive</a>; i.e. a set is a subset of itself.
</p>
<p>
The same property written in F# with <a href="https://hedgehog.qa/">Hedgehog</a> 0.13.0 and <a href="https://github.com/SwensenSoftware/unquote">Unquote</a> 6.1.0 may look like this:
</p>
<p>
<pre><span style="color:blue;">module</span> Gen =
    <span style="color:blue;">let</span> room =
        Gen.alphaNum
        |> Gen.array (Range.linear 1 10)
        |> Gen.map (<span style="color:blue;">fun</span> chars <span style="color:blue;">-></span> { Name = String chars })
    <span style="color:blue;">let</span> dateOnly =
        <span style="color:blue;">let</span> min = DateOnly(2000, 1, 1).DayNumber
        <span style="color:blue;">let</span> max = DateOnly(2100, 1, 1).DayNumber
        Range.linear min max |> Gen.int32 |> Gen.map DateOnly.FromDayNumber

[<Fact>]
<span style="color:blue;">let</span> GetFreeRooms () = Property.check <| property {
    <span style="color:blue;">let!</span> rooms = Gen.room |> Gen.list (Range.linear 0 100)
    <span style="color:blue;">let!</span> arrival = Gen.dateOnly
    <span style="color:blue;">let!</span> i = Gen.int32 (Range.linear 1 1_000)
    <span style="color:blue;">let</span> departure = arrival.AddDays i
    <span style="color:blue;">let</span> sut = FakeReadRegistry rooms :> IReadRegistry
    <span style="color:blue;">let</span> actual = sut.GetFreeRooms arrival departure
    test <@ Set.isSubset (Set.ofSeq actual) (Set.ofSeq rooms) @> }</pre>
</p>
<p>
Simpler syntax, same idea.
</p>
<p>
Likewise, we can express the contract that describes the relationship between <code>RoomBooked</code> and <code>GetFreeRooms</code> like this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>()
{
    (<span style="color:blue;">from</span> rooms <span style="color:blue;">in</span> GenRoom.ArrayUnique.Nonempty
     <span style="color:blue;">from</span> arrival <span style="color:blue;">in</span> Gen.Date.Select(DateOnly.FromDateTime)
     <span style="color:blue;">from</span> i <span style="color:blue;">in</span> Gen.Int[1, 1_000]
     <span style="color:blue;">let</span> departure = arrival.AddDays(i)
     <span style="color:blue;">from</span> room <span style="color:blue;">in</span> Gen.OneOfConst(rooms)
     <span style="color:blue;">from</span> id <span style="color:blue;">in</span> Gen.Guid
     <span style="color:blue;">let</span> booking = <span style="color:blue;">new</span> Booking(id, room.Name, arrival, departure)
     <span style="color:blue;">select</span> (rooms, booking))
    .Sample((<span style="font-weight:bold;color:#1f377f;">rooms</span>, <span style="font-weight:bold;color:#1f377f;">booking</span>) =>
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> FakeReadRegistry(rooms);
        sut.RoomBooked(booking);
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetFreeRooms(booking.Arrival, booking.Departure);
        Assert.DoesNotContain(booking.RoomName, actual.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name));
    });
}</pre>
</p>
<p>
or, in F#:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> RoomBooked () = Property.check <| property {
    <span style="color:blue;">let!</span> rooms = Gen.room |> Gen.list (Range.linear 1 100)
    <span style="color:blue;">let!</span> arrival = Gen.dateOnly
    <span style="color:blue;">let!</span> i = Gen.int32 (Range.linear 1 1_000)
    <span style="color:blue;">let</span> departure = arrival.AddDays i
    <span style="color:blue;">let!</span> room = Gen.item rooms
    <span style="color:blue;">let!</span> id = Gen.guid
    <span style="color:blue;">let</span> booking = {
        ClientId = id
        RoomName = room.Name
        Arrival = arrival
        Departure = departure }
    <span style="color:blue;">let</span> sut = FakeReadRegistry rooms :> IReadRegistry
    sut.RoomBooked booking
    <span style="color:blue;">let</span> actual = sut.GetFreeRooms arrival departure
    test <@ not (Seq.contains room actual) @> }</pre>
</p>
<p>
In both cases, the property books a room and then proceeds to query <code>GetFreeRooms</code> to see which rooms are free. Since the query is exactly in the range from <code>booking.Arrival</code> to <code>booking.Departure</code>, we expect <em>not</em> to see the name of the booked room among the free rooms.
</p>
<p>
(As I'm writing this, I think that there may be a subtle bug in the F# property. Can you spot it?)
</p>
<h3 id="fa249347697b49699b7ea62336746651">
Conclusion <a href="#fa249347697b49699b7ea62336746651">#</a>
</h3>
<p>
A Fake Object isn't like other Test Doubles. While <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break encapsulation</a>, a Fake Object not only stays encapsulated, but it also fulfils the contract implied by a polymorphic API (interface or base class).
</p>
<p>
Or, put another way: When is a Fake Object the right Test Double? When you can describe the contract of the dependency.
</p>
<p>
But if you <em>can't</em> describe the contract of a dependency, you should seriously consider if the design is right.
</p>
</div><hr>
A C# port of validation with partial round trip
https://blog.ploeh.dk/2023/10/30/a-c-port-of-validation-with-partial-round-trip
2023-10-30T11:52:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A raw port of the previous F# demo code.</em>
</p>
<p>
This article is part of <a href="/2020/12/14/validation-a-solved-problem">a short article series</a> on <a href="/2018/11/05/applicative-validation">applicative validation</a> with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that <em>was</em> valid.
</p>
<p>
In the <a href="/2020/12/28/an-f-demo-of-validation-with-partial-data-round-trip">previous article</a> I showed <a href="https://fsharp.org/">F#</a> demo code, and since <a href="https://forums.fsharp.org/t/thoughts-on-input-validation-pattern-from-a-noob/1541">the original forum question</a> that prompted the article series was about F# code, for a long time, I left it there.
</p>
<p>
Recently, however, I've found myself writing about validation in a broader context:
</p>
<ul>
<li><a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a></li>
<li><a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a></li>
<li><a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a></li>
<li><a href="/2023/06/26/validation-and-business-rules">Validation and business rules</a></li>
<li><a href="/2023/07/03/validating-or-verifying-emails">Validating or verifying emails</a></li>
</ul>
<p>
Perhaps I should consider adding a <em>validation</em> tag to the blog...
</p>
<p>
In that light I thought that it might be illustrative to continue <a href="/2020/12/14/validation-a-solved-problem">this article series</a> with a port to C#.
</p>
<p>
Here, I use techniques already described on this site to perform the translation. Follow the links for details.
</p>
<p>
The translation given here is direct, so it produces some fairly non-idiomatic C# code.
</p>
<h3 id="5cee653b6148484fb782d92fea2ca415">
Building blocks <a href="#5cee653b6148484fb782d92fea2ca415">#</a>
</h3>
<p>
The original problem is succinctly stated, and I follow it as closely as possible. This includes potential errors that may be present in the original post.
</p>
<p>
The task is to translate some input to a Domain Model with <a href="/2022/10/24/encapsulation-in-functional-programming">good encapsulation</a>. The input type looks like this, translated to a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">C# record</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Input</span>(<span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">Name</span>, DateTime? <span style="font-weight:bold;color:#1f377f;">DoB</span>, <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">Address</span>)</pre>
</p>
<p>
Notice that every input may be null. This indicates poor encapsulation, but is symptomatic of most input. <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">At the boundaries, static types are illusory</a>. Perhaps it would have been more idiomatic to model such input as a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a>, but it makes little difference to what comes next.
</p>
<p>
I consider <a href="/2020/12/14/validation-a-solved-problem">validation a solved problem</a>, because it's possible to model the process as an <a href="/2018/10/01/applicative-functors">applicative functor</a>. Really, <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">validation is a parsing problem</a>.
</p>
<p>
Since my main intent with this article is to demonstrate a technique, I will allow myself a few shortcuts. Like I did <a href="/2023/08/28/a-first-crack-at-the-args-kata">when I first encountered the Args kata</a>, I start by copying the <code>Validated</code> code from <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a>; you can go there if you're interested in it. I'm not going to repeat it here.
</p>
<p>
The target type looks similar to the above <code>Input</code> record, but doesn't allow null values:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">ValidInput</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Name</span>, DateTime <span style="font-weight:bold;color:#1f377f;">DoB</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Address</span>);</pre>
</p>
<p>
This could also have been a 'proper' class. The following code doesn't depend on that.
</p>
<h3 id="7af5ab9c8fca4dc193fc5854c2806ff4">
Validating names <a href="#7af5ab9c8fca4dc193fc5854c2806ff4">#</a>
</h3>
<p>
Since I'm now working in an ostensibly object-oriented language, I can make the various validation functions methods on the <code>Input</code> record. Since I'm treating validation as a parsing problem, I'm going to name those methods with the <code>TryParse</code> prefix:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>
    <span style="font-weight:bold;color:#74531f;">TryParseName</span>()
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (Name <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
            (<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"name is required"</span> }));
    <span style="font-weight:bold;color:#8f08c4;">if</span> (Name.Length <= 3)
        <span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
            (<span style="font-weight:bold;color:#1f377f;">i</span> => i <span style="color:blue;">with</span> { Name = <span style="color:blue;">null</span> }, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"no bob and toms allowed"</span> }));
    <span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Succeed<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(Name);
}</pre>
</p>
<p>
As the two previous articles have explained, the result of trying to parse input is a type isomorphic to <a href="/2019/01/14/an-either-functor">Either</a>, but here called <code><span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>></code>. (The reason for this distinction is that we <em>don't</em> want the <a href="/2022/05/09/an-either-monad">monadic behaviour of Either</a>, because monads short-circuit.)
</p>
<p>
When parsing succeeds, the <code>TryParseName</code> method returns the <code>Name</code> wrapped in a <code>Success</code> case.
</p>
<p>
Parsing the name may fail in two different ways. If the name is missing, the method returns the identity function, which leaves the input unchanged, together with the error message <em>"name is required"</em>. If the name is present, but too short, <code>TryParseName</code> returns another error message, along with a function that resets <code>Name</code> to <code>null</code>.
</p>
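<p>
To make the two failure shapes concrete, here's a minimal sketch of just the function part of each failure. It assumes that <code>Input</code> is a record with three nullable properties, as the <code>with</code> expressions above suggest; the names <code>keep</code> and <code>resetName</code> are only illustrative and don't appear in the actual code.
</p>

```csharp
#nullable enable
using System;

// The 'missing' failure leaves the input untouched (identity function).
Func<Input, Input> keep = x => x;
// The 'too short' failure clears the offending value.
Func<Input, Input> resetName = i => i with { Name = null };

var input = new Input("Bob", null, "x");
Console.WriteLine(keep(input));
Console.WriteLine(resetName(input));

// Assumed shape of the Input record; not copied from the article.
public sealed record Input(string? Name, DateTime? DoB, string? Address);
```

<p>
Both values have the type <code>Func<Input, Input></code>, which is what makes them composable later on.
</p>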
<p>
Compare the C# code with <a href="/2020/12/28/an-f-demo-of-validation-with-partial-data-round-trip">the corresponding F#</a> or <a href="/2020/12/21/a-haskell-proof-of-concept-of-validation-with-partial-data-round-trip">Haskell code</a> and notice how much more verbose the C# has to be.
</p>
<p>
While it's possible to translate many functional programming concepts to a language like C#, syntax does matter, because it affects readability.
</p>
<h3 id="2113a955061341ab9e2dba711aaf8457">
Validating date of birth <a href="#2113a955061341ab9e2dba711aaf8457">#</a>
</h3>
<p>
From here, the port is direct, if awkward. Here's how to validate the date-of-birth field:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>
<span style="font-weight:bold;color:#74531f;">TryParseDoB</span>(DateTime <span style="font-weight:bold;color:#1f377f;">now</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!DoB.HasValue)
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>(
(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"dob is required"</span> }));
<span style="font-weight:bold;color:#8f08c4;">if</span> (DoB.Value <= now.AddYears(-12))
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>(
(<span style="font-weight:bold;color:#1f377f;">i</span> => i <span style="color:blue;">with</span> { DoB = <span style="color:blue;">null</span> }, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"get off my lawn"</span> }));
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Succeed<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>(
DoB.Value);
}</pre>
</p>
<p>
I suspect that the age check should really have been a greater-than relation, but I'm only reproducing the original code.
</p>
<h3 id="e1fc6b98e4fb4dad81ee5e354032acb8">
Validating addresses <a href="#e1fc6b98e4fb4dad81ee5e354032acb8">#</a>
</h3>
<p>
The final building block is to parse the input address:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>
<span style="font-weight:bold;color:#74531f;">TryParseAddress</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (Address <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span> }));
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Succeed<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
Address);
}</pre>
</p>
<p>
The <code>TryParseAddress</code> method only checks whether the <code>Address</code> field is present.
</p>
<h3 id="b11153a62fa945568e880cf771a7cb19">
Composition <a href="#b11153a62fa945568e880cf771a7cb19">#</a>
</h3>
<p>
The above methods are <code>private</code> because the entire problem is simple enough that I can test the composition as a whole. Had I wanted to, however, I could easily have made them <code>public</code> and tested them individually.
</p>
<p>
You can now use applicative composition to produce a single validation method:
</p>
<p>
<pre><span style="color:blue;">public</span> Validated<(Input, IReadOnlyCollection<<span style="color:blue;">string</span>>), ValidInput>
<span style="font-weight:bold;color:#74531f;">TryParse</span>(DateTime <span style="font-weight:bold;color:#1f377f;">now</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">name</span> = TryParseName();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dob</span> = TryParseDoB(now);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">address</span> = TryParseAddress();
Func<<span style="color:blue;">string</span>, DateTime, <span style="color:blue;">string</span>, ValidInput> <span style="font-weight:bold;color:#1f377f;">createValid</span> =
(<span style="font-weight:bold;color:#1f377f;">n</span>, <span style="font-weight:bold;color:#1f377f;">d</span>, <span style="font-weight:bold;color:#1f377f;">a</span>) => <span style="color:blue;">new</span> ValidInput(n, d, a);
<span style="color:blue;">static</span> (Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>) <span style="color:#74531f;">combineErrors</span>(
(Func<Input, Input> f, IReadOnlyCollection<<span style="color:blue;">string</span>> es) <span style="font-weight:bold;color:#1f377f;">x</span>,
(Func<Input, Input> g, IReadOnlyCollection<<span style="color:blue;">string</span>> es) <span style="font-weight:bold;color:#1f377f;">y</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> (<span style="font-weight:bold;color:#1f377f;">z</span> => y.g(x.f(z)), y.es.Concat(x.es).ToArray());
}
<span style="font-weight:bold;color:#8f08c4;">return</span> createValid
.Apply(name, combineErrors)
.Apply(dob, combineErrors)
.Apply(address, combineErrors)
.SelectFailure(<span style="font-weight:bold;color:#1f377f;">x</span> => (x.Item1(<span style="color:blue;">this</span>), x.Item2));
}</pre>
</p>
<p>
This is where the <code>Validated</code> API is still awkward. You need to explicitly define a function to compose error cases. In this case, <code>combineErrors</code> composes the <a href="/2017/11/13/endomorphism-monoid">endomorphisms</a> and concatenates the collections.
</p>
<p>
The final step 'runs' the endomorphism. <code>x.Item1</code> is the endomorphism, and <code>this</code> is the <code>Input</code> value being validated. Again, this isn't readable in C#, but it's where the endomorphism removes the invalid values from the input.
</p>
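<p>
If the composition of failures seems abstract, the following stand-alone sketch may help. It reuses the body of <code>combineErrors</code>, but the <code>Input</code> record shape, the function name <code>Combine</code>, and the two hard-coded failure values are assumptions made for illustration.
</p>

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Same logic as combineErrors: compose the functions, concatenate the messages.
static (Func<Input, Input>, IReadOnlyCollection<string>) Combine(
    (Func<Input, Input> f, IReadOnlyCollection<string> es) x,
    (Func<Input, Input> g, IReadOnlyCollection<string> es) y)
{
    return (z => y.g(x.f(z)), y.es.Concat(x.es).ToArray());
}

// Two failures, as TryParseName and TryParseDoB might produce them.
(Func<Input, Input>, IReadOnlyCollection<string>) nameFailure =
    (i => i with { Name = null }, new[] { "no bob and toms allowed" });
(Func<Input, Input>, IReadOnlyCollection<string>) dobFailure =
    (i => i with { DoB = null }, new[] { "get off my lawn" });

var (run, messages) = Combine(nameFailure, dobFailure);

// 'Running' the composed endomorphism removes both invalid values.
var cleaned = run(new Input("Tom", new DateTime(1980, 1, 1), "x"));
Console.WriteLine(cleaned);
Console.WriteLine(string.Join(", ", messages));

// Assumed shape of the Input record; not copied from the article.
public sealed record Input(string? Name, DateTime? DoB, string? Address);
```

<p>
Since <code>y.es</code> comes first in the concatenation, the messages end up in the reverse order of the <code>Apply</code> calls.
</p>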
<h3 id="8aa59e20c1924002ae0d4e951df71619">
Tests <a href="#8aa59e20c1924002ae0d4e951df71619">#</a>
</h3>
<p>
Since <a href="/2018/11/05/applicative-validation">applicative validation</a> is a functional technique, it's <a href="/2015/05/07/functional-design-is-intrinsically-testable">intrinsically testable</a>.
</p>
<p>
Testing a successful validation is as easy as this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ValidationSucceeds</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = DateTime.Now;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">eightYearsAgo</span> = now.AddYears(-8);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">input</span> = <span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = input.TryParse(now);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = Validated.Succeed<(Input, IReadOnlyCollection<<span style="color:blue;">string</span>>), ValidInput>(
<span style="color:blue;">new</span> ValidInput(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>));
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
As is often the case, the error conditions are more numerous, or more interesting, if you will, than the success case, so this requires a parametrised test:
</p>
<p>
<pre>[Theory, ClassData(<span style="color:blue;">typeof</span>(ValidationFailureTestCases))]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ValidationFails</span>(
Input <span style="font-weight:bold;color:#1f377f;">input</span>,
Input <span style="font-weight:bold;color:#1f377f;">expected</span>,
IReadOnlyCollection<<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">expectedMessages</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = DateTime.Now;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = input.TryParse(now);
var (<span style="font-weight:bold;color:#1f377f;">inp</span>, <span style="font-weight:bold;color:#1f377f;">msgs</span>) = Assert.Single(actual.Match(
onFailure: <span style="font-weight:bold;color:#1f377f;">x</span> => <span style="color:blue;">new</span>[] { x },
onSuccess: <span style="font-weight:bold;color:#1f377f;">_</span> => Array.Empty<(Input, IReadOnlyCollection<<span style="color:blue;">string</span>>)>()));
Assert.Equal(expected, inp);
Assert.Equal(expectedMessages, msgs);
}</pre>
</p>
<p>
I also had to take <code>actual</code> apart in order to inspect its individual elements. When working with a pure and immutable data structure, I consider that a test smell. Rather, one should be able to use <a href="/2021/05/03/structural-equality-for-better-tests">structural equality for better tests</a>. Unfortunately, .NET collections don't have structural equality, so the test has to pull the message collection out of <code>actual</code> in order to verify it.
</p>
<p>
Again, in F# or <a href="https://www.haskell.org/">Haskell</a> you don't have that problem, and the tests are much more succinct and robust.
</p>
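<p>
The lack of structural equality for .NET collections is easy to demonstrate in isolation. This is only a sketch; the <code>Failure</code> record is hypothetical and exists solely to show that a record with an array member inherits the problem.
</p>

```csharp
using System;
using System.Linq;

var a = new[] { "add1 is required", "dob is required" };
var b = new[] { "add1 is required", "dob is required" };

// Arrays compare by reference, so two equal-looking arrays are 'different'.
bool sameReference = a.Equals(b);
// Comparing element by element is what the assertion on the messages relies on.
bool sameElements = a.SequenceEqual(b);
// A record compares its members, but an array member still compares by reference.
bool recordsEqual = new Failure("x", a) == new Failure("x", b);

Console.WriteLine($"{sameReference} {sameElements} {recordsEqual}"); // prints "False True False"

// Hypothetical record, only for illustration.
public sealed record Failure(string Name, string[] Messages);
```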
<p>
The test cases are implemented by this nested <code>ValidationFailureTestCases</code> class:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ValidationFailureTestCases</span> :
TheoryData<Input, Input, IReadOnlyCollection<<span style="color:blue;">string</span>>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">ValidationFailureTestCases</span>()
{
Add(<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span>, <span style="color:#a31515;">"dob is required"</span>, <span style="color:#a31515;">"name is required"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Bob"</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span>, <span style="color:#a31515;">"dob is required"</span>, <span style="color:#a31515;">"no bob and toms allowed"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span>, <span style="color:#a31515;">"dob is required"</span> });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">eightYearsAgo</span> = DateTime.Now.AddYears(-8);
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span> });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">fortyYearsAgo</span> = DateTime.Now.AddYears(-40);
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, fortyYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"get off my lawn"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Tom"</span>, fortyYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"get off my lawn"</span>, <span style="color:#a31515;">"no bob and toms allowed"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Tom"</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"no bob and toms allowed"</span> });
}
}</pre>
</p>
<p>
All eight tests pass.
</p>
<h3 id="96596b52720f4a2688c216701f48d559">
Conclusion <a href="#96596b52720f4a2688c216701f48d559">#</a>
</h3>
<p>
Once you know <a href="/2018/05/22/church-encoding">how to model sum types (discriminated unions) in C#</a>, translating something like applicative validation isn't difficult per se. It's a fairly automatic process.
</p>
<p>
The code is hardly <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C#, and the type annotations are particularly annoying. Things work as expected though, and it isn't difficult to imagine how one could refactor some of this code to a more idiomatic form.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Domain Model first
https://blog.ploeh.dk/2023/10/23/domain-model-first
2023-10-23T06:09:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Persistence concerns second.</em>
</p>
<p>
A few weeks ago, I published an article with the title <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a> Not surprisingly, this elicited more than one reaction. In this article, I'll respond to a particular kind of reaction.
</p>
<p>
First, however, I'd like to reiterate the message of the previous article, which is almost revealed by the title: <em>Do <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a> (ORMs) reduce the need for mapping?</em> To which the article answers a tentative <em>no</em>.
</p>
<p>
Do pay attention to the question. It doesn't ask whether ORMs are bad in general, or in all cases. It mainly analyses whether the use of ORMs reduces the need to write code that maps between different representations of data: From database to objects, from objects to <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Objects</a> (DTOs), etc.
</p>
<p>
Granted, the article looks at a wider context, which I think is only a responsible thing to do. This could lead some readers to extrapolate from the article's specific focus to draw a wider conclusion.
</p>
<h3 id="951d538881fd4464a081ba3cd09162b0">
Encapsulation-first <a href="#951d538881fd4464a081ba3cd09162b0">#</a>
</h3>
<p>
Most of the systems I work with aren't <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a> systems, but rather systems where correctness is important. As an example, one of my clients does security-heavy digital infrastructure. Earlier in my career, I helped write web shops when these kinds of systems were new. Let me tell you: System owners were quite concerned that prices were correct, and that orders were taken and handled without error.
</p>
<p>
In my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I've tried to capture the essence of those kinds of systems with the accompanying sample code, which pretends to be an online restaurant reservation system. While this may sound like a trivial CRUD system, <a href="/2020/01/27/the-maitre-d-kata">the business logic isn't entirely straightforward</a>.
</p>
<p>
The point I was making in <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">the previous article</a> is that I consider <a href="/encapsulation-and-solid">encapsulation</a> to be more important than 'easy' persistence. I don't mind writing a bit of mapping code, since <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a programming bottleneck</a> anyway.
</p>
<p>
When prioritising encapsulation you should be able to make use of any design pattern, run-time assertion, as well as static type systems (if you're working in such a language) to guard correctness. You should be able to compose objects, define <a href="https://en.wikipedia.org/wiki/Value_object">Value Objects</a>, <a href="/2015/01/19/from-primitive-obsession-to-domain-modelling">wrap single values to avoid primitive obsession</a>, make constructors private, leverage polymorphism and effectively use any trick your language, <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiom</a>, and platform has on offer. If you want to use <a href="/2018/05/22/church-encoding">Church encoding</a> or the <a href="/2018/06/25/visitor-as-a-sum-type">Visitor pattern to represent a sum type</a>, you should be able to do that.
</p>
<p>
When writing these kinds of systems, I start with the Domain Model without any thought of how to persist or retrieve data.
</p>
<p>
In my experience, once the Domain Model starts to congeal, the persistence question tends to answer itself. There's usually one or two obvious ways to store and read data.
</p>
<p>
Usually, a relational database isn't the most obvious choice.
</p>
<h3 id="1b562dd9077e4b27b912d782bdca14fb">
Persistence ignorance <a href="#1b562dd9077e4b27b912d782bdca14fb">#</a>
</h3>
<p>
Write the best API you can to solve the problem, and then figure out how to store data. This is the allegedly elusive ideal of <em>persistence ignorance</em>, which turns out to be easier than rumour has it, once you cast a wider net than relational databases.
</p>
<p>
It seems to me, though, that more than one person who has commented on my previous article has a hard time considering alternatives. And granted, I've consulted with clients who knew how to operate a particular database system, but nothing else, and who didn't want to consider adopting another technology. I do understand that such constraints are real, too. Thus, if you need to compromise for reasons such as these, you aren't doing anything wrong. You may still, however, try to get the best out of the situation.
</p>
<p>
One client of mine, for example, didn't want to operate anything other than <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>, which they already knew. For an asynchronous message-based system, then, we chose <a href="https://particular.net/nservicebus">NServiceBus</a> and configured it to use SQL Server as a persistent queue.
</p>
<p>
Several comments still seem to assume that persistence must look a particular way.
</p>
<blockquote>
<p>
"So having a Order, OrderLine, Person, Address and City, all the rows needed to be loaded in advance, mapped to objects and references set to create the object graph to be able to, say, display shipping costs based on person's address."
</p>
<footer><cite><a href="/2023/09/18/do-orms-reduce-the-need-for-mapping#75ca5755d2a4445ba4836fc3f6922a5c">Vlad</a></cite></footer>
</blockquote>
<p>
I don't wish to single out Vlad, but his is the first comment, and it captures the essence of other comments well. I imagine that what he has in mind is something like this:
</p>
<p>
<img src="/content/binary/orders-db-diagram.png" alt="Database diagram with five tables: Orders, OrderLines, Persons, Addresses, and Cities.">
</p>
<p>
I've probably simplified things a bit too much. In a more realistic model, each person may have a collection of addresses, instead of just one. If so, it only strengthens Vlad's point, because that would imply even more tables to read.
</p>
<p>
The unstated assumption, however, is that a fully <a href="https://en.wikipedia.org/wiki/Database_normalization">normalised</a> relational data model is the correct way to store such data.
</p>
<p>
It's not. As I already mentioned, I spent the first four years of my programming career developing web shops. Orders were an integral part of that work.
</p>
<p>
An order is a <em>document</em>. You don't want the customer's address to be updatable after the fact. With a normalised relational model, if you change the customer's address row in the future, it's going to look as though the order went to that address instead of the address it actually went to.
</p>
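<p>
As a minimal sketch of that idea, consider an order that copies the address at the time it's placed. The <code>Customer</code> and <code>PlacedOrder</code> records are hypothetical and much simpler than a real model would be.
</p>

```csharp
using System;

var customer = new Customer("Olive Hoyle", "Green Street 15");
// Placing the order copies the address into the order document.
var order = new PlacedOrder(customer.Name, customer.Address);

// Later, the customer moves. 'with' creates a new value; the order is untouched.
var moved = customer with { Address = "Blue Street 3" };

Console.WriteLine(order.ShippingAddress);
Console.WriteLine(moved.Address);

// Hypothetical types, only for illustration.
public sealed record Customer(string Name, string Address);
public sealed record PlacedOrder(string CustomerName, string ShippingAddress);
```

<p>
The order keeps reporting the address it was actually shipped to, no matter what later happens to the customer data.
</p>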
<p>
This also explains why the order lines should <em>not</em> point to the actual product entries in the product catalogue. Trust me, I almost shipped such a system once, when I was young and inexperienced.
</p>
<p>
You should, at the very least, denormalise the database model. To a degree, this has already happened here, since the implied order has order lines that, I hope, are copies of the relevant product data, rather than linked to the product catalogue.
</p>
<p>
Such insights, however, suggest that other storage mechanisms may be more appropriate.
</p>
<p>
Putting that aside for a moment, though, how would a persistence-ignorant Domain Model look?
</p>
<p>
I'd probably start with something like this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">order</span> = <span style="color:blue;">new</span> Order(
<span style="color:blue;">new</span> Person(<span style="color:#a31515;">"Olive"</span>, <span style="color:#a31515;">"Hoyle"</span>,
<span style="color:blue;">new</span> Address(<span style="color:#a31515;">"Green Street 15"</span>, <span style="color:blue;">new</span> City(<span style="color:#a31515;">"Oakville"</span>), <span style="color:#a31515;">"90125"</span>)),
<span style="color:blue;">new</span> OrderLine(123, 1),
<span style="color:blue;">new</span> OrderLine(456, 3),
<span style="color:blue;">new</span> OrderLine(789, 2));</pre>
</p>
<p>
(As <a href="/ref/90125">the ZIP code</a> implies, I'm more of a <a href="https://en.wikipedia.org/wiki/Yes_(band)">Yes</a> fan, but still can't help but relish writing <code>new Order</code> in code.)
</p>
<p>
With code like this, many a <a href="/ref/ddd">DDD</a>'er would start talking about Aggregate Roots, but that is, frankly, a concept that never made much sense to me. Rather, the above <code>order</code> is a <a href="https://en.wikipedia.org/wiki/Tree_(graph_theory)">tree</a> composed of immutable data structures.
</p>
<p>
It trivially serializes to e.g. JSON:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"customer"</span>: {
<span style="color:#2e75b6;">"firstName"</span>: <span style="color:#a31515;">"Olive"</span>,
<span style="color:#2e75b6;">"lastName"</span>: <span style="color:#a31515;">"Hoyle"</span>,
<span style="color:#2e75b6;">"address"</span>: {
<span style="color:#2e75b6;">"street"</span>: <span style="color:#a31515;">"Green Street 15"</span>,
<span style="color:#2e75b6;">"city"</span>: { <span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Oakville"</span> },
<span style="color:#2e75b6;">"zipCode"</span>: <span style="color:#a31515;">"90125"</span>
}
},
<span style="color:#2e75b6;">"orderLines"</span>: [
{
<span style="color:#2e75b6;">"sku"</span>: 123,
<span style="color:#2e75b6;">"quantity"</span>: 1
},
{
<span style="color:#2e75b6;">"sku"</span>: 456,
<span style="color:#2e75b6;">"quantity"</span>: 3
},
{
<span style="color:#2e75b6;">"sku"</span>: 789,
<span style="color:#2e75b6;">"quantity"</span>: 2
}
]
}</pre>
</p>
<p>
All of this strongly suggests that this kind of data would be <em>much easier</em> to store and retrieve with a document database instead of a relational database.
</p>
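<p>
To illustrate how little code that serialization might take, here's a sketch that uses <code>System.Text.Json</code>. The record definitions are my assumptions about how such a Domain Model could look; in particular, this <code>Order</code> takes a list of order lines rather than the variadic constructor used in the snippet above.
</p>

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

var order = new Order(
    new Person("Olive", "Hoyle",
        new Address("Green Street 15", new City("Oakville"), "90125")),
    new[] { new OrderLine(123, 1), new OrderLine(456, 3), new OrderLine(789, 2) });

// Camel-cased property names match the JSON document shown above.
var options = new JsonSerializerOptions
{
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    WriteIndented = true
};

string json = JsonSerializer.Serialize(order, options);
Console.WriteLine(json);

// Reading the document back requires no mapping code either.
Order? roundTripped = JsonSerializer.Deserialize<Order>(json, options);

// Assumed record shapes; not the actual Domain Model.
public sealed record City(string Name);
public sealed record Address(string Street, City City, string ZipCode);
public sealed record Person(string FirstName, string LastName, Address Address);
public sealed record OrderLine(int Sku, int Quantity);
public sealed record Order(Person Customer, IReadOnlyList<OrderLine> OrderLines);
```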
<p>
While that's just one example, it strikes me as a common theme when discussing persistence. For most online transaction processing systems, relational databases aren't necessarily the best fit.
</p>
<h3 id="3643959a545940f88001fb82297a286e">
The cart before the horse <a href="#3643959a545940f88001fb82297a286e">#</a>
</h3>
<p>
<a href="/2023/09/18/do-orms-reduce-the-need-for-mapping#359a7bb0d2c14b8eb2dcb2ac6de4897d">Another comment</a> also starts with the premise that a data model is fundamentally relational. This one purports to model the relationship between sheikhs, their wives, and supercars. While I understand that the example is supposed to be tongue-in-cheek, the comment launches straight into problems with how to read and persist such data without relying on an ORM.
</p>
<p>
Again, I don't intend to point fingers at anyone, but on the other hand, I can't suggest alternatives when a problem is presented like that.
</p>
<p>
The whole point of developing a Domain Model <em>first</em> is to find a good way to represent the business problem in a way that encourages correctness and ease of use.
</p>
<p>
If you present me with a relational model without describing the business goals you're trying to achieve, I don't have much to work with.
</p>
<p>
It may be that your business problem is truly relational, in which case an ORM probably is a good solution. I wrote as much in the previous article.
</p>
<p>
In many cases, however, it looks to me as though programmers start with a relational model, only to proceed to complain that it's difficult to work with in object-oriented (or functional) code.
</p>
<p>
If you, on the other hand, start with the business problem and figure out how to model it in code, the best way to store the data may suggest itself. Document databases are often a good fit, as are event stores. I've never had need for a graph database, but perhaps that would be a better fit for the <em>sheikh</em> domain suggested by <em>qfilip</em>.
</p>
<h3 id="8c32485e1ffd42f4ace9b83c98ae3184">
Reporting <a href="#8c32485e1ffd42f4ace9b83c98ae3184">#</a>
</h3>
<p>
While I no longer feel that relational databases are particularly well-suited for online transaction processing, they are really good at one thing: Ad-hoc querying. Because it's such a rich and mature type of technology, and because <a href="https://en.wikipedia.org/wiki/SQL">SQL</a> is a powerful language, you can slice and dice data in multiple ways.
</p>
<p>
This makes relational databases useful for reporting and other kinds of data extraction tasks.
</p>
<p>
You may have business stakeholders who insist on a relational database for that particular reason. It may even be a good reason.
</p>
<p>
If, however, the sole purpose of having a relational database is to support reporting, you may consider setting it up as a secondary system. Keep your online transactional data in another system, but regularly synchronize it to a relational database. If the only purpose of the relational database is to support reporting, you can treat it as a read-only system. This makes synchronization manageable. In general, you should avoid two-way synchronization if at all possible, but one-way synchronization is usually less of a problem.
</p>
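<p>
One way to picture such a one-way synchronization is as a pure projection from documents to flat rows that a reporting database can ingest. The following is only a sketch under that assumption; <code>OrderDocument</code>, <code>Line</code>, <code>ReportRow</code>, and <code>ToReportRows</code> are all hypothetical names, and the code that actually writes the rows to a database is left out.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One-way projection: a document goes in, flat rows come out. Nothing flows back.
static IReadOnlyList<ReportRow> ToReportRows(Guid orderId, OrderDocument order) =>
    order.Lines
        .Select(l => new ReportRow(orderId, order.ZipCode, l.Sku, l.Quantity))
        .ToList();

var order = new OrderDocument("90125", new[] { new Line(123, 1), new Line(456, 3) });
var rows = ToReportRows(Guid.NewGuid(), order);
Console.WriteLine(rows.Count);

// Hypothetical types, only for illustration.
public sealed record Line(int Sku, int Quantity);
public sealed record OrderDocument(string ZipCode, IReadOnlyList<Line> Lines);
public sealed record ReportRow(Guid OrderId, string ZipCode, int Sku, int Quantity);
```

<p>
Because the projection never writes back to the transactional system, the reporting side can be rebuilt from scratch if it ever gets out of date.
</p>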
<p>
Isn't that going to be more work, or more expensive?
</p>
<p>
That question, again, has no single answer. Of course setting up and maintaining two systems is more work at the outset. On the other hand, there's a perpetual cost to be paid if you come up with the wrong architecture. If development is slow, and you have many bugs in production, or similar problems, the cause could be that you've chosen the wrong architecture and you're now fighting a losing battle.
</p>
<p>
On the other hand, if you relegate relational databases exclusively to a reporting role, chances are that there's a lot of off-the-shelf software that can support your business users. Perhaps you can even hire a paratechnical power user to take care of that part of the system, freeing you to focus on the 'actual' system.
</p>
<p>
All of this is only meant as inspiration. If you don't want to, or can't, do it that way, then this article doesn't help you.
</p>
<h3 id="1b0ce932168349f8abb9887f9ed219c8">
Conclusion <a href="#1b0ce932168349f8abb9887f9ed219c8">#</a>
</h3>
<p>
When discussing databases, and particularly ORMs, some people approach the topic with the unspoken assumption that a relational database is the only option for storing data. Many programmers are so skilled in relational data design that they naturally use those skills when thinking new problems over.
</p>
<p>
Sometimes problems are just relational in nature, and that's fine. More often than not, however, that's not the case.
</p>
<p>
Try to model a business problem without concern for storage and see where that leads you. Test-driven development is often a great technique for such a task. Then, once you have a good API, consider how to store the data. The Domain Model that you develop in that way may naturally suggest a good way to store and retrieve the data.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="db4a9a94452a4cc7bf71989561dfd947">
<div class="comment-author"><a href="#db4a9a94452a4cc7bf71989561dfd947">qfilip</a></div>
<div class="comment-content">
<q>
<i>
Again, I don't intend to point fingers at anyone, but on the other hand, I can't suggest alternatives when a problem is presented like that.
</i>
</q>
<p>
Heh, that's fair criticism, not finger pointing. I wanted to give a better example here, but I gave up halfway through writing it. You raised some good points. I'll have to rethink my approach on domain modeling further, before asking any meaningful questions.
</p>
<p>
Years of working with EF-Core in a specific way got me... indoctrinated. Not all things are bad of course, but I have missed the bigger picture in some areas, as far as I can tell.
</p>
<p>
Thanks for dedicating so many articles to the subject.
</p>
</div>
<div class="comment-date">2023-10-23 18:05 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
At the boundaries, static types are illusory
https://blog.ploeh.dk/2023/10/16/at-the-boundaries-static-types-are-illusory
2023-10-16T08:07:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Static types are useful, but have limitations.</em>
</p>
<p>
Regular readers of this blog may have noticed that I like static type systems. Not the kind of static types offered by <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>, which strikes me as mostly being able to distinguish between way too many types of integers and pointers. <a href="/2020/01/20/algebraic-data-types-arent-numbers-on-steroids">A good type system is more than just numbers on steroids</a>. A type system like C#'s is <a href="/2019/12/16/zone-of-ceremony">workable, but verbose</a>. The kind of type system I find most useful is when it has <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a> and good type inference. The examples that I know best are the type systems of <a href="https://fsharp.org/">F#</a> and <a href="https://www.haskell.org/">Haskell</a>.
</p>
<p>
As great as static type systems can be, they have limitations. <a href="https://www.hillelwayne.com/post/constructive/">Hillel Wayne has already outlined one kind of distinction</a>, but here I'd like to focus on another constraint.
</p>
<h3 id="ab0d595d35304a9ea9302197b4f796d3">
Application boundaries <a href="#ab0d595d35304a9ea9302197b4f796d3">#</a>
</h3>
<p>
Any piece of software interacts with the 'rest of the world'; effectively everything outside its own process. Sometimes (but increasingly rarely) such interaction is exclusively by way of some user interface, but more and more, an application interacts with other software in some way.
</p>
<p>
<img src="/content/binary/application-boundary.png" alt="An application depicted as an opaque disk with a circle emphasising its boundary. Also included are arrows in and out, with some common communication artefacts: Messages, HTTP traffic, and a database.">
</p>
<p>
Here I've drawn the application as an opaque disc in order to emphasise that what happens inside the process isn't pertinent to the following discussion. The diagram also includes some common kinds of traffic. Many applications rely on some kind of database or send messages (email, SMS, Commands, Events, etc.). We can think of such traffic as the interactions that the application initiates, but many systems also receive and react to incoming data: HTTP traffic or messages that arrive on a queue, and so on.
</p>
<p>
When I talk about application <em>boundaries</em>, I have in mind what goes on in that interface layer.
</p>
<p>
An application can talk to the outside world in multiple ways: It may read or write a file, access shared memory, call operating-system APIs, send or receive network packets, etc. Usually you get to program against higher-level abstractions, but ultimately the application is dealing with various binary protocols.
</p>
<h3 id="4991578e222e408bb08e261dce6454f1">
Protocols <a href="#4991578e222e408bb08e261dce6454f1">#</a>
</h3>
<p>
The bottom line is that at a sufficiently low level of abstraction, what goes in and out of your application has no static type stronger than an array of bytes.
</p>
<p>
You may counter-argue that higher-level APIs deal with that to present the input and output as static types. When you interact with a text file, you'll typically deal with a list of strings: One for each line in the file. Or you may manipulate <a href="https://en.wikipedia.org/wiki/JSON">JSON</a>, <a href="https://en.wikipedia.org/wiki/XML">XML</a>, <a href="https://en.wikipedia.org/wiki/Protocol_Buffers">Protocol Buffers</a>, or another wire format using a serializer/deserializer API. Sometimes, as is often the case with <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, you may need to write a very simple parser yourself. Or perhaps <a href="/2023/08/28/a-first-crack-at-the-args-kata">something slightly more involved</a>.
</p>
<p>
To demonstrate what I mean, there's no shortage of APIs like <a href="https://learn.microsoft.com/dotnet/api/system.text.json.jsonserializer.deserialize">JsonSerializer.Deserialize</a>, which enables you to write <a href="/2022/05/02/at-the-boundaries-applications-arent-functional">code like this</a>:
</p>
<p>
<pre><span style="color:blue;">let</span> n = JsonSerializer.Deserialize<Name> (json, opts)</pre>
</p>
<p>
and you may say: <em><code>n</code> is statically typed, and its type is <code>Name</code>! Hooray!</em> But you do realise that that's only half a truth, don't you?
</p>
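<p>
To make that half-truth concrete, here's a minimal C# sketch. The <code>NameDto</code> class and the four input strings are hypothetical, invented only for this illustration. The same statically typed call has three quite different outcomes, and the type doesn't show any of that:
</p>
<p>
<pre>using System;
using System.Text.Json;

// Hypothetical inputs; only the first one follows the implied protocol.
string[] inputs =
{
    "{\"FirstName\":\"Rosa\",\"LastName\":\"Park\"}",
    "{\"FirstName\":42}",
    "not json at all",
    "null"
};

foreach (var json in inputs)
{
    try
    {
        // The declared type is NameDto?, whatever the bytes contain.
        NameDto? n = JsonSerializer.Deserialize<NameDto>(json);
        Console.WriteLine(n is null ? "null reference" : $"NameDto: {n.FirstName}");
    }
    catch (JsonException)
    {
        // Malformed JSON, or JSON that doesn't fit the DTO.
        Console.WriteLine("JsonException");
    }
}

// Hypothetical DTO, introduced only for this sketch.
public sealed class NameDto
{
    public string? FirstName { get; set; }
    public string? LastName { get; set; }
}</pre>
</p>
<p>
With the default serializer options, only the first input produces a populated object. The second and third throw a <code>JsonException</code>, and the fourth quietly returns <code>null</code>.
</p>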
<p>
An interaction at the application boundary is expected to follow some kind of <em>protocol</em>. This is even true if you're reading a text file. In these modern times, you may expect a text file to contain <a href="https://unicode.org/">Unicode</a>, but have you ever received a file from a legacy system and had to deal with its <a href="https://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a> encoding? Or an <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> file with a <a href="https://en.wikipedia.org/wiki/Code_page">code page</a> different from the one you expect? Or even just a file written on a Unix system, if you're on Windows, or vice versa?
</p>
<p>
In order to correctly interpret or transmit such data, you need to follow a <em>protocol</em>.
</p>
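<p>
As a small illustration of how much hinges on the encoding part of the protocol, consider this sketch. It assumes .NET 5 or later, where <code>Encoding.Latin1</code> is available. The same twelve bytes yield two different strings, depending on which encoding the reader assumes:
</p>
<p>
<pre>using System;
using System.Text;

// "smørrebrød", written with escapes so that the example doesn't itself
// depend on the encoding of the source file.
string original = "sm\u00F8rrebr\u00F8d";

// What goes 'on the wire' is only bytes.
byte[] bytes = Encoding.UTF8.GetBytes(original);

// Two readers that assume different protocols.
string asUtf8 = Encoding.UTF8.GetString(bytes);
string asLatin1 = Encoding.Latin1.GetString(bytes);

Console.WriteLine(bytes.Length);          // 12: each ø takes two bytes in UTF-8
Console.WriteLine(asUtf8 == original);    // True
Console.WriteLine(asLatin1 == original);  // False: each ø became two characters</pre>
</p>
<p>
Neither call fails. The reader that guesses wrong simply gets a well-typed <code>string</code> with the wrong content.
</p>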
<p>
Such a protocol can be low-level, as the character-encoding examples I just listed, but it may also be much more high-level. You may, for example, consider an HTTP request like this:
</p>
<p>
<pre>POST /restaurants/90125/reservations?sig=aco7VV%2Bh5sA3RBtrN8zI8Y9kLKGC60Gm3SioZGosXVE%3D HTTP/1.1
Content-Type: application/json

{
  <span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2021-12-08 20:30"</span>,
  <span style="color:#2e75b6;">"email"</span>: <span style="color:#a31515;">"snomob@example.com"</span>,
  <span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Snow Moe Beal"</span>,
  <span style="color:#2e75b6;">"quantity"</span>: 1
}</pre>
</p>
<p>
Such an interaction implies a protocol. Part of such a protocol is that the HTTP request's body is a valid JSON document, that it has an <code>at</code> property, that that property encodes a valid date and time, that <code>quantity</code> is a natural number, that <code>email</code> <a href="/2023/07/03/validating-or-verifying-emails">is present</a>, and so on.
</p>
<p>
You can model the expected input as a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO):
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReservationDto</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? At { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Email { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Name { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Quantity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
and even set up your 'protocol handlers' (here, an ASP.NET Core <a href="https://learn.microsoft.com/aspnet/core/mvc/controllers/actions">action method</a>) to use such a DTO:
</p>
<p>
<pre><span style="color:blue;">public</span> Task<ActionResult> <span style="font-weight:bold;color:#74531f;">Post</span>(ReservationDto <span style="font-weight:bold;color:#1f377f;">dto</span>)</pre>
</p>
<p>
While this may look statically typed, it assumes a particular protocol. What happens when the bytes on the wire don't follow the protocol?
</p>
<p>
Well, we've already been <a href="/2022/08/15/aspnet-validation-revisited">around that block</a> <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">more than once</a>.
</p>
<p>
The point is that there's always an implied protocol at the application boundary, and you can choose to model it more or less explicitly.
</p>
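<p>
One way to model that protocol more explicitly is to check the DTO against the rules listed above and collect every violation. The following is only a sketch under stated assumptions: the <code>Validate</code> function and its error messages are hypothetical, and it isn't the applicative validation from the linked articles.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: returns one message per violated rule.
static IReadOnlyList<string> Validate(ReservationDto dto)
{
    var errors = new List<string>();

    // TryParse returns false for null as well as for unparsable text.
    if (!DateTime.TryParse(
            dto.At, CultureInfo.InvariantCulture, DateTimeStyles.None, out _))
        errors.Add("'at' is not a valid date and time.");
    if (string.IsNullOrWhiteSpace(dto.Email))
        errors.Add("'email' is missing.");
    if (dto.Quantity < 1)
        errors.Add("'quantity' must be a natural number.");

    return errors;
}

var good = new ReservationDto
{
    At = "2021-12-08 20:30",
    Email = "snomob@example.com",
    Name = "Snow Moe Beal",
    Quantity = 1
};
// Type-checks just as well as the value above, but breaks three rules.
var bad = new ReservationDto { At = "next Tuesday-ish", Quantity = 0 };

Console.WriteLine(Validate(good).Count); // 0
Console.WriteLine(Validate(bad).Count);  // 3

public sealed class ReservationDto
{
    public string? At { get; set; }
    public string? Email { get; set; }
    public string? Name { get; set; }
    public int Quantity { get; set; }
}</pre>
</p>
<p>
Both objects are perfectly valid <code>ReservationDto</code> values as far as the compiler is concerned. Only the explicit checks reveal that one of them violates the protocol.
</p>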
<h3 id="41f3b4ad7a4b4429bba3f619c2af55d1">
Types as short-hands for protocols <a href="#41f3b4ad7a4b4429bba3f619c2af55d1">#</a>
</h3>
<p>
In the above example, I've relied on <em>some</em> static typing to deal with the problem. After all, I did define a DTO to model the expected shape of input. I could have chosen other alternatives: Perhaps I could have used a JSON parser to explicitly <a href="https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/use-dom">use the JSON DOM</a>, or even more low-level <a href="https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/use-utf8jsonreader">used Utf8JsonReader</a>. Ultimately, I could have decided to write my own JSON parser.
</p>
<p>
I'd rarely (or never?) choose to implement a JSON parser from scratch, so that's not what I'm advocating. Rather, my point is that you can leverage existing APIs to deal with input and output, and some of those APIs offer a convincing illusion that what happens at the boundary is statically typed.
</p>
<p>
This illusion is partly API-specific, and partly language-specific. In .NET, for example, <code>JsonSerializer.Deserialize</code> <em>looks</em> like it'll always deserialize <em>any</em> JSON string into the desired model. Obviously, that's a lie, because the function will throw an exception if the operation is impossible (i.e. when the input is malformed). In .NET (and many other languages or platforms), you can't tell from an API's type what the failure modes might be. In contrast, aeson's <a href="https://hackage.haskell.org/package/aeson/docs/Data-Aeson.html#v:fromJSON">fromJSON</a> function returns a type that explicitly indicates that deserialization may fail. Even in Haskell, however, this is mostly an <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> convention, because Haskell also 'supports' exceptions.
</p>
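<p>
Nothing prevents you from adopting a similar convention in C#. As a rough sketch, and assuming C# 9 or later, you could wrap the call in a function whose return type admits failure. The <code>Parsed</code>, <code>Success</code>, and <code>Failure</code> types, the <code>TryDeserialize</code> function, and the <code>NameDto</code> class are all hypothetical names made up for this example. None of them are part of <code>System.Text.Json</code>.
</p>
<p>
<pre>using System;
using System.Text.Json;

// Hypothetical wrapper that turns the exception into a value.
static Parsed<T> TryDeserialize<T>(string json) where T : class
{
    try
    {
        T? value = JsonSerializer.Deserialize<T>(json);
        if (value is null)
            return new Failure<T>("The document was the JSON literal null.");
        return new Success<T>(value);
    }
    catch (JsonException e)
    {
        return new Failure<T>(e.Message);
    }
}

// The caller can no longer overlook the failure case.
Parsed<NameDto> result = TryDeserialize<NameDto>("{\"FirstName\":42}");
string message = result switch
{
    Success<NameDto> s => $"Parsed {s.Value.FirstName}",
    Failure<NameDto> f => "Failed: " + f.Error,
    _ => "Unexpected case"
};
Console.WriteLine(message);

// A poor man's sum type with two cases.
public abstract record Parsed<T>;
public sealed record Success<T>(T Value) : Parsed<T>;
public sealed record Failure<T>(string Error) : Parsed<T>;

public sealed class NameDto
{
    public string? FirstName { get; set; }
    public string? LastName { get; set; }
}</pre>
</p>
<p>
Just as in Haskell, this is a convention rather than a guarantee. The C# compiler doesn't check that the <code>switch</code> covers all cases of such a class hierarchy, and code further down can still throw exceptions.
</p>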
<p>
At the boundary, a static type can be a useful shorthand for a protocol. You declare a static type (e.g. a DTO) and rely on built-in machinery to handle malformed input. You give up some fine-grained control in exchange for a more declarative model.
</p>
<p>
I often choose to do that because I find such a trade-off beneficial, but I'm under no illusion that my static types fully model what goes 'on the wire'.
</p>
<h3 id="792212e79feb46889e71a6c08dedb88e">
Reversed roles <a href="#792212e79feb46889e71a6c08dedb88e">#</a>
</h3>
<p>
So far, I've mostly discussed input validation. <a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a> No, but they can make most common validation scenarios easier. What happens when you return data?
</p>
<p>
You may decide to return a statically typed value. A serializer can faithfully convert such a value to a proper wire format (JSON, XML, or similar). The recipient may not care about that type. After all, you may return a Haskell value, but the system receiving the data is written in <a href="https://www.python.org/">Python</a>. Or you return a C# object, but the recipient is <a href="https://en.wikipedia.org/wiki/JavaScript">JavaScript</a>.
</p>
<p>
Should we conclude, then, that there's no reason to model return data with static types? Not at all, because by modelling output with static types, you are being <a href="https://en.wikipedia.org/wiki/Robustness_principle">conservative with what you send</a>. Since static types are typically more rigid than 'just code', there may be corner cases that a type can't easily express. While this may pose a problem when it comes to input, it's only a benefit when it comes to output. This means that you're <a href="/2021/11/29/postels-law-as-a-profunctor">narrowing the output funnel</a> and thus making your system easier to work with.
</p>
<p>
<img src="/content/binary/liberal-conservative-at-boundary.png" alt="Funnels labelled 'liberal' and 'conservative' to the left of a line indicating an application boundary.">
</p>
<p>
Now consider another role-reversal: When your application <em>initiates</em> an interaction, it starts by producing output and receives input as a result. This includes any database interaction. When you create, update, or delete a row in a database, you <em>send</em> data, and receive a response.
</p>
<p>
Should you not consider <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a> in that case?
</p>
<p>
<img src="/content/binary/conservative-liberal-at-boundary.png" alt="Funnels labelled 'conservative' and 'liberal' to the right of a line indicating an application boundary.">
</p>
<p>
Most people don't, particularly if they rely on <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a> (ORMs). After all, if you have a static type (class) that models a database row, what's the harm in using that when updating the database?
</p>
<p>
Probably none. After all, based on what I've just written, using a static type is a good way to be conservative with what you send. Here's an example using <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a>:
</p>
<p>
<pre><span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> RestaurantsContext(ConnectionString);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dbReservation</span> = <span style="color:blue;">new</span> Reservation
{
    PublicId = reservation.Id,
    RestaurantId = restaurantId,
    At = reservation.At,
    Name = reservation.Name.ToString(),
    Email = reservation.Email.ToString(),
    Quantity = reservation.Quantity
};
<span style="color:blue;">await</span> db.Reservations.AddAsync(dbReservation);
<span style="color:blue;">await</span> db.SaveChangesAsync();</pre>
</p>
<p>
Here we send a statically typed <code>Reservation</code> 'Entity' to the database, and since we use a static type, we're being conservative with what we send. That's only good.
</p>
<p>
What happens when we query a database? Here's a typical example:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<Restaurants.Reservation?> <span style="font-weight:bold;color:#74531f;">ReadReservation</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, Guid <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> RestaurantsContext(ConnectionString);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:blue;">await</span> db.Reservations.FirstOrDefaultAsync(<span style="font-weight:bold;color:#1f377f;">x</span> => x.PublicId == id);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (r <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Restaurants.Reservation(
        r.PublicId,
        r.At,
        <span style="color:blue;">new</span> Email(r.Email),
        <span style="color:blue;">new</span> Name(r.Name),
        r.Quantity);
}</pre>
</p>
<p>
Here I read a database row <code>r</code> and unquestioningly translate it to my domain model. Should I do that? What if the database schema has diverged from my application code?
</p>
<p>
I suspect that much grief and trouble with relational databases, and particularly with ORMs, stems from the illusion that an ORM 'Entity' is a statically-typed view of the database schema. Typically, you can either use an ORM like Entity Framework in a code-first or a database-first fashion, but regardless of what you choose, you have two competing 'truths' about the database: The database schema and the Entity Classes.
</p>
<p>
You need to be disciplined to keep those two views in synch, and I'm not asserting that it's impossible. I'm only suggesting that it may pay to explicitly acknowledge that static types may not represent any truth about what's actually on the other side of the application boundary.
</p>
<h3 id="1ab46f8e48a74b94ad9aa92cce2d915f">
Types are an illusion <a href="#1ab46f8e48a74b94ad9aa92cce2d915f">#</a>
</h3>
<p>
Given that I usually find myself firmly in the static-types-are-great camp, it may seem odd that I now spend an entire article trashing them. Perhaps it looks as though I've had a revelation and made an about-face, but that's not the case. Rather, I'm fond of making the implicit explicit. This often helps improve understanding, because it helps delineate conceptual boundaries.
</p>
<p>
This, too, is the case here. <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">All models are wrong, but some models are useful</a>. So are static types, I believe.
</p>
<p>
A static type system is a useful tool that enables you to model how your application should behave. The types don't really exist at run time. Even though .NET code (just to point out an example) compiles to <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">a binary representation that includes type information</a>, once it runs, it <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">JITs</a> to machine code. In the end, it's just registers and memory addresses, or, if you want to be even more nihilistic, electrons moving around on a circuit board.
</p>
<p>
Even at a higher level of abstraction, you may say: <em>But at least, a static type system can help you encapsulate rules and assumptions.</em> In a language like C#, for example, consider a <a href="https://www.hillelwayne.com/post/constructive/">predicative type</a> like <a href="/2022/08/22/can-types-replace-validation">this NaturalNumber</a> class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">struct</span> <span style="color:#2b91af;">NaturalNumber</span> : IEquatable<NaturalNumber>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> i;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">NaturalNumber</span>(<span style="color:blue;">int</span> candidate)
    {
        <span style="color:blue;">if</span> (candidate < 1)
            <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
                nameof(candidate),
                <span style="color:#a31515;">$"The value must be a positive (non-zero) number, but was: </span>{candidate}<span style="color:#a31515;">."</span>);
        <span style="color:blue;">this</span>.i = candidate;
    }
    <span style="color:green;">// Various other members follow...</span></pre>
</p>
<p>
Such a class effectively protects the invariant that a <a href="https://en.wikipedia.org/wiki/Natural_number">natural number</a> is always a positive integer. Yes, that works well until someone does this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">n</span> = (NaturalNumber)FormatterServices.GetUninitializedObject(<span style="color:blue;">typeof</span>(NaturalNumber));</pre>
</p>
<p>
This <code>n</code> value has the internal value <code>0</code>. Yes, <a href="https://learn.microsoft.com/dotnet/api/system.runtime.serialization.formatterservices.getuninitializedobject">FormatterServices.GetUninitializedObject</a> bypasses the constructor. This thing is evil, but it exists, and at least in the current discussion serves to illustrate the point that types are illusions.
</p>
<p>
This isn't just a flaw in C#. Other languages have similar backdoors. One of the most famously statically-typed languages, Haskell, comes with <a href="https://hackage.haskell.org/package/base/docs/System-IO-Unsafe.html#v:unsafePerformIO">unsafePerformIO</a>, which enables you to pretend that nothing untoward is going on even if you've written some impure code.
</p>
<p>
You may (and should) institute policies to not use such backdoors in your normal code bases. You don't need them.
</p>
<h3 id="de28c90a44e14b299f6eb30c09b08821">
Types are useful models <a href="#de28c90a44e14b299f6eb30c09b08821">#</a>
</h3>
<p>
All this may seem like an argument that types are useless. That would, however, be to draw the wrong conclusion. Types don't exist at run time to the same degree that Python objects or JavaScript functions don't exist at run time. Any language (except <a href="https://en.wikipedia.org/wiki/Assembly_language">assembler</a>) is an abstraction: A way to model computer instructions so that programming becomes easier (one would hope, <a href="/2023/09/11/a-first-stab-at-the-brainfuck-kata">but then...</a>). This is true even for C, as low-level and detail-oriented as it may seem.
</p>
<p>
If you grant that high-level programming languages (i.e. any language that is <em>not</em> machine code or assembler) are useful, you must also grant that you can't rule out the usefulness of types. Notice that this argument is one of logic, rather than of preference. The only claim I make here is that programming is based on useful illusions. That the abstractions are illusions doesn't prevent them from being useful.
</p>
<p>
In statically typed languages, we effectively need to pretend that the type system is good enough, strong enough, generally trustworthy enough that it's safe to ignore the underlying reality. We work with, if you will, a provisional truth that serves as a user interface to the computer.
</p>
<p>
Even though a computer program eventually executes on a processor where types don't exist, a good compiler can still check that our models look sensible. We say that it <em>type-checks</em>. I find that indispensable when modelling the internal behaviour of a program. Even in a large code base, a compiler can type-check whether all the various components look like they may compose correctly. That a program compiles is no guarantee that it works correctly, but if it doesn't type-check, it's strong evidence that the code's model is <em>internally</em> inconsistent.
</p>
<p>
In other words, that a statically-typed program type-checks is a necessary, but not a sufficient condition for it to work.
</p>
<p>
This holds as long as we're considering program internals. Some language platforms allow us to take this notion further, because we can link software components together and still type-check them. The .NET platform is a good example of this, since the IL code retains type information. This means that the C#, F#, or <a href="https://en.wikipedia.org/wiki/Visual_Basic_(.NET)">Visual Basic .NET</a> compiler can type-check your code against the APIs exposed by external libraries.
</p>
<p>
On the other hand, you can't extend that line of reasoning to the boundary of an application. What happens at the boundary is ultimately untyped.
</p>
<p>
Are types useless at the boundary, then? Not at all. <a href="https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-type-systems-are-not-inherently-more-open/">Alexis King has already dealt with this topic better than I could</a>, but the point is that types remain an effective way to capture the result of <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parsing input</a>. You can view receiving, handling, parsing, or validating input as implementing a protocol, as I've already discussed above. Such protocols are application-specific or domain-specific rather than general-purpose protocols, but they are still protocols.
</p>
<p>
When I decide to write <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">input validation for my restaurant sample code base as a set of composable parsers</a>, I'm implementing a protocol. My starting point isn't raw bits, but rather a loose static type: A DTO. In other cases, I may decide to use a different level of abstraction.
</p>
<p>
One of the (many) reasons I have for <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">finding ORMs unhelpful</a> is exactly because they insist on an illusion past its usefulness. Rather, I prefer implementing the protocol that talks to my database with a lower-level API, such as ADO.NET:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Reservation <span style="color:#74531f;">ReadReservationRow</span>(SqlDataReader <span style="font-weight:bold;color:#1f377f;">rdr</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Reservation(
        (Guid)rdr[<span style="color:#a31515;">"PublicId"</span>],
        (DateTime)rdr[<span style="color:#a31515;">"At"</span>],
        <span style="color:blue;">new</span> Email((<span style="color:blue;">string</span>)rdr[<span style="color:#a31515;">"Email"</span>]),
        <span style="color:blue;">new</span> Name((<span style="color:blue;">string</span>)rdr[<span style="color:#a31515;">"Name"</span>]),
        <span style="color:blue;">new</span> NaturalNumber((<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"Quantity"</span>]));
}</pre>
</p>
<p>
This actually isn't a particularly good protocol implementation, because it fails to take Postel's law into account. Really, this code should be a <a href="https://martinfowler.com/bliki/TolerantReader.html">Tolerant Reader</a>. In practice, not that much input contravariance is possible, but perhaps, at least, this code ought to gracefully handle a missing <code>Name</code> field.
</p>
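<p>
A more tolerant way to read the <code>Name</code> field might look like the following sketch. It makes several assumptions: the <code>ReadNameOrDefault</code> helper is hypothetical, it's written against the general <code>IDataRecord</code> interface rather than <code>SqlDataReader</code>, and the empty-string fallback may not suit every domain model. To keep the example self-contained, an in-memory <code>DataTable</code> stands in for a real database.
</p>
<p>
<pre>using System;
using System.Data;

// Hypothetical helper: tolerates both a missing Name column and a NULL value.
static string ReadNameOrDefault(IDataRecord rdr, string fallback)
{
    for (int i = 0; i < rdr.FieldCount; i++)
    {
        if (string.Equals(rdr.GetName(i), "Name", StringComparison.OrdinalIgnoreCase))
            return rdr.IsDBNull(i) ? fallback : rdr.GetString(i);
    }
    // The column isn't part of the result set at all.
    return fallback;
}

// Stand-in for a result set from a schema that has drifted:
// there's no Name column.
var table = new DataTable();
table.Columns.Add("PublicId", typeof(Guid));
table.Columns.Add("Quantity", typeof(int));
table.Rows.Add(Guid.NewGuid(), 2);

using var rdr = table.CreateDataReader();
rdr.Read();

Console.WriteLine(ReadNameOrDefault(rdr, "(no name)")); // (no name)</pre>
</p>
<p>
By comparison, indexing a data reader with a column name that isn't in the result set throws an exception instead of returning a value.
</p>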
<p>
The point of this particular example isn't that it's perfect, because it's not, but rather that it's possible to drop down to a lower level of abstraction, and sometimes, this may be a more honest representation of reality.
</p>
<h3 id="ce2a1d57f63e4f39a28e801fd23164cf">
Conclusion <a href="#ce2a1d57f63e4f39a28e801fd23164cf">#</a>
</h3>
<p>
It may be helpful to acknowledge that static types don't really exist. Even so, internally in a code base, a static type system can be a powerful tool. A good type system enables a compiler to check whether various parts of your code look internally consistent. Are you calling a procedure with the correct arguments? Have you implemented all methods defined by an interface? Have you handled all cases defined by a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>? Have you correctly initialized an object?
</p>
<p>
As useful as type systems are for this kind of work, you should also be aware of their limitations. A compiler can check whether a code base's internal model makes sense, but it can't verify what happens at run time.
</p>
<p>
As long as one part of your code base sends data to another part of your code base, your type system can still perform a helpful sanity check, but for data that enters (or leaves) your application at run time, bets are off. You may attempt to model what input <em>should</em> look like, and it may even be useful to do that, but it's important to acknowledge that reality may not look like your model.
</p>
<p>
You can write statically-typed, composable parsers. Some of them are quite elegant, but the good ones explicitly model that parsing of input is error-prone. When input is well-formed, the result may be a nicely <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulated</a>, statically-typed value, but when it's malformed, the result is one or more error values.
</p>
<p>
Perhaps the most important message is that databases, other web services, file systems, etc. involve input and output, too. Even if <em>you</em> write code that initiates a database query, or a web service request, should you implicitly trust the data that comes back?
</p>
<p>
This question of trust doesn't have to imply security concerns. Rather, systems evolve and errors happen. Every time you interact with an external system, there's a risk that it has become misaligned with yours. Static types can't protect you against that.
</p>
</div><hr>
What's a sandwich?
https://blog.ploeh.dk/2023/10/09/whats-a-sandwich
2023-10-09T20:20:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Ultimately, it's more about programming than food.</em>
</p>
<p>
The <a href="https://en.wikipedia.org/wiki/Sandwich">Sandwich</a> was named after <a href="https://en.wikipedia.org/wiki/John_Montagu,_4th_Earl_of_Sandwich">John Montagu, 4th Earl of Sandwich</a> because of his fondness for this kind of food. As popular story has it, he found it practical because it enabled him to eat without greasing the cards he often played.
</p>
<p>
A few years ago, a corner of the internet erupted in good-natured discussion about exactly what constitutes a sandwich. For instance, is the Danish <a href="https://en.wikipedia.org/wiki/Sm%C3%B8rrebr%C3%B8d">smørrebrød</a> a sandwich? It comes in two incarnations: <em>Højtbelagt</em>, the luxury version which is only consumable with knife and fork and the more modest, everyday <em>håndmad</em> (literally <em>hand food</em>), which, while open-faced, can usually be consumed without cutlery.
</p>
<p>
<img src="/content/binary/bjoernekaelderen-hoejtbelagt.jpg" alt="A picture of elaborate Danish smørrebrød.">
</p>
<p>
If we consider the 4th Earl of Sandwich's motivation as a yardstick, then the depicted <em>højtbelagte smørrebrød</em> is hardly a sandwich, while I believe a case can be made that a <em>håndmad</em> is:
</p>
<p>
<img src="/content/binary/haandmadder.jpg" alt="Two håndmadder and half of a sliced apple.">
</p>
<p>
Obviously, you need a different grip on a <em>håndmad</em> than on a sandwich. The bread (<em>rugbrød</em>) is much denser than wheat bread, and structurally more rigid. You eat it with your thumb and index finger on each side, and remaining fingers supporting it from below. The bottom line is this: A single piece of bread with something on top can also solve the original problem.
</p>
<p>
What if we go in the other direction? How about a combo consisting of bread, meat, bread, meat, and bread? I believe that I've seen burgers like that. Can you eat that with one hand? I think that this depends more on how greasy and overfilled it is, than on the structure.
</p>
<p>
What if you had five layers of meat and six layers of bread? This is unlikely to work with traditional Western leavened bread which, being a foam, will lose structural integrity when cut too thin. Imagining other kinds of bread, though, and thin slices of meat (or other 'content'), I don't see why it couldn't work.
</p>
<h3 id="00d495b0703a45a98f36607e99799c62">
FP sandwiches <a href="#00d495b0703a45a98f36607e99799c62">#</a>
</h3>
<p>
As regular readers may have picked up over the years, I do like food, but this is, after all, a programming blog.
</p>
<p>
A few years ago I presented a functional-programming design pattern named <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a>. It argues that it's often beneficial to structure a code base according to the <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">functional core, imperative shell</a> architecture.
</p>
<p>
The idea, in a nutshell, is that at every entry point (<code>Main</code> method, message handler, Controller action, etcetera) you first perform all impure actions necessary to collect input data for a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>, then you call that pure function (which may be composed from many smaller functions), and finally you perform one or more impure actions based on the function's return value. That's the <a href="/2020/03/02/impureim-sandwich">impure-pure-impure sandwich</a>.
</p>
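<p>
As a minimal sketch of those three phases, consider the following. It is not the code from the linked article: <code>InMemoryStore</code>, <code>MaitreD.CanAccept</code>, and <code>EntryPoint.Post</code> are hypothetical names, the store is an in-memory stand-in for a database, and everything is synchronous to keep the example short.
</p>
<p>
<pre>using System;
using System.Linq;

public sealed record Reservation(DateTime At, string Email, int Quantity);

// Hypothetical stand-in for a database. Its members read or change
// mutable state, so they count as impure.
public sealed class InMemoryStore
{
    private Reservation[] reservations = new Reservation[0];

    public Reservation[] ReadReservations(DateTime date) =>
        reservations.Where(r => r.At.Date == date.Date).ToArray();

    public void Create(Reservation reservation) =>
        reservations = reservations.Append(reservation).ToArray();
}

public static class MaitreD
{
    // Pure: the result depends only on the arguments.
    public static bool CanAccept(
        int capacity, Reservation[] existing, Reservation candidate) =>
        capacity >= existing.Sum(r => r.Quantity) + candidate.Quantity;
}

public static class EntryPoint
{
    public static bool Post(
        InMemoryStore store, int capacity, Reservation candidate)
    {
        // Impure: collect input for the pure function.
        var existing = store.ReadReservations(candidate.At);
        // Pure: make the decision.
        var accepted = MaitreD.CanAccept(capacity, existing, candidate);
        // Impure: act on the decision.
        if (accepted)
            store.Create(candidate);
        return accepted;
    }
}</pre>
</p>
<p>
Only <code>CanAccept</code> contains a decision, and since it's pure, it can be tested without any store at all.
</p>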
<p>
My experience with this pattern is that it's surprisingly often possible to apply it. Not always, but more often than you think.
</p>
<p>
Sometimes, however, it demands a looser interpretation of the word <em>sandwich</em>.
</p>
<p>
Even the examples from <a href="/2020/03/02/impureim-sandwich">the article</a> aren't standard sandwiches, once you dissect them. Consider, first, the <a href="https://www.haskell.org/">Haskell</a> example, here recoloured:
</p>
<p>
<pre><span style="color:#600277;">tryAcceptComposition</span> :: <span style="color:blue;">Reservation</span> <span style="color:blue;">-></span> IO (Maybe Int)
tryAcceptComposition reservation <span style="color:#666666;">=</span> runMaybeT <span style="color:#666666;">$</span>
<span style="background-color: lightsalmon;"> liftIO (<span style="color:#dd0000;">DB</span><span style="color:#666666;">.</span>readReservations connectionString</span><span style="background-color: palegreen;"> <span style="color:#666666;">$</span> date reservation</span><span style="background-color: lightsalmon;">)</span>
<span style="background-color: palegreen;"> <span style="color:#666666;">>>=</span> <span style="color:#dd0000;">MaybeT</span> <span style="color:#666666;">.</span> return <span style="color:#666666;">.</span> flip (tryAccept <span style="color:#09885a;">10</span>) reservation</span>
<span style="background-color: lightsalmon;"> <span style="color:#666666;">>>=</span> liftIO <span style="color:#666666;">.</span> <span style="color:#dd0000;">DB</span><span style="color:#666666;">.</span>createReservation connectionString</span></pre>
</p>
<p>
The <code>date</code> function is a pure accessor that retrieves the date and time of the <code>reservation</code>. In C#, it's typically a read-only property:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> Post(<span style="color:#2b91af;">Reservation</span> reservation)
{
<span style="background-color: lightsalmon;"> <span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(</span><span style="background-color: palegreen;">reservation.Date</span><span style="background-color: lightsalmon;">)</span>
<span style="background-color: palegreen;"> .Select(rs => maîtreD.TryAccept(rs, reservation))</span>
<span style="background-color: lightsalmon;"> .SelectMany(m => m.Traverse(Repository.Create))
.Match(InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok);</span>
}</pre>
</p>
<p>
Perhaps you don't think of a C# property as a function. After all, it's just an idiomatic grouping of language keywords:
</p>
<p>
<pre><span style="color:blue;">public</span> DateTimeOffset Date { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
Besides, a function takes input and returns output. What's the input in this case?
</p>
<p>
Keep in mind that a C# read-only property like this is only syntactic sugar for a getter method. In Java it would have been a method called <code>getDate()</code>. From <a href="/2018/01/22/function-isomorphisms">Function isomorphisms</a> we know that an instance method is isomorphic to a function that takes the object as input:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> DateTimeOffset GetDate(Reservation reservation)</pre>
</p>
<p>
In other words, the <code>Date</code> property is an operation that takes the object itself as input and returns <code>DateTimeOffset</code> as output. The operation has no side effects, and will always return the same output for the same input. In other words, it's a pure function, and that's the reason I've now coloured it green in the above code examples.
</p>
<p>
The layering indicated by the examples may, however, be deceiving. The green colour of <code>reservation.Date</code> is adjacent to the green colour of the <code>Select</code> expression below it. You might interpret this as though the pure middle part of the sandwich partially expands to the upper impure phase.
</p>
<p>
That's not the case. The <code>reservation.Date</code> expression executes <em>before</em> <code>Repository.ReadReservations</code>, and only then does the pure <code>Select</code> expression execute. Perhaps this, then, is a more honest depiction of the sandwich:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<IActionResult> Post(Reservation reservation)
{
<span style="background-color: palegreen;"> <span style="color:blue;">var</span> date = reservation.Date;</span>
<span style="background-color: lightsalmon;"> <span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(date)</span>
<span style="background-color: palegreen;"> .Select(rs => maîtreD.TryAccept(rs, reservation))</span>
<span style="background-color: lightsalmon;"> .SelectMany(m => m.Traverse(Repository.Create))
.Match(InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok);</span>
}</pre>
</p>
<p>
The corresponding 'sandwich diagram' looks like this:
</p>
<p>
<img src="/content/binary/pure-impure-pure-impure-box.png" alt="A box with green, red, green, and red horizontal tiers.">
</p>
<p>
If you want to interpret the word <em>sandwich</em> narrowly, this is no longer a sandwich, since there's 'content' on top. That's the reason I started this article discussing Danish <em>smørrebrød</em>, also sometimes called <em>open-faced sandwiches</em>. Granted, I've never seen a <em>håndmad</em> with two slices of bread with meat both between and on top. On the other hand, I don't think that having a smidgen of 'content' on top is a showstopper.
</p>
<h3 id="c3a4d1243ee540af95571141c0dd500e">
Initial and eventual purity <a href="#c3a4d1243ee540af95571141c0dd500e">#</a>
</h3>
<p>
Why is this important? Whether or not <code>reservation.Date</code> is a little light of purity in the otherwise impure first slice of the sandwich actually doesn't concern me that much. After all, my concern is mostly cognitive load, and there's hardly much gained by extracting the <code>reservation.Date</code> expression to a separate line, as I did above.
</p>
<p>
The reason this interests me is that in many cases, the first step you may take is to validate input, and <a href="/2023/06/26/validation-and-business-rules">validation is a composed set of pure functions</a>. While pure, and <a href="/2020/12/14/validation-a-solved-problem">a solved problem</a>, validation may be a sufficiently significant step that it warrants explicit acknowledgement. It's not just a property getter, but complex enough that bugs could hide there.
</p>
<p>
Even if you follow the <em>functional core, imperative shell</em> architecture, you'll often find that the first step is pure validation.
</p>
<p>
Likewise, once you've performed impure actions in the second impure phase, you can easily have a final thin pure translation slice. In fact, the above C# example contains an example of just that:
</p>
<p>
<pre><span style="color:blue;">public</span> IActionResult Ok(<span style="color:blue;">int</span> value)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> OkActionResult(value);
}
<span style="color:blue;">public</span> IActionResult InternalServerError(<span style="color:blue;">string</span> msg)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> InternalServerErrorActionResult(msg);
}</pre>
</p>
<p>
These are two tiny pure functions used as the final translation in the sandwich:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<IActionResult> Post(Reservation reservation)
{
<span style="background-color: palegreen;"> <span style="color:blue;">var</span> date = reservation.Date;</span>
<span style="background-color: lightsalmon;"> <span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(date)</span>
<span style="background-color: palegreen;"> .Select(rs => maîtreD.TryAccept(rs, reservation))</span>
<span style="background-color: lightsalmon;"> .SelectMany(m => m.Traverse(Repository.Create))
        .Match(</span><span style="background-color: palegreen;">InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok</span><span style="background-color: lightsalmon;">);</span>
}</pre>
</p>
<p>
On the other hand, I didn't want to paint the <code>Match</code> operation green, since it's essentially a continuation of a <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task-1">Task</a>, and if we consider <a href="/2020/07/27/task-asynchronous-programming-as-an-io-surrogate">task asynchronous programming as an IO surrogate</a>, we should, at least, regard it with scepticism. While it might be pure, it probably isn't.
</p>
<p>
Still, we may be left with an inverted 'sandwich' that looks like this:
</p>
<p>
<img src="/content/binary/pure-impure-pure-impure-pure-box.png" alt="A box with green, red, green, red, and green horizontal tiers.">
</p>
<p>
Can we still claim that this is a sandwich?
</p>
<h3 id="4d14e6795066473e95d1e5cdbcef6c2d">
At the metaphor's limits <a href="#4d14e6795066473e95d1e5cdbcef6c2d">#</a>
</h3>
<p>
This latest development seems to strain the sandwich metaphor. Can we maintain it, or does it fall apart?
</p>
<p>
What seems clear to me, at least, is that this ought to be the limit of how much we can stretch the allegory. If we add more tiers we get a <a href="https://en.wikipedia.org/wiki/Dagwood_sandwich">Dagwood sandwich</a> which is clearly a gimmick of little practicality.
</p>
<p>
But again, I'm appealing to a dubious metaphor, so instead, let's analyse what's going on.
</p>
<p>
In practice, it seems that you can rarely avoid the initial (pure) validation step. Why not? Couldn't you move validation to the functional core and do the impure steps without validation?
</p>
<p>
The short answer is <em>no</em>, because <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">validation done right is actually parsing</a>. At the entry point, you don't even know if the input makes sense.
</p>
<p>
A more realistic example is warranted, so I now turn to the example code base from my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>. One blog post shows <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">how to implement applicative validation for posting a reservation</a>.
</p>
<p>
A typical HTTP <code>POST</code> may include a JSON document like this:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"id"</span>: <span style="color:#a31515;">"bf4e84130dac451b9c94049da8ea8c17"</span>,
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2024-11-07T20:30"</span>,
<span style="color:#2e75b6;">"email"</span>: <span style="color:#a31515;">"snomob@example.com"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Snow Moe Beal"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 1
}</pre>
</p>
<p>
In order to handle even such a simple request, the system has to perform a set of impure actions. One of them is to query its data store for existing reservations. After all, the restaurant may not have any remaining tables for that day.
</p>
<p>
Which day, you ask? I'm glad you asked. The data access API comes with this method:
</p>
<p>
<pre>Task<IReadOnlyCollection<Reservation>> ReadReservations(
<span style="color:blue;">int</span> restaurantId, DateTime min, DateTime max);</pre>
</p>
<p>
You can supply <code>min</code> and <code>max</code> values to indicate the range of dates you need. How do you determine that range? You need the desired date of the reservation. In the above example it's 20:30 on November 7 2024. We're in luck, the data is there, and understandable.
</p>
<p>
Notice, however, that due to limitations of wire formats such as JSON, the date is a string. The value might be anything. If it's sufficiently malformed, you can't even perform the impure action of querying the database, because you don't know what to query it about.
</p>
<p>
To keep the sandwich metaphor untarnished, you might decide to push the parsing responsibility to an impure action, but why make something impure that has a well-known pure solution?
</p>
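<p>
To illustrate how small such a pure parsing step can be, here's a sketch that turns the <code>at</code> string into the <code>min</code> and <code>max</code> values for the query. The name <code>ReservationParser.TryParseDateRange</code> is hypothetical, and two things are assumptions rather than facts about the example code base: that the wire format is exactly <code>yyyy-MM-ddTHH:mm</code>, and that the range of interest is the whole calendar day.
</p>
<p>
<pre>using System;
using System.Globalization;

public static class ReservationParser
{
    // Pure: the result depends only on the input string.
    // Returns null when the string doesn't match the assumed format.
    public static (DateTime Min, DateTime Max)? TryParseDateRange(string at)
    {
        var ok = DateTime.TryParseExact(
            at,
            "yyyy-MM-dd'T'HH:mm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out var parsed);
        if (!ok)
            return null;

        // Assumption: query the entire day of the desired reservation.
        var min = parsed.Date;
        var max = min.AddDays(1).AddTicks(-1);
        return (min, max);
    }
}</pre>
</p>
<p>
When the function returns <code>null</code>, the entry point can reject the request without ever touching the database.
</p>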
<p>
A similar argument applies when performing a final, pure translation step in the other direction.
</p>
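<p>
Such a translation step can be as small as one function. The following is a sketch, assuming that the write phase reports success as a reservation ID and failure as <code>null</code>. <code>Translation.ToResponse</code> and the tuple it returns are hypothetical stand-ins for the <code>Ok</code> and <code>InternalServerError</code> helpers shown earlier:
</p>
<p>
<pre>using System.Globalization;

public static class Translation
{
    // Pure: maps the outcome of the write phase to a status code and a body.
    public static (int StatusCode, string Body) ToResponse(int? reservationId) =>
        reservationId.HasValue
            ? (200, reservationId.Value.ToString(CultureInfo.InvariantCulture))
            : (500, "Table unavailable");
}</pre>
</p>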
<p>
So it seems that we're stuck with implementations that don't quite fit the ideal of the sandwich metaphor. Is that enough to abandon the metaphor, or should we keep it?
</p>
<p>
The layers in layered application architecture aren't really layers, and neither are vertical slices really slices. <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">All models are wrong, but some are useful</a>. This is the case here, I believe. You should still keep the <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a> in mind when structuring code: Keep impure actions at the application boundary - in the 'Controllers', if you will; have only two phases of impurity - the initial and the ultimate; and maximise use of pure functions for everything else. Keep most of the pure execution between the two impure phases, but realistically, you're going to need a pure validation phase in front, and a slim translation layer at the end.
</p>
<h3 id="5b191dfc434149bab1d9d6bea029a4d4">
Conclusion <a href="#5b191dfc434149bab1d9d6bea029a4d4">#</a>
</h3>
<p>
Despite the prevalence of food imagery, this article about functional programming architecture has avoided any mention of <a href="https://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/">burritos</a>. Instead, it examines the tension between an ideal, the <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a>, and real-world implementation details. When you have to deal with concerns such as input validation or translation to egress data, it's practical to add one or two more thin slices of purity.
</p>
<p>
In <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a> you want to maximise the proportion of pure functions. Adding more pure code is hardly a problem.
</p>
<p>
The opposite is not the case. We shouldn't be cavalier about adding more impure slices to the sandwich. Thus, the adjusted definition of the Impureim sandwich seems to be that it may have at most two impure phases, but from one to three pure slices.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="aa4031dbe9a7467ba087c2731596f420">
<div class="comment-author">qfilip <a href="#aa4031dbe9a7467ba087c2731596f420">#</a></div>
<div class="comment-content">
<p>
Hello again...
</p>
<p>
In one of your excellent talks (<a href="https://youtu.be/F9bznonKc64?feature=shared&t=3392">here</a>), you ended up refactoring the maitreD kata using the <code>traverse</code> function. Since this step is crucial for the "sandwich" to work, any post detailing its implementation would be nice.
</p>
<p>
Thanks
</p>
</div>
<div class="comment-date">2023-11-16 10:56 UTC</div>
</div>
<div class="comment" id="7ea7f0f5f3a24a939be3a1cb5b23e2f5">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#7ea7f0f5f3a24a939be3a1cb5b23e2f5">#</a></div>
<div class="comment-content">
<p>
qfilip, thank you for writing. That particular talk fortunately comes with a set of companion articles:
</p>
<ul>
<li><a href="/2019/02/04/how-to-get-the-value-out-of-the-monad">How to get the value out of the monad</a></li>
<li><a href="/2019/02/11/asynchronous-injection">Asynchronous Injection</a></li>
</ul>
<p>
The latter of the two comes with a link to <a href="https://github.com/ploeh/asynchronous-injection">a GitHub repository with all the sample code</a>, including the <code>Traverse</code> implementation.
</p>
<p>
That said, a more formal description of traversals has long been on my to-do list, as you can infer from <a href="/2022/07/11/functor-relationships">this (currently inactive) table of contents</a>.
</p>
</div>
<div class="comment-date">2023-11-16 11:18 UTC</div>
</div>
</div><hr>
Dependency Whac-A-Mole
https://blog.ploeh.dk/2023/10/02/dependency-whac-a-mole
2023-10-02T07:52:00+00:00
Mark Seemann
<div id="post">
<p>
<em>AKA Framework Whac-A-Mole, Library Whac-A-Mole.</em>
</p>
<p>
I have now three times used the name <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">Whac-A-Mole</a> about a particular kind of relationship that may evolve with some dependencies. According to the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a>, I can now extract the explanation to a separate article. This is that article.
</p>
<h3 id="f9a98473c3ed40eda1f6288eec631795">
Architecture smell <a href="#f9a98473c3ed40eda1f6288eec631795">#</a>
</h3>
<p>
<em>Dependency Whac-A-Mole</em> describes the situation when you're spending too much time investigating, learning, troubleshooting, and overall satisfying the needs of a dependency (i.e. library or framework) instead of delivering value to users.
</p>
<p>
Examples include Dependency Injection containers, <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a>, validation frameworks, dynamic mock libraries, and perhaps the Gherkin language.
</p>
<p>
From the above list it does <em>not</em> follow that those examples are universally bad. I can think of situations where some of them make sense. I might even use them myself.
</p>
<p>
Rather, the Dependency Whac-A-Mole architecture smell occurs when a given dependency causes more trouble than the benefit it was supposed to provide.
</p>
<h3 id="9ae83d04788d4d4c9582ba02aa11b19b">
Causes <a href="#9ae83d04788d4d4c9582ba02aa11b19b">#</a>
</h3>
<p>
We rarely set out to do the wrong thing, but we often make mistakes in good faith. You may decide to take a dependency on a library or framework because
</p>
<ul>
<li>it worked well for you in a previous context</li>
<li>it looks as though it'll address a major problem you had in a previous context</li>
<li>you've heard good things about it</li>
<li>you saw a convincing demo</li>
<li>you heard about it in a podcast, conference talk, YouTube video, etc.</li>
<li>a FAANG company uses it</li>
<li>it's the latest tech</li>
<li>you want it on your CV</li>
</ul>
<p>
There could be other motivations as well, and granted, some of those I listed aren't really <em>good</em> reasons. Even so, I don't think anyone chooses a dependency with ill intent.
</p>
<p>
And what might work in one context may turn out to not work in another. You can't always predict such consequences, so I imply no judgement on those who choose the 'wrong' dependency. I've done it, too.
</p>
<p>
It is, however, important to be aware that this risk is always there. You picked a library with the best of intentions, but it turns out to slow you down. If so, acknowledge the mistake and kill your darlings.
</p>
<h3 id="02aa21e2bdc645f1b769c5a8412323f9">
Background <a href="#02aa21e2bdc645f1b769c5a8412323f9">#</a>
</h3>
<p>
Whenever you use a library or framework, you need to learn how to use it effectively. You have to learn its concepts, abstractions, APIs, pitfalls, etc. Not only that, but you need to stay abreast of changes and improvements.
</p>
<p>
Microsoft, for example, is usually good at maintaining backwards compatibility, but even so, things don't stand still. They evolve libraries and frameworks the same way I would do it: Don't introduce breaking changes, but do introduce new, better APIs going forward. This is essentially the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler pattern</a> that I also write about in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
While it's a good way to evolve a library or framework, the point remains: Even if you trust a supplier to prioritise backwards compatibility, it doesn't mean that you can stop learning. You have to stay up to date with all your dependencies. If you don't, sooner or later, the way that you use something like, say, <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a> is 'the old way', and it's not really supported any longer.
</p>
<p>
In order to be able to move forward, you'll have to rewrite those parts of your code that depend on that old way of doing things.
</p>
<p>
Each dependency comes with benefits and costs. As long as the benefits outweigh the costs, it makes sense to keep it around. If, on the other hand, you spend more time dealing with it than it would take you to do the work yourself, consider getting rid of it.
</p>
<h3 id="439ea4466014446a9ddfc2e264c86fba">
Symptoms <a href="#439ea4466014446a9ddfc2e264c86fba">#</a>
</h3>
<p>
Perhaps the infamous <em>left-pad</em> incident is too easy an example, but it does highlight the essence of this tension. Do you really need a third-party package to pad a string, or could you have done it yourself?
</p>
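<p>
In .NET the question is easy to answer, since <code>String.PadLeft</code> is part of the base class library. Even without it, a hand-written version is only a few lines. The <code>Padding.LeftPad</code> helper below is a hypothetical illustration, not code from any package:
</p>
<p>
<pre>public static class Padding
{
    // Prepends the padding character until the text is totalWidth characters long.
    // Text that is already long enough is returned unchanged.
    public static string LeftPad(string text, int totalWidth, char padding = ' ') =>
        text.Length >= totalWidth
            ? text
            : new string(padding, totalWidth - text.Length) + text;
}</pre>
</p>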
<p>
You can spend much time figuring out how to fit a general-purpose library or framework to your particular needs. How do you make your object-relational mapper (ORM) fit a special database schema? How do you annotate a class so that it produces validation messages according to the requirements in your jurisdiction? How do you configure an automatic mapping library so that it correctly projects data? How do you tell a Dependency Injection (DI) Container how to compose a <a href="https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern">Chain of Responsibility</a> where some objects also take strings or integers in their constructors?
</p>
<p>
Do such libraries or frameworks save time, or could you have written the corresponding code quicker? To be clear, I'm not talking about writing your own ORM, your own DI Container, your own auto-mapper. Rather, instead of using a DI Container, <a href="/2014/06/10/pure-di">Pure DI</a> is likely easier. As an alternative to an ORM, what's the cost of just writing <a href="https://en.wikipedia.org/wiki/SQL">SQL</a>? Instead of an <a href="https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule">ad-hoc, informally-specified, bug-ridden</a> validation framework, have you considered <a href="/2018/11/05/applicative-validation">applicative validation</a>?
</p>
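<p>
To make the Pure DI alternative concrete, here's a minimal sketch that composes a small Chain of Responsibility where the links also take integers and strings in their constructors. All types (<code>IHandler</code>, <code>ThresholdHandler</code>, <code>FallbackHandler</code>, <code>CompositionRoot</code>) are hypothetical and only serve as illustration:
</p>
<p>
<pre>public interface IHandler
{
    string Handle(int quantity);
}

// Answers for parties up to the threshold; otherwise delegates to the next link.
public sealed class ThresholdHandler : IHandler
{
    private readonly int threshold;
    private readonly string answer;
    private readonly IHandler next;

    public ThresholdHandler(int threshold, string answer, IHandler next)
    {
        this.threshold = threshold;
        this.answer = answer;
        this.next = next;
    }

    public string Handle(int quantity) =>
        threshold >= quantity ? answer : next.Handle(quantity);
}

// Terminates the chain with a fixed answer.
public sealed class FallbackHandler : IHandler
{
    private readonly string message;

    public FallbackHandler(string message)
    {
        this.message = message;
    }

    public string Handle(int quantity) => message;
}

public static class CompositionRoot
{
    // Pure DI: the object graph is composed with plain constructor calls.
    public static IHandler Compose() =>
        new ThresholdHandler(4, "small table",
            new ThresholdHandler(8, "large table",
                new FallbackHandler("call the restaurant")));
}</pre>
</p>
<p>
The primitive values are ordinary constructor arguments, so there's nothing to configure, and the compiler checks the entire graph.
</p>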
<p>
Things become really insidious if your chosen library never really solves all problems. Every time you figure out how to use it for one exotic corner case, your 'solution' causes a new problem to arise.
</p>
<p>
A symptom of <em>Dependency Whac-A-Mole</em> is when you have to advertise for people skilled in a particular technology.
</p>
<p>
Again, it's not necessarily a problem. If you're getting tremendous value out of, say, Entity Framework, it makes sense to list expertise as a job requirement. If, on the other hand, you have to list a litany of libraries and frameworks as necessary skills, it might pay to stop and reconsider. You can call it your 'tech stack' all you want, but is it really an inadvertent case of <a href="https://en.wikipedia.org/wiki/Vendor_lock-in">vendor lock-in</a>?
</p>
<h3 id="381db0b94f094be2be2b95841e248669">
Anecdotal evidence <a href="#381db0b94f094be2be2b95841e248669">#</a>
</h3>
<p>
I've used the term <em>Whac-A-Mole</em> a couple of times to describe the kind of situation where you feel that you're fighting a technology more than it's helping you. It seems to resonate with people other than me.
</p>
<p>
Here are the original articles where I used the term:
</p>
<ul>
<li><a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a></li>
<li><a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a></li>
<li><a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a></li>
</ul>
<p>
These are only the articles where I explicitly use the term. I do, however, think that the phenomenon is more common. I'm particularly sensitive to it when it comes to Dependency Injection, where I generally believe that DI Containers make the technique harder than it has to be. Composing object graphs is easily done with code.
</p>
<h3 id="2ef657b607cd49408ced7110e28e2321">
Conclusion <a href="#2ef657b607cd49408ced7110e28e2321">#</a>
</h3>
<p>
Sometimes a framework or library makes it more difficult to get things done. You spend much time kowtowing to its needs, researching how to do things 'the xyz way', learning its intricate extensibility points, keeping up to date with its evolving API, and engaging with its community to lobby for new features.
</p>
<p>
Still, you feel that it makes you compromise. You might have liked to organise your code in a different way, but unfortunately you can't, because it doesn't fit the way the dependency works. As you solve issues with it, new ones appear.
</p>
<p>
These are symptoms of <em>Dependency Whac-A-Mole</em>, an architecture smell that indicates that you're using the wrong tool for the job. If so, get rid of the dependency in favour of something better. Often, the better alternative is just plain vanilla code.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="9235995516070545f7cc3ee83d37023d">
<div class="comment-author"><a href="https://github.com/thomaslevesque">Thomas Levesque</a> <a href="#9235995516070545f7cc3ee83d37023d">#</a></div>
<div class="comment-content">
<p>
The most obvious example of this for me is definitely AutoMapper. I used to think it was great and saved so much time, but more often than not,
the mapping configuration ended up being more complex (and fragile) than just mapping the properties manually.
</p>
</div>
<div class="comment-date">2023-10-02 13:27 UTC</div>
</div>
<div class="comment" id="93b32bb03ee14d298b0d9b7cf65ddcae">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#93b32bb03ee14d298b0d9b7cf65ddcae">#</a></div>
<div class="comment-content">
<p>
I could imagine. AutoMapper is not, however, a library I've used enough to evaluate.
</p>
</div>
<div class="comment-date">2023-10-02 13:58 UTC</div>
</div>
<div class="comment" id="3e81ff9e535743148d8898e84ff69595">
<div class="comment-author"><a href="https://blog.oakular.xyz">Callum Warrilow</a> <a href="#3e81ff9e535743148d8898e84ff69595">#</a></div>
<div class="comment-content">
<p>
The moment I lost any faith in AutoMapper was after trying to debug a mapping that was silently failing on a single property.
Three of us were looking at it for a good amount of time before one of us noticed a single character typo on the destination property.
As the names did not match, no mapping occurred. It is unfortunately a black box, and obfuscated a problem that a manual mapping would have handled gracefully.
<hr />
Mark, it is interesting that you mention Gherkin as potentially one of these moles. It is something I've been evaluating in the hopes of making our tests more business focused,
but considering it again now, you can achieve a lot of what Gherkin offers with well defined namespaces, classes and methods in your test assemblies, something like:
<ul>
<li>Namespace: GivenSomePrecondition</li>
<li>TestClass: WhenCarryingOutAnAction</li>
<li>TestMethod: ThenTheExpectedPostConditionResults</li>
</ul>
To get away from playing Whac-a-Mole, it would seem to require changing the question being asked, from <i>what product do I need to solve this problem?</i>, to <i>what tools and patterns do I have around me to solve this problem?</i>.
</p>
</div>
<div class="comment-date">2023-10-11 15:54 UTC</div>
</div>
<div class="comment" id="eef76159a60b4ee482238b1cd990ab94">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#eef76159a60b4ee482238b1cd990ab94">#</a></div>
<div class="comment-content">
<p>
Callum, I was expecting someone to comment on including Gherkin on the list.
</p>
<p>
I don't consider all my examples as universally problematic. Rather, they often pop up in contexts where people seem to be struggling with a concept or a piece of technology with no apparent benefit.
</p>
<p>
I'm sure that when <a href="https://dannorth.net/">Dan North</a> came up with the idea of BDD and Gherkin, he actually <em>used</em> it. When used in the way it was originally intended, I can see it providing value.
</p>
<p>
Apart from Dan himself, however, I'm not aware that I've ever met anyone who has used BDD and Gherkin in that way. On the contrary, I've had more than one discussion that went like this:
</p>
<p>
<em>Interlocutor:</em> "We use BDD and Gherkin. It's great! You should try it."
</p>
<p>
<em>Me:</em> "Why?"
</p>
<p>
<em>Interlocutor:</em> "It enables us to <em>organise</em> our tests."
</p>
<p>
<em>Me:</em> "Can't you do that with the <a href="https://wiki.c2.com/?ArrangeActAssert">AAA</a> pattern?"
</p>
<p>
<em>Interlocutor:</em> "..."
</p>
<p>
<em>Me:</em> "Do any non-programmers ever look at your tests?"
</p>
<p>
<em>Interlocutor:</em> "No..."
</p>
<p>
If only programmers look at the test code, then why impose an artificial constraint? <em>Given-when-then</em> is just <em>arrange-act-assert</em> with different names, but free of Gherkin and the tooling that typically comes with it, you're free to write test code that follows normal good coding practices.
</p>
<p>
(As an aside, yes: Sometimes <a href="https://www.dotnetrocks.com/?show=1542">constraints liberate</a>, but what I've seen of Gherkin-based test code, this doesn't seem to be one of those cases.)
</p>
<p>
Finally, to be quite clear, although I may be repeating myself: If you're using Gherkin to interact with non-programmers on a regular basis, it may be beneficial. I've just never been in that situation, or met anyone other than Dan North who has.
</p>
</div>
<div class="comment-date">2023-10-15 14:35 UTC</div>
</div>
</div>
<hr>
The case of the mysterious comparison
https://blog.ploeh.dk/2023/09/25/the-case-of-the-mysterious-comparison
2023-09-25T05:58:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A ploeh mystery.</em>
</p>
<p>
I was <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">recently playing around</a> with the example code from my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, refactoring the <code>Table</code> class to use <a href="/2022/08/22/can-types-replace-validation">a predicative NaturalNumber wrapper</a> to represent a table's seating capacity.
</p>
<p>
Originally, the <code>Table</code> constructor and corresponding read-only data looked like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">bool</span> isStandard;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Reservation[] reservations;
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">Table</span>(<span style="color:blue;">bool</span> <span style="font-weight:bold;color:#1f377f;">isStandard</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">params</span> Reservation[] <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:blue;">this</span>.isStandard = isStandard;
Capacity = capacity;
<span style="color:blue;">this</span>.reservations = reservations;
}</pre>
</p>
<p>
Since I wanted to show an example of how wrapper types can help make preconditions explicit, I changed it to this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">bool</span> isStandard;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Reservation[] reservations;
<span style="color:blue;">public</span> NaturalNumber Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">Table</span>(<span style="color:blue;">bool</span> <span style="font-weight:bold;color:#1f377f;">isStandard</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">params</span> Reservation[] <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:blue;">this</span>.isStandard = isStandard;
Capacity = capacity;
<span style="color:blue;">this</span>.reservations = reservations;
}</pre>
</p>
<p>
The only thing I changed was the type of <code>Capacity</code> and <code>capacity</code>.
</p>
<p>
As I did that, two tests failed.
</p>
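<p>
For readers who haven't seen the linked article, a predicative wrapper of this kind might look roughly like the following. This is only a minimal sketch, not the code from the book: it assumes that natural numbers start at 1, and the member names (<code>value</code>, the explicit conversion) are illustrative choices.
</p>
<p>
<pre>using System;

// Hypothetical sketch of a predicative wrapper; the real type may differ.
public sealed class NaturalNumber
{
    private readonly int value;

    public NaturalNumber(int candidate)
    {
        // Assumption: natural numbers start at 1 in this sketch.
        if (candidate < 1)
            throw new ArgumentOutOfRangeException(
                nameof(candidate),
                $"Expected a positive number, but got {candidate}.");
        value = candidate;
    }

    // Explicit conversion, so that unwrapping is visible in calling code.
    public static explicit operator int(NaturalNumber n) => n.value;

    public override bool Equals(object? obj) =>
        obj is NaturalNumber other ? value == other.value : false;

    public override int GetHashCode() => value.GetHashCode();

    public override string ToString() => value.ToString();
}</pre>
</p>
<p>
The point of such a type is that the precondition is checked once, in the constructor, so any method that receives a <code>NaturalNumber</code> can rely on it being positive.
</p>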
<h3 id="5942663d531c41c491e2b79116008c5e">
Evidence <a href="#5942663d531c41c491e2b79116008c5e">#</a>
</h3>
<p>
Both tests failed in the same way, so I only show one of the failures:
</p>
<p>
<pre>Ploeh.Samples.Restaurants.RestApi.Tests.MaitreDScheduleTests.Schedule
Source: MaitreDScheduleTests.cs line 16
Duration: 340 ms
Message:
FsCheck.Xunit.PropertyFailedException :
Falsifiable, after 2 tests (0 shrinks) (StdGen (48558275,297233133)):
Original:
<null>
(Ploeh.Samples.Restaurants.RestApi.MaitreD,
[|Ploeh.Samples.Restaurants.RestApi.Reservation|])
---- System.InvalidOperationException : Failed to compare two elements in the array.
-------- System.ArgumentException : At least one object must implement IComparable.
Stack Trace:
----- Inner Stack Trace -----
GenericArraySortHelper`1.Sort(T[] keys, Int32 index, Int32 length, IComparer`1 comparer)
Array.Sort[T](T[] array, Int32 index, Int32 length, IComparer`1 comparer)
EnumerableSorter`2.QuickSort(Int32[] keys, Int32 lo, Int32 hi)
EnumerableSorter`1.Sort(TElement[] elements, Int32 count)
OrderedEnumerable`1.ToList()
Enumerable.ToList[TSource](IEnumerable`1 source)
<span style="color: red;">MaitreD.Allocate(IEnumerable`1 reservations)</span> line 91
<span style="color: red;"><>c__DisplayClass21_0.<Schedule>b__4(<>f__AnonymousType7`2 <>h__TransparentIdentifier1)</span> line 114
<>c__DisplayClass2_0`3.<CombineSelectors>b__0(TSource x)
SelectIPartitionIterator`2.GetCount(Boolean onlyIfCheap)
Enumerable.Count[TSource](IEnumerable`1 source)
<span style="color: red;">MaitreDScheduleTests.ScheduleImp(MaitreD sut, Reservation[] reservations)</span> line 31
<span style="color: red;"><>c.<Schedule>b__0_2(ValueTuple`2 t)</span> line 22
ForAll@15.Invoke(Value arg00)
Testable.evaluate[a,b](FSharpFunc`2 body, a a)
----- Inner Stack Trace -----
Comparer.Compare(Object a, Object b)
ObjectComparer`1.Compare(T x, T y)
EnumerableSorter`2.CompareAnyKeys(Int32 index1, Int32 index2)
ComparisonComparer`1.Compare(T x, T y)
ArraySortHelper`1.SwapIfGreater(T[] keys, Comparison`1 comparer, Int32 a, Int32 b)
ArraySortHelper`1.IntroSort(T[] keys, Int32 lo, Int32 hi, Int32 depthLimit, Comparison`1 comparer)
GenericArraySortHelper`1.Sort(T[] keys, Int32 index, Int32 length, IComparer`1 comparer)</pre>
</p>
<p>
The code highlighted with red is user code (i.e. my code). The rest comes from .NET or <a href="https://fscheck.github.io/FsCheck/">FsCheck</a>.
</p>
<p>
While a stack trace like that can look intimidating, I usually navigate to the top stack frame of my own code. As I reproduce my investigation, see if you can spot the problem before I did.
</p>
<h3 id="fded951b0b3a4cac848941153e84eaa6">
Understand before resolving <a href="#fded951b0b3a4cac848941153e84eaa6">#</a>
</h3>
<p>
Before starting the investigation proper, we might as well acknowledge what seems evident. I had a fully passing test suite, then I edited two lines of code, which caused the above error. The two nested exception messages contain obvious clues: <em>Failed to compare two elements in the array,</em> and <em>At least one object must implement IComparable</em>.
</p>
<p>
The only edit I made was to change an <code>int</code> to a <code>NaturalNumber</code>, and <code>NaturalNumber</code> didn't implement <code>IComparable</code>. It seems straightforward to just make <code>NaturalNumber</code> implement that interface and move on, and as it turns out, that <em>is</em> the solution.
</p>
<p>
As I describe in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, when troubleshooting, first seek to understand the problem. I've seen too many people go immediately into 'action mode' when faced with a problem. It's often a suboptimal strategy.
</p>
<p>
First, if the immediate solution turns out not to work, you can waste much time thrashing, trying various 'fixes' without understanding the problem.
</p>
<p>
Second, even if the resolution is easy, as is the case here, if you don't understand the underlying cause and effect, you can easily build a <a href="https://en.wikipedia.org/wiki/Cargo_cult">cargo cult</a>-like 'understanding' of programming. This could become one such experience: <em>All wrapper types must implement <code>IComparable</code></em>, or some nonsense like that.
</p>
<p>
Unless people are getting hurt or you are bleeding money because of the error, seek first to understand, and only then fix the problem.
</p>
<h3 id="f881a7f048144dd2a2521e336675d052">
First clue <a href="#f881a7f048144dd2a2521e336675d052">#</a>
</h3>
<p>
The top user stack frame is the <code>Allocate</code> method:
</p>
<p>
<pre><span style="color:blue;">private</span> IEnumerable<Table> <span style="font-weight:bold;color:#74531f;">Allocate</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
List<Table> <span style="font-weight:bold;color:#1f377f;">allocation</span> = Tables.ToList();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">r</span> <span style="font-weight:bold;color:#8f08c4;">in</span> reservations)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = allocation.Find(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Fits(r.Quantity));
<span style="font-weight:bold;color:#8f08c4;">if</span> (table <span style="color:blue;">is</span> { })
{
allocation.Remove(table);
allocation.Add(table.Reserve(r));
}
}
<span style="font-weight:bold;color:#8f08c4;">return</span> allocation;
}</pre>
</p>
<p>
The stack trace points to line 91, which is the first line of code in the method, where it calls <code>Tables.ToList()</code>. This is also consistent with the stack trace, which indicates that the exception is thrown from <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.tolist">ToList</a>.
</p>
<p>
I am, however, not used to <code>ToList</code> throwing exceptions, so I admit that I was nonplussed. Why would <code>ToList</code> try to sort the input? It usually doesn't do that.
</p>
<p>
Now, I <em>did</em> notice the <code>OrderedEnumerable`1</code> on the stack frame above <code>Enumerable.ToList</code>, but this early in the investigation, I failed to connect the dots.
</p>
<p>
What does the caller look like? It's that scary <code>DisplayClass21</code>...
</p>
<h3 id="5250f81716324ff1918bea2e57d08ef4">
Immediate caller <a href="#5250f81716324ff1918bea2e57d08ef4">#</a>
</h3>
<p>
The code that calls <code>Allocate</code> is the <code>Schedule</code> method, the System Under Test:
</p>
<p>
<pre><span style="color:blue;">public</span> IEnumerable<TimeSlot> <span style="font-weight:bold;color:#74531f;">Schedule</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span>
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> reservations
<span style="color:blue;">group</span> r <span style="color:blue;">by</span> r.At <span style="color:blue;">into</span> g
<span style="color:blue;">orderby</span> g.Key
<span style="color:blue;">let</span> seating = <span style="color:blue;">new</span> Seating(SeatingDuration, g.Key)
<span style="color:blue;">let</span> overlapping = reservations.Where(seating.Overlaps)
<span style="color:blue;">select</span> <span style="color:blue;">new</span> TimeSlot(g.Key, Allocate(overlapping).ToList());
}</pre>
</p>
<p>
While it does <code>orderby</code>, it doesn't seem to be sorting the input to <code>Allocate</code>. While <code>overlapping</code> is a filtered subset of <code>reservations</code>, the code doesn't sort <code>reservations</code>.
</p>
<p>
Okay, moving on, what does the caller of that method look like?
</p>
<h3 id="0db463ec75d64f93a1b188af9fe731f3">
Test implementation <a href="#0db463ec75d64f93a1b188af9fe731f3">#</a>
</h3>
<p>
The caller of the <code>Schedule</code> method is this test implementation:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ScheduleImp</span>(
MaitreD <span style="font-weight:bold;color:#1f377f;">sut</span>,
Reservation[] <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.Schedule(reservations);
Assert.Equal(
reservations.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => r.At).Distinct().Count(),
actual.Count());
Assert.Equal(
actual.Select(<span style="font-weight:bold;color:#1f377f;">ts</span> => ts.At).OrderBy(<span style="font-weight:bold;color:#1f377f;">d</span> => d),
actual.Select(<span style="font-weight:bold;color:#1f377f;">ts</span> => ts.At));
Assert.All(actual, <span style="font-weight:bold;color:#1f377f;">ts</span> => AssertTables(sut.Tables, ts.Tables));
Assert.All(
actual,
<span style="font-weight:bold;color:#1f377f;">ts</span> => AssertRelevance(reservations, sut.SeatingDuration, ts));
}</pre>
</p>
<p>
Notice how the first line of code calls <code>Schedule</code>, while the rest is 'just' assertions.
</p>
<p>
Because I had noticed that <code>OrderedEnumerable`1</code> on the stack, I was on the lookout for an expression that would sort an <code>IEnumerable<T></code>. The <code>ScheduleImp</code> method surprised me, though, because the <code>reservations</code> parameter is an array. If there was any problem sorting it, it should have blown up much earlier.
</p>
<p>
I really should be paying more attention, but despite my best resolution to proceed methodically, I was chasing the wrong clue.
</p>
<p>
Which line of code throws the exception? The stack trace says line 31. That's not the <code>sut.Schedule(reservations)</code> call. It's the first assertion following it. I failed to notice that.
</p>
<h3 id="5116edf953f1438b9cb4b37c5b043bda">
Property <a href="#5116edf953f1438b9cb4b37c5b043bda">#</a>
</h3>
<p>
I was stumped, and not knowing what to do, I looked at the fourth and final piece of user code in that stack trace:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">Schedule</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
(<span style="color:blue;">from</span> rs <span style="color:blue;">in</span> Gens.Reservations
<span style="color:blue;">from</span> m <span style="color:blue;">in</span> Gens.MaitreD(rs)
<span style="color:blue;">select</span> (m, rs)).ToArbitrary(),
<span style="font-weight:bold;color:#1f377f;">t</span> => ScheduleImp(t.m, t.rs));
}</pre>
</p>
<p>
No sorting there. What's going on?
</p>
<p>
In retrospect, I'm struggling to understand what was going on in my mind. Perhaps you're about to lose patience with me. I was chasing the wrong 'clue', just as I said above that 'other' people do, but surely, it's understood, that I don't.
</p>
<h3 id="70ad2e90704d4d31bc0d045fff16a011">
WYSIATI <a href="#70ad2e90704d4d31bc0d045fff16a011">#</a>
</h3>
<p>
In <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> I spend some time discussing how code relates to human cognition. I'm no neuroscientist, but I try to read books on other topics than programming. I was partially inspired by <a href="/ref/thinking-fast-and-slow">Thinking, Fast and Slow</a> in which <a href="https://en.wikipedia.org/wiki/Daniel_Kahneman">Daniel Kahneman</a> (among many other topics) presents how <em>System 1</em> (the inaccurate <em>fast</em> thinking process) mostly works with what's right in front of it: <em>What You See Is All There Is</em>, or WYSIATI.
</p>
<p>
That <code>OrderedEnumerable`1</code> in the stack trace had made me look for an <code>IEnumerable<T></code> as the culprit, and in the source code of the <code>Allocate</code> method, one parameter is clearly what I was looking for. I'll repeat that code here for your benefit:
</p>
<p>
<pre><span style="color:blue;">private</span> IEnumerable<Table> <span style="font-weight:bold;color:#74531f;">Allocate</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
List<Table> <span style="font-weight:bold;color:#1f377f;">allocation</span> = Tables.ToList();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">r</span> <span style="font-weight:bold;color:#8f08c4;">in</span> reservations)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = allocation.Find(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Fits(r.Quantity));
<span style="font-weight:bold;color:#8f08c4;">if</span> (table <span style="color:blue;">is</span> { })
{
allocation.Remove(table);
allocation.Add(table.Reserve(r));
}
}
<span style="font-weight:bold;color:#8f08c4;">return</span> allocation;
}</pre>
</p>
<p>
Where's the <code>IEnumerable<T></code> in that code?
</p>
<p>
<code>reservations</code>, right?
</p>
<h3 id="c99f8e1238284cc88868d5fe39f43f2a">
Revelation <a href="#c99f8e1238284cc88868d5fe39f43f2a">#</a>
</h3>
<p>
As WYSIATI 'predicts', the brain gloms on to what's prominent. I was looking for <code>IEnumerable<T></code>, and it's right there in the method declaration as the parameter <code>IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span></code>.
</p>
<p>
As covered in multiple places (<a href="/code-that-fits-in-your-head">my book</a>, <a href="/ref/programmers-brain">The Programmer's Brain</a>), the human brain has limited short-term memory. Apparently, while chasing the <code>IEnumerable<T></code> clue, I'd already managed to forget another important datum.
</p>
<p>
Which line of code throws the exception? This one:
</p>
<p>
<pre>List<Table> <span style="font-weight:bold;color:#1f377f;">allocation</span> = Tables.ToList();</pre>
</p>
<p>
The <code>IEnumerable<T></code> isn't <code>reservations</code>, but <code>Tables</code>.
</p>
<p>
While the code doesn't explicitly say <code>IEnumerable<Table> Tables</code>, that's just what it is.
</p>
<p>
Yes, it took me way too long to notice that I'd been barking up the wrong tree all along. Perhaps you immediately noticed that, but have pity on me. I don't think this kind of human error is uncommon.
</p>
<h3 id="288f57cb1a4648a1926164e64aebfbe2">
The culprit <a href="#288f57cb1a4648a1926164e64aebfbe2">#</a>
</h3>
<p>
Where does <code>Tables</code> come from? It's a read-only property originally injected via the constructor:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">MaitreD</span>(
TimeOfDay <span style="font-weight:bold;color:#1f377f;">opensAt</span>,
TimeOfDay <span style="font-weight:bold;color:#1f377f;">lastSeating</span>,
TimeSpan <span style="font-weight:bold;color:#1f377f;">seatingDuration</span>,
IEnumerable<Table> <span style="font-weight:bold;color:#1f377f;">tables</span>)
{
OpensAt = opensAt;
LastSeating = lastSeating;
SeatingDuration = seatingDuration;
Tables = tables;
}</pre>
</p>
<p>
Okay, in the test then, where does it come from? That's the <code>m</code> in the above property, repeated here for your convenience:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">Schedule</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
(<span style="color:blue;">from</span> rs <span style="color:blue;">in</span> Gens.Reservations
<span style="color:blue;">from</span> m <span style="color:blue;">in</span> Gens.MaitreD(rs)
<span style="color:blue;">select</span> (m, rs)).ToArbitrary(),
<span style="font-weight:bold;color:#1f377f;">t</span> => ScheduleImp(t.m, t.rs));
}</pre>
</p>
<p>
The <code>m</code> variable is generated by <code>Gens.MaitreD</code>, so let's follow that clue:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<MaitreD> <span style="color:#74531f;">MaitreD</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span>
<span style="color:blue;">from</span> seatingDuration <span style="color:blue;">in</span> Gen.Choose(1, 6)
<span style="color:blue;">from</span> tables <span style="color:blue;">in</span> Tables(reservations)
<span style="color:blue;">select</span> <span style="color:blue;">new</span> MaitreD(
TimeSpan.FromHours(18),
TimeSpan.FromHours(21),
TimeSpan.FromHours(seatingDuration),
tables);
}</pre>
</p>
<p>
We're not there yet, but close. The <code>tables</code> variable is generated by this <code>Tables</code> helper function:
</p>
<p>
<pre><span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> Generate a table configuration that can at minimum accommodate all</span>
<span style="color:gray;">///</span><span style="color:green;"> reservations.</span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"></</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">param</span> <span style="color:gray;">name</span><span style="color:gray;">=</span><span style="color:gray;">"</span>reservations<span style="color:gray;">"</span><span style="color:gray;">></span><span style="color:green;">The reservations to accommodate</span><span style="color:gray;"></</span><span style="color:gray;">param</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">returns</span><span style="color:gray;">></span><span style="color:green;">A generator of valid table configurations.</span><span style="color:gray;"></</span><span style="color:gray;">returns</span><span style="color:gray;">></span>
<span style="color:blue;">private</span> <span style="color:blue;">static</span> Gen<IEnumerable<Table>> <span style="color:#74531f;">Tables</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:green;">// Create a table for each reservation, to ensure that all</span>
<span style="color:green;">// reservations can be allotted a table.</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">tables</span> = reservations.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => Table.Standard(r.Quantity));
<span style="font-weight:bold;color:#8f08c4;">return</span>
<span style="color:blue;">from</span> moreTables <span style="color:blue;">in</span>
Gen.Choose(1, 12).Select(
<span style="font-weight:bold;color:#1f377f;">i</span> => Table.Standard(<span style="color:blue;">new</span> NaturalNumber(i))).ArrayOf()
<span style="color:blue;">let</span> allTables =
tables.Concat(moreTables).OrderBy(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity)
<span style="color:blue;">select</span> allTables.AsEnumerable();
}</pre>
</p>
<p>
And there you have it: <code>OrderBy(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity)</code>!
</p>
<p>
The <code>Capacity</code> property was exactly the property I changed from <code>int</code> to <code>NaturalNumber</code> - the change that made the test fail.
</p>
<p>
As expected, the fix was to let <code>NaturalNumber</code> implement <code>IComparable<NaturalNumber></code>.
</p>
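<p>
The article doesn't show the fix itself, but it could look something like the following. This is a hedged sketch rather than the actual change: it assumes a wrapper that stores its number in a private <code>value</code> field, and it simply delegates the comparison to the wrapped integer.
</p>
<p>
<pre>using System;

// Hypothetical sketch; the actual NaturalNumber in the code base may differ.
public sealed class NaturalNumber : IComparable<NaturalNumber>
{
    private readonly int value;

    public NaturalNumber(int candidate)
    {
        // Assumption: natural numbers start at 1 in this sketch.
        if (candidate < 1)
            throw new ArgumentOutOfRangeException(nameof(candidate));
        value = candidate;
    }

    // Delegate ordering to the wrapped integer. By .NET convention,
    // any instance compares greater than null.
    public int CompareTo(NaturalNumber? other)
    {
        return other is null ? 1 : value.CompareTo(other.value);
    }

    public override string ToString() => value.ToString();
}</pre>
</p>
<p>
With an implementation along these lines, the default comparer for the sort key can order two <code>NaturalNumber</code> values, so an expression like <code>OrderBy(t => t.Capacity)</code> no longer fails when it's enumerated.
</p>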
<h3 id="534a507621b840dfb566cdb359261840">
Conclusion <a href="#534a507621b840dfb566cdb359261840">#</a>
</h3>
<p>
I thought this little troubleshooting session was interesting enough to write down. I spent perhaps twenty minutes on it before I understood what was going on. Not disastrously long, but enough time that I was relieved when I figured it out.
</p>
<p>
Apart from the obvious (look for the problem where it is), there is one other useful lesson to be learned, I think.
</p>
<p>
<a href="https://learn.microsoft.com/dotnet/standard/linq/deferred-execution-lazy-evaluation">Deferred execution</a> can confuse even the most experienced programmer. It took me some time before it dawned on me that even though the <code>MaitreD</code> constructor had run and the object was 'safely' initialised, it actually wasn't.
</p>
<p>
The implication is that there's a 'disconnect' between the constructor and the <code>Allocate</code> method. The error actually happens during initialisation (i.e. in the caller of the constructor), but it only manifests when you run the method.
</p>
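<p>
The disconnect can be reproduced in isolation. The following sketch uses a hypothetical <code>Wrapper</code> class that, like the original <code>NaturalNumber</code>, doesn't implement <code>IComparable</code>. The names <code>Wrapper</code> and <code>DeferredExecutionDemo</code> are made up for illustration.
</p>
<p>
<pre>using System;
using System.Linq;

// Hypothetical stand-in for a wrapper type that lacks IComparable.
public sealed class Wrapper
{
    public Wrapper(int value) { Value = value; }

    public int Value { get; }
}

public static class DeferredExecutionDemo
{
    public static string Run()
    {
        var items = new[] { new Wrapper(2), new Wrapper(1) };

        // No exception here: OrderBy only describes the sort.
        var sorted = items.OrderBy(w => w);

        try
        {
            // The sort runs only now, when the sequence is enumerated.
            _ = sorted.ToList();
            return "No exception.";
        }
        catch (InvalidOperationException e)
        {
            return "ToList threw: " + e.Message;
        }
    }
}</pre>
</p>
<p>
The call to <code>OrderBy</code> returns without complaint. The comparison, and thus the exception, only happens once something enumerates the result, which may be far from the line of code that introduced the problem.
</p>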
<p>
Ever since <a href="/2013/07/20/linq-versus-the-lsp">I discovered the IReadOnlyCollection<T> interface in 2013</a> I've resolved to favour it over <code>IEnumerable<T></code>. This is one example of why that's a good idea.
</p>
<p>
Despite my best intentions, I, too, cut corners from time to time. I've done it here, by accepting <code>IEnumerable<Table></code> instead of <code>IReadOnlyCollection<Table></code> as a constructor parameter. I really should have known better, and now I've paid the price.
</p>
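<p>
To illustrate how the stricter parameter type changes where the error shows up, here's a simplified sketch. The <code>SeatCount</code>, <code>Table</code>, and <code>MaitreD</code> classes below are cut-down, hypothetical stand-ins for the real ones, and the constructor takes only the tables.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical stand-ins for the types in the article.
public sealed class SeatCount
{
    public SeatCount(int seats) { Seats = seats; }

    public int Seats { get; }
}

public sealed class Table
{
    public Table(SeatCount capacity) { Capacity = capacity; }

    public SeatCount Capacity { get; }
}

public sealed class MaitreD
{
    // The other constructor arguments are omitted from this sketch.
    public MaitreD(IReadOnlyCollection<Table> tables) { Tables = tables; }

    public IReadOnlyCollection<Table> Tables { get; }
}

public static class CompositionDemo
{
    public static string Run()
    {
        var tables = new[] { new Table(new SeatCount(4)), new Table(new SeatCount(2)) };
        try
        {
            // An IOrderedEnumerable<Table> isn't an IReadOnlyCollection<Table>,
            // so the caller has to materialise it, here with ToList, before
            // it can call the constructor.
            var sut = new MaitreD(tables.OrderBy(t => t.Capacity).ToList());
            return $"Created a MaitreD with {sut.Tables.Count} tables.";
        }
        catch (InvalidOperationException e)
        {
            return "Failed before construction: " + e.Message;
        }
    }
}</pre>
</p>
<p>
Since <code>SeatCount</code> isn't comparable, the sort fails in the caller, at the <code>ToList</code> call, and no <code>MaitreD</code> object is ever created in a half-initialised state. An <code>IReadOnlyCollection<T></code> doesn't strictly guarantee eager evaluation, but in practice the common implementations (arrays, lists) are already materialised.
</p>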
<p>
This is particularly ironic because I also love <a href="https://www.haskell.org/">Haskell</a> so much. Haskell is lazy by default, so you'd think that I run into such issues all the time. An expression like <code>OrderBy(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity)</code>, however, wouldn't have compiled in Haskell unless the sort key implemented the <a href="https://hackage.haskell.org/package/base/docs/Data-Ord.html#t:Ord">Ord</a> type class. Even C#'s type system can express that a generic type must implement an interface, but <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.orderby">OrderBy</a> doesn't do that.
</p>
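<p>
As a sketch of what such a constraint could look like, consider the following extension method. <code>OrderByComparable</code> is a hypothetical helper, not part of LINQ, and it merely forwards to the built-in <code>OrderBy</code>.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparableOrdering
{
    // Hypothetical helper: like OrderBy, but the compiler rejects
    // any key type that doesn't implement IComparable<TKey>.
    public static IOrderedEnumerable<TSource> OrderByComparable<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        return source.OrderBy(keySelector);
    }
}</pre>
</p>
<p>
An expression like <code>words.OrderByComparable(w => w.Length)</code> compiles, because <code>int</code> implements <code>IComparable<int></code>. Sorting by a key type without that interface would be a compiler error instead of a run-time exception. The trade-off is that the helper also rejects key types that only implement the non-generic <code>IComparable</code>, and it offers no overload that takes a custom comparer.
</p>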
<p>
This problem could have been caught at compile-time, but unfortunately it wasn't.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="7207e4dc0287435facea31fc9ce49d36">
<div class="comment-author"><a href="https://github.com/JesHansen">Jes Hansen</a> <a href="#7207e4dc0287435facea31fc9ce49d36">#</a></div>
<div class="comment-content">
<p>
I made a <a href="https://github.com/dotnet/runtime/issues/92691">pull request</a> describing the issue.
</p>
<p>
As this is likely a breaking change I don't have high hopes for it to be fixed, though…
</p>
</div>
<div class="comment-date">2023-09-27 09:40 UTC</div>
</div>
</div><hr>
Do ORMs reduce the need for mapping?
https://blog.ploeh.dk/2023/09/18/do-orms-reduce-the-need-for-mapping
2023-09-18T14:40:00+00:00
Mark Seemann
<div id="post">
<p>
<em>With some Entity Framework examples in C#.</em>
</p>
<p>
In a recent comment, a reader <a href="/2023/07/17/works-on-most-machines#4012c2cddcb64a068c0b06b7989a676e">asked me to expand on my position</a> on <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a> (ORMs), which is that I'm not a fan:
</p>
<blockquote>
<p>
I consider ORMs a waste of time: they create more problems than they solve.
</p>
<footer><cite><a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 12.2.2, footnote</cite></footer>
</blockquote>
<p>
While I acknowledge that only a Sith deals in absolutes, I favour clear assertions over guarded language. I don't really mean it that categorically, but I do stand by the general sentiment. In this article I'll attempt to describe why I don't reach for ORMs when querying or writing to a relational database.
</p>
<p>
As always, any such exploration takes place in a <em>context</em>, and this article is no exception. Before proceeding, allow me to delineate the scope. If your context differs from mine, what I write may not apply to your situation.
</p>
<h3 id="a29a6dfd90604a358c5e2f8e76941f80">
Scope <a href="#a29a6dfd90604a358c5e2f8e76941f80">#</a>
</h3>
<p>
It's been decades since I last worked on a system where the database 'came first'. The last time that happened, the database was hidden behind an XML-based <a href="https://en.wikipedia.org/wiki/Remote_procedure_call">RPC</a> API that tunnelled through HTTP. Not a <a href="https://en.wikipedia.org/wiki/REST">REST</a> API by a long shot.
</p>
<p>
Since then, I've worked on various systems. Some used relational databases, some document databases, some worked with CSV, or really old legacy APIs, etc. Common to these systems was that they were <em>not</em> designed around a database. Rather, they were developed with an eye to the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>, keeping storage details out of the Domain Model. Many were developed with test-driven development (TDD).
</p>
<p>
When I evaluate whether or not to use an ORM in situations like these, the core application logic is my main design driver. As I describe in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I usually develop (vertical) feature slices one at a time, utilising an <a href="/outside-in-tdd">outside-in TDD</a> process, during which I also figure out how to save or retrieve data from persistent storage.
</p>
<p>
Thus, in systems like these, storage implementation is an artefact of the software architecture. If a relational database is involved, the schema must adhere to the needs of the code; not the other way around.
</p>
<p>
To be clear, then, this article doesn't discuss typical <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a>-heavy applications that are mostly forms over relational data, with little or no application logic. If you're working with such a code base, an ORM might be useful. I can't really tell, since I last worked with such systems at a time when ORMs didn't exist.
</p>
<h3 id="b6446ab3f8b8410da2679b4fb915a69e">
The usual suspects <a href="#b6446ab3f8b8410da2679b4fb915a69e">#</a>
</h3>
<p>
The most common criticism of ORMs (that I've come across) is typically related to the queries they generate. People who are skilled in writing <a href="https://en.wikipedia.org/wiki/SQL">SQL</a> by hand, or who are concerned about performance, may look at the SQL that an ORM generates and dislike it for that reason.
</p>
<p>
It's my impression that ORMs have come a long way over the decades, but frankly, the generated SQL is not really what concerns me. It never was.
</p>
<p>
In the abstract, Ted Neward already outlined the problems in the seminal article <a href="https://blogs.newardassociates.com/blog/2006/the-vietnam-of-computer-science.html">The Vietnam of Computer Science</a>. That problem description may, however, be too theoretical to connect with most programmers, so I'll try a more example-driven angle.
</p>
<h3 id="6908e1b735ee41068baeeb9482a15953">
Database operations without an ORM <a href="#6908e1b735ee41068baeeb9482a15953">#</a>
</h3>
<p>
Once more I turn to the trusty example code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>. In it, I used <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a> as the example database, and ADO.NET as the data access technology.
</p>
<p>
I considered this more than adequate for saving and reading restaurant reservations. Here, for example, is the code that creates a new reservation row in the database:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(reservation));
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> SqlConnection(ConnectionString);
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> SqlCommand(createReservationSql, conn);
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Id"</span>, reservation.Id);
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@RestaurantId"</span>, restaurantId);
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@At"</span>, reservation.At);
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Name"</span>, reservation.Name.ToString());
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Email"</span>, reservation.Email.ToString());
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Quantity"</span>, reservation.Quantity);
<span style="color:blue;">await</span> conn.OpenAsync().ConfigureAwait(<span style="color:blue;">false</span>);
<span style="color:blue;">await</span> cmd.ExecuteNonQueryAsync().ConfigureAwait(<span style="color:blue;">false</span>);
}
<span style="color:blue;">private</span> <span style="color:blue;">const</span> <span style="color:blue;">string</span> createReservationSql = <span style="color:maroon;">@"
INSERT INTO [dbo].[Reservations] (
[PublicId], [RestaurantId], [At], [Name], [Email], [Quantity])
VALUES (@Id, @RestaurantId, @At, @Name, @Email, @Quantity)"</span>;</pre>
</p>
<p>
Yes, there's mapping, even if it's 'only' from a Domain Object to command parameter strings. As I'll argue later, if there's a way to escape such mapping, I'm not aware of it. ORMs don't seem to solve that problem.
</p>
<p>
This, however, seems to be the reader's main concern:
</p>
<blockquote>
<p>
"I can work with raw SQL ofcourse... but the mapping... oh the mapping..."
</p>
<footer><cite><a href="/2023/07/17/works-on-most-machines#4012c2cddcb64a068c0b06b7989a676e">qfilip</a></cite></footer>
</blockquote>
<p>
It's not a concern that I share, but again I'll remind you that if your context differs substantially from mine, what doesn't concern me could reasonably concern you.
</p>
<p>
You may argue that the above example isn't representative, since it only involves a single table. No foreign key relationships are involved, so perhaps the example is artificially easy.
</p>
<p>
In order to work with a slightly more complex schema, I decided to port the read-only in-memory restaurant database (the one that keeps track of the restaurants - the <em>tenants</em> - of the system) to SQL Server.
</p>
<h3 id="ef3d04206a20442dbd2c01336c48fd28">
Restaurants schema <a href="#ef3d04206a20442dbd2c01336c48fd28">#</a>
</h3>
<p>
In the book's sample code base, I'd only stored restaurant configurations as JSON config files, since I considered it out of scope to include an online tenant management system. Converting to a relational model wasn't hard, though. Here's the database schema:
</p>
<p>
<pre><span style="color:blue;">CREATE</span> <span style="color:blue;">TABLE</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:blue;"> </span><span style="color:gray;">(</span>
[Id] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[Name] <span style="color:blue;">NVARCHAR </span><span style="color:gray;">(</span>50<span style="color:gray;">)</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">UNIQUE</span><span style="color:gray;">,</span>
[OpensAt] <span style="color:blue;">TIME</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[LastSeating] <span style="color:blue;">TIME</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[SeatingDuration] <span style="color:blue;">TIME</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span>
<span style="color:blue;">PRIMARY</span> <span style="color:blue;">KEY</span> <span style="color:blue;">CLUSTERED </span><span style="color:gray;">(</span>[Id] <span style="color:blue;">ASC</span><span style="color:gray;">)</span>
<span style="color:gray;">)</span>
<span style="color:blue;">CREATE</span> <span style="color:blue;">TABLE</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>
[Id] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">IDENTITY</span><span style="color:gray;">,</span>
[RestaurantId] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">REFERENCES</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:gray;">(</span>Id<span style="color:gray;">),</span>
[Capacity] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[IsCommunal] <span style="color:blue;">BIT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span>
<span style="color:blue;">PRIMARY</span> <span style="color:blue;">KEY</span> <span style="color:blue;">CLUSTERED </span><span style="color:gray;">(</span>[Id] <span style="color:blue;">ASC</span><span style="color:gray;">)</span>
<span style="color:gray;">)</span></pre>
</p>
<p>
This little subsystem requires two database tables: One that keeps track of the overall restaurant configuration, such as name, opening and closing times, and another database table that lists all of a restaurant's physical tables.
</p>
<p>
You may argue that this is still too simple to realistically capture the intricacies of existing database systems, but conversely I'll remind you that the scope of this article is the sort of system where you develop and design the application first; not a system where you're given a relational database upon which you must create an application.
</p>
<p>
Had I been given this assignment in a realistic setting, a relational database probably wouldn't have been my first choice. Some kind of document database, or even blob storage, strikes me as a better fit. Still, this article is about ORMs, so I'll pretend that there are external circumstances that dictate a relational database.
</p>
<p>
To test the system, I also created a script to populate these tables. Here's part of it:
</p>
<p>
<pre><span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:blue;"> </span><span style="color:gray;">(</span>[Id]<span style="color:gray;">,</span> [Name]<span style="color:gray;">,</span> [OpensAt]<span style="color:gray;">,</span> [LastSeating]<span style="color:gray;">,</span> [SeatingDuration]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>1<span style="color:gray;">,</span> <span style="color:red;">N'Hipgnosta'</span><span style="color:gray;">,</span> <span style="color:red;">'18:00'</span><span style="color:gray;">,</span> <span style="color:red;">'21:00'</span><span style="color:gray;">,</span> <span style="color:red;">'6:00'</span><span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>1<span style="color:gray;">,</span> 10<span style="color:gray;">,</span> 1<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:blue;"> </span><span style="color:gray;">(</span>[Id]<span style="color:gray;">,</span> [Name]<span style="color:gray;">,</span> [OpensAt]<span style="color:gray;">,</span> [LastSeating]<span style="color:gray;">,</span> [SeatingDuration]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> <span style="color:red;">N'Nono'</span><span style="color:gray;">,</span> <span style="color:red;">'18:00'</span><span style="color:gray;">,</span> <span style="color:red;">'21:00'</span><span style="color:gray;">,</span> <span style="color:red;">'6:00'</span><span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 6<span style="color:gray;">,</span> 1<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 4<span style="color:gray;">,</span> 1<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 2<span style="color:gray;">,</span> 0<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 2<span style="color:gray;">,</span> 0<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 4<span style="color:gray;">,</span> 0<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 4<span style="color:gray;">,</span> 0<span style="color:gray;">)</span></pre>
</p>
<p>
There are more rows than this, but this should give you an idea of what the data looks like.
</p>
<h3 id="ba5d810c332945398ab2a870711357f1">
Reading restaurant data without an ORM <a href="#ba5d810c332945398ab2a870711357f1">#</a>
</h3>
<p>
Due to the foreign key relationship, reading restaurant data from the database is a little more involved than reading from a single table.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task&lt;Restaurant?&gt; <span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> SqlCommand(readByNameSql);
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Name"</span>, name);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurants</span> = <span style="color:blue;">await</span> ReadRestaurants(cmd);
<span style="font-weight:bold;color:#8f08c4;">return</span> restaurants.SingleOrDefault();
}
<span style="color:blue;">private</span> <span style="color:blue;">const</span> <span style="color:blue;">string</span> readByNameSql = <span style="color:maroon;">@"
SELECT [Id], [Name], [OpensAt], [LastSeating], [SeatingDuration]
FROM [dbo].[Restaurants]
WHERE [Name] = @Name
SELECT [RestaurantId], [Capacity], [IsCommunal]
FROM [dbo].[Tables]
JOIN [dbo].[Restaurants]
ON [dbo].[Tables].[RestaurantId] = [dbo].[Restaurants].[Id]
WHERE [Name] = @Name"</span>;</pre>
</p>
<p>
There's more than one option when deciding how to construct the query. You could make one query with a join, in which case you'd get rows with repeated data, and you'd then need to detect duplicates, or you could do as I've done here: Query each table to get multiple result sets.
</p>
<p>
I'm not claiming that this is better in any way. I only chose this option because I found the code that I had to write less offensive.
</p>
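<p>
For comparison, here's a rough sketch of the duplicate handling that the single-join option would require. It's not code from the code base: <code>JoinedRow</code>, <code>TableRow</code>, <code>RestaurantRows</code>, and <code>GroupTables</code> are hypothetical names that stand in for rows read from a joined result set, where the restaurant columns repeat once per table. It assumes C# 9 or later for the <code>record</code> syntax.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row from Restaurants JOIN Tables.
public sealed record JoinedRow(int RestaurantId, string Name, int Capacity, bool IsCommunal);

// Hypothetical de-duplicated shapes.
public sealed record TableRow(int Capacity, bool IsCommunal);
public sealed record RestaurantRows(int Id, string Name, IReadOnlyList&lt;TableRow&gt; Tables);

public static class JoinSketch
{
    // Collapses the repeated restaurant columns into one entry per restaurant.
    public static IReadOnlyList&lt;RestaurantRows&gt; GroupTables(IEnumerable&lt;JoinedRow&gt; rows)
    {
        return rows
            .GroupBy(r =&gt; (r.RestaurantId, r.Name))
            .Select(g =&gt; new RestaurantRows(
                g.Key.RestaurantId,
                g.Key.Name,
                g.Select(r =&gt; new TableRow(r.Capacity, r.IsCommunal)).ToList()))
            .ToList();
    }
}</pre>
</p>
<p>
Keep in mind that an inner join returns no rows for a restaurant without tables, so a real implementation would have to consider that case as well.
</p>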
<p>
Since the <code>IRestaurantDatabase</code> interface defines three different kinds of queries (<code>GetAll()</code>, <code>GetRestaurant(int id)</code>, and <code>GetRestaurant(string name)</code>), I invoked the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a> and extracted a helper method:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">async</span> Task&lt;IEnumerable&lt;Restaurant&gt;&gt; <span style="font-weight:bold;color:#74531f;">ReadRestaurants</span>(SqlCommand <span style="font-weight:bold;color:#1f377f;">cmd</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> SqlConnection(ConnectionString);
    cmd.Connection = conn;
    <span style="color:blue;">await</span> conn.OpenAsync();
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rdr</span> = <span style="color:blue;">await</span> cmd.ExecuteReaderAsync();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurants</span> = Enumerable.Empty&lt;Restaurant&gt;();
    <span style="font-weight:bold;color:#8f08c4;">while</span> (<span style="color:blue;">await</span> rdr.ReadAsync())
        restaurants = restaurants.Append(ReadRestaurantRow(rdr));
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">await</span> rdr.NextResultAsync())
        <span style="font-weight:bold;color:#8f08c4;">while</span> (<span style="color:blue;">await</span> rdr.ReadAsync())
            restaurants = ReadTableRow(rdr, restaurants);
    <span style="font-weight:bold;color:#8f08c4;">return</span> restaurants;
}</pre>
</p>
<p>
The <code>ReadRestaurants</code> method does the overall work of opening the database connection, executing the query, and moving through rows and result sets. Again, we'll find mapping code hidden in helper methods:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurant <span style="color:#74531f;">ReadRestaurantRow</span>(SqlDataReader <span style="font-weight:bold;color:#1f377f;">rdr</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Restaurant(
(<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"Id"</span>],
(<span style="color:blue;">string</span>)rdr[<span style="color:#a31515;">"Name"</span>],
<span style="color:blue;">new</span> MaitreD(
<span style="color:blue;">new</span> TimeOfDay((TimeSpan)rdr[<span style="color:#a31515;">"OpensAt"</span>]),
<span style="color:blue;">new</span> TimeOfDay((TimeSpan)rdr[<span style="color:#a31515;">"LastSeating"</span>]),
(TimeSpan)rdr[<span style="color:#a31515;">"SeatingDuration"</span>]));
}</pre>
</p>
<p>
As the name suggests, <code>ReadRestaurantRow</code> reads a row from the <code>Restaurants</code> table and converts it into a <code>Restaurant</code> object. At this time, however, it creates each <code>MaitreD</code> object without any tables. This is possible because one of the <code>MaitreD</code> constructors takes a <code>params</code> array as the last parameter:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">MaitreD</span>(
TimeOfDay <span style="font-weight:bold;color:#1f377f;">opensAt</span>,
TimeOfDay <span style="font-weight:bold;color:#1f377f;">lastSeating</span>,
TimeSpan <span style="font-weight:bold;color:#1f377f;">seatingDuration</span>,
<span style="color:blue;">params</span> Table[] <span style="font-weight:bold;color:#1f377f;">tables</span>) :
<span style="color:blue;">this</span>(opensAt, lastSeating, seatingDuration, tables.AsEnumerable())
{
}</pre>
</p>
<p>
Only when the <code>ReadRestaurants</code> method moves on to the next result set can it add tables to each restaurant:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable&lt;Restaurant&gt; <span style="color:#74531f;">ReadTableRow</span>(
SqlDataReader <span style="font-weight:bold;color:#1f377f;">rdr</span>,
IEnumerable&lt;Restaurant&gt; <span style="font-weight:bold;color:#1f377f;">restaurants</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span> = (<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"RestaurantId"</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = (<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"Capacity"</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">isCommunal</span> = (<span style="color:blue;">bool</span>)rdr[<span style="color:#a31515;">"IsCommunal"</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = isCommunal ? Table.Communal(capacity) : Table.Standard(capacity);
<span style="font-weight:bold;color:#8f08c4;">return</span> restaurants.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Id == restaurantId ? AddTable(r, table) : r);
}</pre>
</p>
<p>
As was also the case in <code>ReadRestaurantRow</code>, this method uses string-based indexers on the <code>rdr</code> to extract the data. I'm no fan of stringly-typed code, but at least I have automated tests that exercise these methods.
</p>
<p>
Could an ORM help by creating strongly-typed classes that model database tables? To a degree; I'll discuss that later.
</p>
<p>
In any case, since the entire code base follows the <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a> architecture, the entire Domain Model is made of immutable data types with pure functions. Thus, <code>ReadTableRow</code> has to iterate over all <code>restaurants</code> and add the table when the <code>Id</code> matches. <code>AddTable</code> does that:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurant <span style="color:#74531f;">AddTable</span>(Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>, Table <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> restaurant.Select(<span style="font-weight:bold;color:#1f377f;">m</span> => m.WithTables(m.Tables.Append(table).ToArray()));
}</pre>
</p>
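<p>
If the interplay between <code>Select</code> and <code>WithTables</code> seems opaque, the following is a minimal sketch of how such copy-and-update members can work. The <code>SketchMaitreD</code> and <code>SketchRestaurant</code> types are hypothetical, simplified stand-ins for <code>MaitreD</code> and <code>Restaurant</code>; tables are reduced to plain capacities, and the real classes carry more data and behaviour.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for MaitreD; tables are just capacities.
public sealed class SketchMaitreD
{
    public SketchMaitreD(params int[] tables)
    {
        // Copy the array so that callers can't mutate the object afterwards.
        Tables = tables.ToList();
    }

    public IReadOnlyList&lt;int&gt; Tables { get; }

    // Copy-and-update: returns a new object and leaves this one unchanged.
    public SketchMaitreD WithTables(params int[] newTables)
    {
        return new SketchMaitreD(newTables);
    }
}

// Hypothetical, simplified stand-in for Restaurant.
public sealed class SketchRestaurant
{
    public SketchRestaurant(int id, SketchMaitreD maitreD)
    {
        Id = id;
        MaitreD = maitreD;
    }

    public int Id { get; }
    public SketchMaitreD MaitreD { get; }

    public SketchRestaurant WithMaitreD(SketchMaitreD newMaitreD)
    {
        return new SketchRestaurant(Id, newMaitreD);
    }

    // Applies a function to the nested object and wraps the result in a copy.
    public SketchRestaurant Select(Func&lt;SketchMaitreD, SketchMaitreD&gt; selector)
    {
        return WithMaitreD(selector(MaitreD));
    }
}</pre>
</p>
<p>
With these definitions, <code>restaurant.Select(m =&gt; m.WithTables(m.Tables.Append(4).ToArray()))</code> returns a new restaurant with one more table, while the original object still reports its old tables.
</p>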
<p>
I can think of other ways to solve the overall mapping task when using ADO.NET, but this was what made most sense to me.
</p>
<h3 id="3eda5ebf7165478698c756d59af43a1e">
Reading restaurants with Entity Framework <a href="#3eda5ebf7165478698c756d59af43a1e">#</a>
</h3>
<p>
Does an ORM like <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a> (EF) improve things? To a degree, but not enough to outweigh the disadvantages it also brings.
</p>
<p>
In order to investigate, I followed <a href="https://learn.microsoft.com/ef/core/managing-schemas/scaffolding">the EF documentation to scaffold</a> code from a database I'd set up for only that purpose. For the <code>Tables</code> table it created the following <code>Table</code> class and a similar <code>Restaurant</code> class.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">partial</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RestaurantId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> IsCommunal { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> Restaurant Restaurant { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; } = <span style="color:blue;">null</span>!;
}</pre>
</p>
<p>
Hardly surprising. Also, hardly object-oriented, but more about that later, too.
</p>
<p>
Entity Framework didn't, by itself, add a <code>Tables</code> collection to the <code>Restaurant</code> class, so I had to do that by hand, as well as modify the <a href="https://learn.microsoft.com/dotnet/api/microsoft.entityframeworkcore.dbcontext">DbContext</a>-derived class to tell it about this relationship:
</p>
<p>
<pre>entity.OwnsMany(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Tables, <span style="font-weight:bold;color:#1f377f;">b</span> =>
{
b.Property&lt;<span style="color:blue;">int</span>&gt;(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Id).ValueGeneratedOnAdd();
b.HasKey(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Id);
});</pre>
</p>
<p>
I thought that such a simple foreign key relationship would be something an ORM would help with, but apparently not.
</p>
<p>
With that in place, I could now rewrite the above <code>GetRestaurant</code> method to use Entity Framework instead of ADO.NET:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task&lt;Restaurants.Restaurant?&gt; <span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> RestaurantsContext(ConnectionString);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dbRestaurant</span> = <span style="color:blue;">await</span> db.Restaurants.FirstOrDefaultAsync(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name == name);
<span style="font-weight:bold;color:#8f08c4;">if</span> (dbRestaurant == <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> ToDomainModel(dbRestaurant);
}</pre>
</p>
<p>
The method now queries the database, and EF automatically returns a populated object. This would be nice if it were the right kind of object, but alas, it isn't. <code>GetRestaurant</code> still has to call a helper method to convert to the correct Domain Object:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurants.Restaurant <span style="color:#74531f;">ToDomainModel</span>(Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Restaurants.Restaurant(
restaurant.Id,
restaurant.Name,
<span style="color:blue;">new</span> MaitreD(
<span style="color:blue;">new</span> TimeOfDay(restaurant.OpensAt),
<span style="color:blue;">new</span> TimeOfDay(restaurant.LastSeating),
restaurant.SeatingDuration,
restaurant.Tables.Select(ToDomainModel).ToList()));
}</pre>
</p>
<p>
While this helper method converts an EF <code>Restaurant</code> object to a proper Domain Object (<code>Restaurants.Restaurant</code>), it also needs another helper to convert the <code>table</code> objects:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurants.Table <span style="color:#74531f;">ToDomainModel</span>(Table <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (table.IsCommunal)
<span style="font-weight:bold;color:#8f08c4;">return</span> Restaurants.Table.Communal(table.Capacity);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> Restaurants.Table.Standard(table.Capacity);
}</pre>
</p>
<p>
As should be clear by now, using vanilla EF doesn't reduce the need for mapping.
</p>
<p>
Granted, the mapping code is a bit simpler, but you still need to remember to map <code>restaurant.Name</code> to the right constructor parameter, <code>restaurant.OpensAt</code> and <code>restaurant.LastSeating</code> to <em>their</em> correct places, <code>table.Capacity</code> to a constructor argument, and so on. If you make changes to the database schema or the Domain Model, you'll need to edit this code.
</p>
<h3 id="b85eb83e5acc4d66baaf9ec51c3be02d">
Encapsulation <a href="#b85eb83e5acc4d66baaf9ec51c3be02d">#</a>
</h3>
<p>
This is the point where more than one reader wonders: <em>Can't you just..?</em>
</p>
<p>
In short, no, I can't just.
</p>
<p>
The most common reaction is most likely that I'm doing this all wrong. I'm supposed to use the EF classes as my Domain Model.
</p>
<p>
But I can't, and I won't. I can't because I already have classes in place that serve that purpose. I also will not, because it would violate the Dependency Inversion Principle. As I recently described, <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">the architecture is Ports and Adapters</a>, or, if you will, <a href="/ref/clean-architecture">Clean Architecture</a>. The database <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter</a> should depend on the Domain Model; the Domain Model shouldn't depend on the database implementation.
</p>
<p>
Okay, but couldn't I have generated the EF classes in the Domain Model? After all, a class like the above <code>Table</code> is just a <a href="https://en.wikipedia.org/wiki/Plain_old_CLR_object">POCO</a> Entity. It doesn't depend on the Entity Framework. I could have those classes in my Domain Model, put my <code>DbContext</code> in the data access layer, and have the best of both worlds. Right?
</p>
<p>
The code shown so far hints at a particular API afforded by the Domain Model. If you've read <a href="/code-that-fits-in-your-head">my book</a>, you already know what comes next. Here's the <code>Table</code> Domain Model's API:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Standard</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>)
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Communal</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>)
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RemainingSeats { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Table <span style="font-weight:bold;color:#74531f;">Reserve</span>(Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
<span style="color:blue;">public</span> T <span style="font-weight:bold;color:#74531f;">Accept</span>&lt;<span style="color:#2b91af;">T</span>&gt;(ITableVisitor&lt;T&gt; <span style="font-weight:bold;color:#1f377f;">visitor</span>)
}</pre>
</p>
<p>
A couple of qualities of this design should be striking: There's <em>no</em> visible constructor - not even one that takes parameters. Instead, the type affords two static creation functions. One creates a standard table, the other a communal table. My book describes the difference between these types, and so does <a href="/2020/01/27/the-maitre-d-kata">the Maître d' kata</a>.
</p>
<p>
This isn't some frivolous design choice of mine, but rather quite deliberate. That <code>Table</code> class is a <a href="/2018/06/25/visitor-as-a-sum-type">Visitor-encoded sum type</a>. You can debate whether I should have modelled a table as a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> or a polymorphic object, but now that I've chosen a sum type, it should be explicit in the API design.
</p>
<blockquote>
<p>
"Explicit is better than implicit."
</p>
<footer><cite><a href="https://peps.python.org/pep-0020/">The Zen of Python</a></cite></footer>
</blockquote>
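<p>
For readers unfamiliar with the encoding, here's a minimal sketch of a Visitor-encoded sum type with two cases. It's a hypothetical simplification: the names <code>ISeatingVisitor&lt;T&gt;</code>, <code>Seating</code>, and <code>DescribeVisitor</code> are made up for the example, and the real <code>Table</code> class also deals with reservations.
</p>
<p>
<pre>using System;

// One method per case of the sum type.
public interface ISeatingVisitor&lt;T&gt;
{
    T VisitStandard(int seats);
    T VisitCommunal(int seats);
}

public sealed class Seating
{
    private readonly bool isCommunal;
    private readonly int seats;

    // Private constructor: the static functions below are the only way in.
    private Seating(bool isCommunal, int seats)
    {
        this.isCommunal = isCommunal;
        this.seats = seats;
    }

    public static Seating Standard(int seats)
    {
        return new Seating(false, seats);
    }

    public static Seating Communal(int seats)
    {
        return new Seating(true, seats);
    }

    // Inspecting the value requires a visitor that handles both cases.
    public T Accept&lt;T&gt;(ISeatingVisitor&lt;T&gt; visitor)
    {
        return isCommunal
            ? visitor.VisitCommunal(seats)
            : visitor.VisitStandard(seats);
    }
}

// Example visitor that turns either case into a description.
public sealed class DescribeVisitor : ISeatingVisitor&lt;string&gt;
{
    public string VisitStandard(int seats)
    {
        return $"standard table for {seats}";
    }

    public string VisitCommunal(int seats)
    {
        return $"communal table for {seats}";
    }
}</pre>
</p>
<p>
Since client code can only get at the data through <code>Accept</code>, adding a third case to the interface makes every existing visitor fail to compile until it handles the new case.
</p>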
<p>
When we program, we make many mistakes. It's important to discover the mistakes as soon as possible. With a compiled language, <a href="/2011/04/29/Feedbackmechanismsandtradeoffs">the first feedback you get is from the compiler</a>. I favour leveraging the compiler, and its type system, to prevent as many mistakes as possible. That's what <a href="https://buttondown.email/hillelwayne/archive/making-illegal-states-unrepresentable/">Hillel Wayne calls <em>constructive</em> data</a>. <a href="https://blog.janestreet.com/effective-ml-video/">Make illegal states unrepresentable</a>.
</p>
<p>
I could, had I thought of it at the time, have introduced <a href="/2022/08/22/can-types-replace-validation">a predicative natural-number wrapper of integers</a>, in which case I could have strengthened the contract of <code>Table</code> even further:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Standard</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>)
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Communal</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>)
<span style="color:blue;">public</span> NaturalNumber Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RemainingSeats { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Table <span style="font-weight:bold;color:#74531f;">Reserve</span>(Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
<span style="color:blue;">public</span> T <span style="font-weight:bold;color:#74531f;">Accept</span>&lt;<span style="color:#2b91af;">T</span>&gt;(ITableVisitor&lt;T&gt; <span style="font-weight:bold;color:#1f377f;">visitor</span>)
}</pre>
</p>
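<p>
Such a wrapper could look something like the following. This is only a sketch, under the assumption that a natural number here means an integer greater than zero; the <code>TryCreate</code> factory and the explicit conversion to <code>int</code> are assumptions made for the example, not members of the code base.
</p>
<p>
<pre>using System;

public sealed class NaturalNumber : IEquatable&lt;NaturalNumber&gt;
{
    private readonly int value;

    // Private constructor: TryCreate is the only way to get an instance.
    private NaturalNumber(int value)
    {
        this.value = value;
    }

    // Returns null when the candidate is less than one.
    public static NaturalNumber? TryCreate(int candidate)
    {
        if (candidate &lt; 1)
            return null;
        return new NaturalNumber(candidate);
    }

    public static explicit operator int(NaturalNumber number)
    {
        return number.value;
    }

    public bool Equals(NaturalNumber? other)
    {
        return !(other is null) &amp;&amp; value == other.value;
    }

    public override bool Equals(object? obj)
    {
        return Equals(obj as NaturalNumber);
    }

    public override int GetHashCode()
    {
        return value.GetHashCode();
    }
}</pre>
</p>
<p>
Once a <code>NaturalNumber</code> object exists, code that receives it no longer has to check that the capacity is positive.
</p>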
<p>
The point is that I take <a href="/encapsulation-and-solid">encapsulation</a> seriously, and my interpretation of the concept is heavily inspired by <a href="https://en.wikipedia.org/wiki/Bertrand_Meyer">Bertrand Meyer</a>'s <a href="/ref/oosc">Object-Oriented Software Construction</a>. That view of encapsulation emphasises <em>contracts</em> (preconditions, invariants, postconditions) rather than information hiding.
</p>
<p>
As I described in a previous article, <a href="/2022/08/22/can-types-replace-validation">you can't model all preconditions and invariants with types</a>, but you can still let the type system do much heavy lifting.
</p>
<p>
This principle applies to all classes that are part of the Domain Model; not only <code>Table</code>, but also <code>Restaurant</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Restaurant</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Restaurant</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>, MaitreD <span style="font-weight:bold;color:#1f377f;">maitreD</span>)
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">string</span> Name { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> MaitreD MaitreD { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">WithId</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">newId</span>)
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">WithName</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">newName</span>)
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">WithMaitreD</span>(MaitreD <span style="font-weight:bold;color:#1f377f;">newMaitreD</span>)
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">Select</span>(Func<MaitreD, MaitreD> <span style="font-weight:bold;color:#1f377f;">selector</span>)
}</pre>
</p>
<p>
While this class does have a public constructor, it makes use of another design choice that Entity Framework doesn't support: It nests one rich object (<code>MaitreD</code>) inside another. Why does it do that?
</p>
<p>
Again, this is far from a frivolous design choice I made just to be difficult. Rather, it's a result of a need-to-know principle (which strikes me as closely related to the <a href="https://en.wikipedia.org/wiki/Single-responsibility_principle">Single Responsibility Principle</a>): A class should only contain the information it needs in order to perform its job.
</p>
<p>
The <code>MaitreD</code> class does all the heavy lifting when it comes to deciding whether or not to accept reservations, how to allocate tables, etc. It doesn't, however, need to know the <code>id</code> or <code>name</code> of the restaurant in order to do that. Keeping that information out of <code>MaitreD</code>, and instead in the <code>Restaurant</code> wrapper, makes the code simpler and easier to use.
</p>
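<p>
To illustrate how such a wrapper can stay thin, here's a hedged sketch of how <code>WithMaitreD</code> and <code>Select</code> might be implemented. The <code>MaitreD</code> class shown here is only a stand-in with a single, assumed property (<code>SeatingDuration</code>) and an assumed <code>WithSeatingDuration</code> method; the real class is much richer:
</p>

```csharp
using System;

// Stand-in for the real MaitreD class. The single property and the
// WithSeatingDuration method are assumptions made for this example.
public sealed class MaitreD
{
    public MaitreD(TimeSpan seatingDuration)
    {
        SeatingDuration = seatingDuration;
    }

    public TimeSpan SeatingDuration { get; }

    public MaitreD WithSeatingDuration(TimeSpan newSeatingDuration)
    {
        return new MaitreD(newSeatingDuration);
    }
}

public sealed class Restaurant
{
    public Restaurant(int id, string name, MaitreD maitreD)
    {
        Id = id;
        Name = name;
        MaitreD = maitreD;
    }

    public int Id { get; }
    public string Name { get; }
    public MaitreD MaitreD { get; }

    // Copy-and-update: the id and name travel along unchanged.
    public Restaurant WithMaitreD(MaitreD newMaitreD)
    {
        return new Restaurant(Id, Name, newMaitreD);
    }

    // Applies a function to the nested MaitreD and rewraps the result.
    public Restaurant Select(Func<MaitreD, MaitreD> selector)
    {
        if (selector is null)
            throw new ArgumentNullException(nameof(selector));

        return WithMaitreD(selector(MaitreD));
    }
}
```

<p>
Under those assumptions, client code could write <code>restaurant.Select(m => m.WithSeatingDuration(TimeSpan.FromHours(2)))</code> to change how the restaurant handles reservations. The <code>MaitreD</code> object never sees the <code>id</code> or <code>name</code>.
</p>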
<p>
The bottom line of all this is that I value encapsulation over 'easy' database mapping.
</p>
<h3 id="3cfc364c9a87463aaac98bc358d062d5">
Limitations of Entity Framework <a href="#3cfc364c9a87463aaac98bc358d062d5">#</a>
</h3>
<p>
The promise of an object-relational mapper is that it automates mapping between objects and database. Is that promise realised?
</p>
<p>
In its current incarnation, <a href="https://stackoverflow.com/q/77039584/126014">it doesn't look as though Entity Framework supports mapping to and from the Domain Model</a>. With the above tweaks, it supports the database schema that I've described, but only via 'Entity classes'. I still have to map to and from the 'Entity objects' and the actual Domain Model. Not much is gained.
</p>
<p>
One should, of course, be careful not to draw too strong inferences from this example. First, proving anything impossible is generally difficult. Just because <em>I</em> can't find a way to do what I want, I can't conclude that it's impossible. That a few other people tell me, too, that it's impossible still doesn't constitute strong evidence.
</p>
<p>
Second, even if it's impossible today, it doesn't follow that it will be impossible forever. Perhaps Entity Framework will support my Domain Model in the future.
</p>
<p>
Third, we can't conclude that just because Entity Framework (currently) doesn't support my Domain Model it follows that no object-relational mapper (ORM) does. There might be another ORM out there that perfectly supports my design, but I'm just not aware of it.
</p>
<p>
Based on my experience and what I see, read, and hear, I don't think any of that is likely. Things might change, though.
</p>
<h3 id="4002bb8f9ec64b14958233b16b93ec10">
Net benefit or drawback? <a href="#4002bb8f9ec64b14958233b16b93ec10">#</a>
</h3>
<p>
Perhaps, despite all of this, you still prefer ORMs. You may compare my ADO.NET code to my Entity Framework code and conclude that the EF code still looks simpler. After all, when using ADO.NET I have to jump through some hoops to load the correct tables associated with each restaurant, whereas EF automatically handles that for me. The EF version requires fewer lines of code.
</p>
<p>
In isolation, the fewer lines of code the better. This seems like an argument for using an ORM after all, even if the original promise remains elusive. Take what you can get.
</p>
<p>
On the other hand, when you take on a dependency, there's usually a cost that comes along. A library like Entity Framework isn't free. While you don't pay a licence fee for it, it comes with other costs. You have to learn how to use it, and so do your colleagues. You also have to keep up to date with changes.
</p>
<p>
Every time some exotic requirement comes up, you'll have to spend time investigating how to address it with that ORM's API. This may lead to a game of <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">Whac-A-Mole</a> where every tweak to the ORM leads you further down the rabbit hole, and couples your code tighter with it.
</p>
<p>
You can only keep up with so many things. What's the best investment of your time, and the time of your team mates? Learning and knowing <a href="https://en.wikipedia.org/wiki/SQL">SQL</a>, or learning and keeping up to date with a particular ORM?
</p>
<p>
I learned SQL decades ago, and that knowledge is still useful. On the other hand, I don't even know how many library and framework APIs I've both learned and forgotten about.
</p>
<p>
As things currently stand, it looks to me as though the net benefit of using a library like Entity Framework is negative. Yes, it might save me a few lines of code, but I'm not ready to pay the costs just outlined.
</p>
<p>
This balance could tip in the future, or my context may change.
</p>
<h3 id="e76ec6ffdf3f44949bab3d86786b26ae">
Conclusion <a href="#e76ec6ffdf3f44949bab3d86786b26ae">#</a>
</h3>
<p>
For the kind of applications that I tend to become involved with, I don't find object-relational mappers particularly useful. When you have a rich Domain Model where the first design priority is encapsulation, assisted by the type system, it looks as though mapping is unavoidable.
</p>
<p>
While you can ask automated tools to generate code that mirrors a database schema (or the other way around), only classes with poor encapsulation are supported. As soon as you do something out of the ordinary like static factory methods or nested objects, apparently Entity Framework gives up.
</p>
<p>
Can we extrapolate from Entity Framework to other ORMs? Can we infer that Entity Framework will never be able to support objects with proper encapsulation, just because it currently doesn't?
</p>
<p>
I can't say, but I'd be surprised if things change soon, if at all. If, on the other hand, it eventually turns out that I can have my cake and eat it too, then why shouldn't I?
</p>
<p>
Until then, however, I don't find that the benefits of ORMs trump the costs of using them.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="75ca5755d2a4445ba4836fc3f6922a5c">
<div class="comment-author">Vlad <a href="#75ca5755d2a4445ba4836fc3f6922a5c">#</a></div>
<div class="comment-content">
<p>
One project I worked on was (among other things) mapping data from database to rich domain objects in the way similar to what is described
in this article. These objects knew how to do a lot of things but were dependent on related objects and so everything needed to be loaded in advance from the database in order to ensure correctness.
So having an Order, OrderLine, Person, Address and City, all the rows needed to be loaded in advance, mapped to objects and references set to create the object graph to be able to, say, display shipping
costs based on person's address.
</p>
<p>
The mapping step involved cumbersome manual coding and was error prone because it was easy to forget to load some list or set some reference. Reflecting on that experience, it seems to
me that sacrificing a bit of purity wrt domain modelling and leaning on an ORM to
lazily load the references would have been much more efficient <strong>and</strong> correct.
</p>
<p>
But I guess it all depends on the context..?
</p>
</div>
<div class="comment-date">2023-09-19 13:17 UTC</div>
</div>
<div class="comment" id="58265df2f91c434696a3e0d21fe1b3b1">
<div class="comment-author">qfilip <a href="#58265df2f91c434696a3e0d21fe1b3b1">#</a></div>
<div class="comment-content">
<p>
Thanks. I've been following recent posts, but I was too lazy to go through the whole PRing things to reply. Maybe that's a good thing, since it forces you to think how to reply, instead of throwing a bunch of words together quickly. Anyways, back to business.
</p>
<p>
I'm not trying to sell one way or the other, because I'm seriously conflicted with both. Since most people on the web tend to fall into ORM category (in .NET world at least), I was simply looking for other perspective, from someone more knowledgeable than me.
</p>
<p>
The following is just my thinking out loud...
</p>
<p>
You've used DB-first approach and scaffolding classes from DB schema. With EF core, the usual thing to do, is the opposite. Write classes to scaffold DB schema. Now, this doesn't save us from writing those "relational properties", but it allows us to generate DB update scripts. So if you have a class like:
<pre>
class SomeTable
{
public int Id;
public string Name;
}
</pre>
and you add a field:
<pre>
class SomeTable
{
public int Id;
public string Name;
public DateTime Birthday;
}
</pre>
you can run
<pre>
add-migration MyMigration // generate migration file
update-database // execute it
</pre>
</p>
<p>
This gives you a nice way to track DB changes via Git, but it can also introduce conflicts. Two devs cannot edit the same class/table. You have to be really careful when making commits. Another painful thing to do this way is creating DB views and stored procedures. I've honestly never seen a good solution for it. Maybe trying to do these things is a futile effort in the first place.
</p>
<p>
The whole
<pre>
readByNameSql = @"SELECT [Id], [Name], [OpensAt], [LastSeating], [SeatingDuration]...
</pre>
is giving me heebie jeebies. It is easy to change some column name, and introduce a bug. It might be possible to do stuff with string interpolation, but at that point, I'm thinking about creating my own framework...
</p>
<p>
<blockquote>
The most common reaction is most likely that I'm doing this all wrong. I'm supposed to use the EF classes as my Domain Model. - Mark Seemann
</blockquote>
One of the first things that I was taught on my first job, was to never expose my domain model to the outside world. The domain model being EF Core classes... These days, I'm thinking quite the opposite. EF Core classes are DTOs for the database (with some boilerplate in order for the framework to do its magic). I also <b>want</b> to expose my domain model to the outside world. Why not? That's the contract after all. But the problem with this, is that it adds another layer of mapping. Since my domain model validation is done in class constructor, deserialization of requests becomes a problem. Ideally, it should sit in a static method. But in that case I have: jsonDto -> domainModel -> dbDto. The No-ORM approach also still requires me to map domainModel to command parameters manually. All of this is a tedious and very error-prone process. Especially if you have the case like <a href="#75ca5755d2a4445ba4836fc3f6922a5c">vlad</a> mentioned above.
</p>
<p>
Minor observation on your code. People rarely map things from DB data to domain models when using EF Core. This is a horrible thing to do. Anyone can run a script against a DB, and corrupt the data. It is something I intend to enforce in future projects, if possible. Thank you F# community.
</p>
<p>
I can't think of anything more to say at the moment. Thanks again for a fully-fledged-article reply :). I also recommend <a href="https://www.youtube.com/watch?v=ZYfdjszs8sU">this video</a>. I haven't had the time to try things he is talking about yet.
</p>
</div>
<div class="comment-date">2023-09-21 19:27 UTC</div>
</div>
<div class="comment" id="b3a8702fe15b416fa20f78d1351c8ca4">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#b3a8702fe15b416fa20f78d1351c8ca4">#</a></div>
<div class="comment-content">
<p>
Vlad, qfilip, thank you for writing.
</p>
<p>
I think your comments warrant another article. I'll post an update here later.
</p>
</div>
<div class="comment-date">2023-09-24 15:57 UTC</div>
</div>
<div class="comment" id="359a7bb0d2c14b8eb2dcb2ac6de4897d">
<div class="comment-author">qfilip <a href="#359a7bb0d2c14b8eb2dcb2ac6de4897d">#</a></div>
<div class="comment-content">
<p>
Quick update from me again. I've been thinking and experimenting with several approaches to solve issues I've written about above. How idealized world works and where do we make compromises. Forgive me for the sample types, I couldn't think of anything else. Let's assume we have this table:
<pre>
type Sheikh = {
// db entity data
Id: Guid
CreatedAt: DateTime
ModifiedAt: DateTime
Deleted: bool
// domain data
Name: string
Email: string // unique constraint here
// relational data
Wives: Wife list
Supercars: Supercar list
}
</pre>
</p>
<p>
I've named first 3 fields as "entity data". Why would my domain model contain an ID? It shouldn't care about persistence. I may save it to the DB, write it to a text file, or print it on a piece of paper. Don't care. We put IDs because data usually ends up in a DB. I could have used Email here to serve as an ID, because it should be unique, but we also like to standardize these stuff. All IDs shall be uuids.
</p>
<p>
There are also these "CreatedAt", "ModifiedAt" and "Deleted" columns. This is something I usually do, when I want soft-delete functionality. Denormalize the data to gain performance. Otherwise, I would need to make... say... EntityStatus table to keep that data, forcing me to do a JOIN for every read operation and additional UPDATE EntityStatus for every write operation. So I kinda sidestep "the good practices" to avoid very real complications.
</p>
<p>
Domain data part is what it is, so I can safely skip that part.
</p>
<p>
Relational data part is the most interesting bit. I think this is what keeps me falling back to EntityFramework and why using "relational properties" are unavoidable. Either that, or I'm missing something.
</p>
<p>
Focusing attention on <code>Sheikh</code> table here, with just 2 relations, there are 4 potential scenarios. I don't want to load stuff from the DB, unless they are required, so the scenarios are:
<ul>
<li>Load <code>Sheikh</code> without relational data</li>
<li>Load <code>Sheikh</code> with <code>Wives</code></li>
<li>Load <code>Sheikh</code> with <code>Supercars</code></li>
<li>Load <code>Sheikh</code> with <code>Wives</code> and <code>Supercars</code></li>
</ul>
</p>
<p>
2<sup>NRelations</sup> I guess? I'm three beers in on this, with only six hours left until corporate clock starts ticking, so my math is probably off.
</p>
<p>
God forbid if any of these relations have their own "inner relations" you may or may not need to load. This is where the (magic mapping/to SQL translations) really start getting useful. There will be some code repetition, but you'll just need to add <code>ThenInclude(x => ...)</code> and you're done.
</p>
<p>
Now the flip side. Focusing attention on <code>Supercar</code> table:
<pre>
type Supercar = {
// db entity data
...
// domain data
Vendor: string
Model: string
HorsePower: int
// relational data
Owner: Sheikh
OwnerId: Guid
}
</pre>
</p>
<p>
Pretty much same as before. Sometimes I'll need <code>Sheikh</code> info, sometimes I won't. One of F# specific problems I'm having is that, records require all fields to be populated. What if I need just SheikhID to perform some domain logic?
<pre>
let tuneSheikhCars (sheikhId) (hpIncrement) (cars) =
cars
|> List.filter (fun x -> x.Owner.Id = sheikhId)
|> List.map (fun x -> { x with HorsePower = x.HorsePower + hpIncrement })
</pre>
</p>
<p>
Similar goes for inserting new <code>Supercar</code>. I want to query-check first if <code>Owner/Sheikh</code> exists, before attempting insertion. You can pass it as a separate parameter, but code gets messier and messier.
</p>
<p>
No matter how I twist and turn things around, in the real world, I'm not only concerned by current steps I need to take to complete a task, but also with possible future steps. Now, I could define a record that only contains relevant data per each request. But, as seen above, I'd be eventually forced to make ~ 2<sup>NRelations</sup> of such records, instead of one. A reusable one, that serves like a bucket for a preferred persistence mechanism, allowing me to load relations later on, because nothing lives in memory long term.
</p>
<p>
I strayed away slightly here from ORM vs no-ORM discussion that I've started earlier. Because, now I realize that this problem isn't just about mapping things from type <code>A</code> to type <code>B</code>.
</p>
</div>
<div class="comment-date">2023-10-08 23:24 UTC</div>
</div>
<div class="comment" id="f8dc3a0d9ca44cbc88de5b773f4679d0">
<div class="comment-author">opcoder <a href="#f8dc3a0d9ca44cbc88de5b773f4679d0">#</a></div>
<div class="comment-content">
I wonder if EF not having all the features we want isn't a false problem. I feel like we try to use the domain entities as DTOs and vice versa, breaking the SRP principle.
But if we start writing DTOs and use them with EF, we would need a layer to map between the DTOs and the entities (AutoMapper might help with this?).
I'm sure this has been discussed before.
</div>
<div class="comment-date">2023-10-09 6:56 UTC</div>
</div>
<div class="comment" id="9af5c8eda58f44fcaa24c22515286be8">
<div class="comment-author">qfilip <a href="#9af5c8eda58f44fcaa24c22515286be8">#</a></div>
<div class="comment-content">
<a href="#f8dc3a0d9ca44cbc88de5b773f4679d0">opcoder</a> not really, no... Automapper(s) should only be used for mapping between two "dumb objects" (DTOs). I wouldn't drag in a library even for that, however, as it's relatively simple to write this thing yourself (with tests) and have full control / zero configuration when you come to a point to create some special projections. As for storing domain models in objects, proper OOP objects, with both data and behaviour, I don't like that either. Single reason for that is: constructors. This is where you pass the data to be validated into a domain model, and this is where OOP has a fatal flaw for me. Constructors can only throw exceptions, giving me no room to maneuver. You can use static methods with <code>ValidationResult<T></code> as a return type, but now we're entering a territory where C#, as a language, is totally unprepared for.
</div>
<div class="comment-date"><time>2023-10-13 17:00 UTC</time></div>
</div>
<div class="comment" id="84a111412d674526b05226f903e81af3">
<div class="comment-author">Iker <a href="#84a111412d674526b05226f903e81af3">#</a></div>
<div class="comment-content">
<p>Just my two cents:</p>
<p>Yes, it is possible to map the <code>NaturalNumber</code> object to an E.F class property using <a href="https://learn.microsoft.com/en-us/ef/core/modeling/value-conversions">ValueConverters</a>. Here are a couple of articles talking about this:</p>
<ul>
<li><a href="https://andrewlock.net/strongly-typed-ids-in-ef-core-using-strongly-typed-entity-ids-to-avoid-primitive-obsession-part-4/">Andrew Lock: Using strongly-typed entity IDs to avoid primitive obsession.</a></li>
<li><a href="https://thomaslevesque.com/2020/12/23/csharp-9-records-as-strongly-typed-ids-part-4-entity-framework-core-integration/">Thomas Levesque: Using C# 9 records as strongly-typed ids.</a></li>
</ul>
<p>But even though you can use this, you may still encounter other use cases that you cannot tackle. E.F is just a tool with its limitations, and there will be things you can do with simple C# that you can not do with E.F.</p>
<p>I think you need to consider why you want to use E.F, understand its strengths and weaknesses, and then decide if it suits your project.</p>
<p>Do you want to use EF solely as a data access layer, or do you want it to be your domain layer? Maybe for a big project you can use only E.F as a data access layer and use old plain C# files for domain layer. In a [small | medium | quick & dirty] project use it as your domain layer.</p>
<p>There are bad things we already know:</p>
<ul>
<li>Increased complexity.</li>
<li>There will be things you can not do. So you must be careful that you will not need something E.F can not give you.</li>
<li>You need to know how it works. For example, know that accessing <code>myRestaurant.MaitreD</code> implies a new database access (if you have not loaded it previously).</li>
</ul>
<p>But sometimes E.F shines, for example:</p>
<ul>
<li>You are programming against the E.F model, not against a specific database, so it is easier to migrate to another database.</li>
<li>Maybe migrate to another database is rare, but it is very convenient to run tests against an in-memory SQLite database. Tests against a real database can be run in the CD/CI environment, for example.</li>
<li>Having a centralized point to process changes (<code>SaveChanges</code>) allows you to easily do interesting things: save "CreatedBy," "CreatedDate," "ModifiedBy," and "ModifiedDate" fields for all tables or create historical tables (if you do not have access to the <a href="https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver16">SQL Server temporal tables</a>).</li>
<li>Global query filters allow you to make your application multi-tenant with very little code: all tables implement <code>IByClient</code>, a global filter for these tables... and voilà, your application becomes multi-client with just a few lines.</li>
</ul>
<p>I am not an E.F defender, in fact I have a love & hate relationship with it. But I believe it is a powerful tool for certain projects. As always, the important thing is to think whether it is the right tool for your specific project :)</p>
</div>
<div class="comment-date">2023-10-15 16:43 UTC</div>
</div>
<div class="comment" id="0c76c456b47e42ec872603996ba1cfc0">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#0c76c456b47e42ec872603996ba1cfc0">#</a></div>
<div class="comment-content">
<p>
Thank you, all, for writing. There's more content in your comments than I can address in one piece, but I've written a follow-up article that engages with some of your points: <a href="/2023/10/23/domain-model-first">Domain Model first</a>.
</p>
<p>
Specifically regarding the point of having to hand-write a lot of code to deal with multiple tables joined in various fashions, I grant that while <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a bottleneck</a>, the more code you add, the greater the risk of bugs. I'm not trying to be dismissive of ORMs as a general tool. If you truly, inescapably, have a relational model, then an ORM seems like a good choice. If so, however, I don't see that you can get good encapsulation at the same time.
</p>
<p>
And indeed, an important responsibility of a software architect is to consider trade-offs to find a good solution for a particular problem. Sometimes such a solution involves an ORM, but sometimes, it doesn't. In my world, it usually doesn't.
</p>
<p>
Do I breathe rarefied air, dealing with esoteric problems that mere mortals can never hope to encounter? I don't think so. Rather, I offer the interpretation that I sometimes approach problems in a different way. All I really try to do with these articles is to present to the public the ways I think about problems. I hope, then, that it may inspire other people to consider problems from more than one angle.
</p>
<p>
Finally, from my various customer engagements I get the impression that people also like ORMs because 'entity classes' look strongly typed. As a counter-argument, I suggest that <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">this may be an illusion</a>.
</p>
</div>
<div class="comment-date">2023-10-23 06:45 UTC</div>
</div>
</div>
A first stab at the Brainfuck kata
https://blog.ploeh.dk/2023/09/11/a-first-stab-at-the-brainfuck-kata
2023-09-11T08:07:00+00:00
Mark Seemann
<div id="post">
<p>
<em>I almost gave up, but persevered and managed to produce something that works.</em>
</p>
<p>
As I've <a href="/2023/08/28/a-first-crack-at-the-args-kata">previously mentioned</a>, a customer hired me to swing by to demonstrate test-driven development and <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">tactical Git</a>. To make things interesting, we agreed that they'd give me a <a href="https://en.wikipedia.org/wiki/Kata_(programming)">kata</a> at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.
</p>
<p>
The demonstration session was supposed to be two hours in front of a participating audience. In order to make my preparation aligned to that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, so preferably proceeding in an order that would be explainable to an audience.
</p>
<p>
Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.
</p>
<p>
After a few other katas, I ran into the <a href="https://codingdojo.org/kata/Brainfuck/">Brainfuck</a> kata one Thursday. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is <a href="https://github.com/ploeh/BrainfuckCSharp">available on GitHub</a>.
</p>
<h3 id="24034bdadc7c4f9798ada181f99cd46a">
Understanding the problem <a href="#24034bdadc7c4f9798ada181f99cd46a">#</a>
</h3>
<p>
I had heard about <a href="https://en.wikipedia.org/wiki/Brainfuck">Brainfuck</a> before, but never tried to write an interpreter (or a program, for that matter).
</p>
<p>
The <a href="https://codingdojo.org/kata/Brainfuck/">kata description</a> lacks examples, so I decided to search for them elsewhere. The <a href="https://en.wikipedia.org/wiki/Brainfuck">Wikipedia article</a> comes with some examples of small programs (including <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program">Hello, World</a>), so ultimately I used that for reference instead of the kata page.
</p>
<p>
I'm happy I wasn't making my first pass on this problem in front of an audience. I spent the first 45 minutes just trying to understand the examples.
</p>
<p>
You might find me slow, since the rules of the language aren't that complicated. I was, however, confused by the way the examples were presented.
</p>
<p>
As the Wikipedia article explains, in order to add two numbers together, one can use this idiom:
</p>
<p>
<pre>[->+<]</pre>
</p>
<p>
The article then proceeds to list a small, complete program that adds two numbers. This program adds numbers this way:
</p>
<p>
<pre>[ Start your loops with your cell pointer on the loop counter (c1 in our case)
< + Add 1 to c0
> - Subtract 1 from c1
] End your loops with the cell pointer on the loop counter</pre>
</p>
<p>
I couldn't understand why this annotated 'walkthrough' explained the idiom in reverse. Several times, I was on the verge of giving up, feeling that I made absolutely no progress. Finally, it dawned on me that the second example is <em>not</em> an explanation of the first example, but rather a separate example that makes use of the same idea, but expresses it in a different way.
</p>
<p>
Most programming languages have more than one way to do things, and this is also the case here. <code>[->+<]</code> adds two numbers together, but so does <code>[<+>-]</code>.
</p>
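<p>
To see that the two idioms really do the same job, it can help to unroll each into ordinary C#. The following is only an illustrative sketch. The <code>AddIdioms</code> class and its method names are made up for this example, and the code isn't part of the interpreter. Each Brainfuck instruction is noted in a comment next to the statement that mimics it:
</p>

```csharp
using System;

// Hypothetical helper that unrolls the two addition idioms by hand.
public static class AddIdioms
{
    // [->+<] with the pointer starting on cell 0:
    // cell 0 is the loop counter, and is drained into cell 1.
    public static byte[] DrainLeftIntoRight(byte c0, byte c1)
    {
        var cells = new[] { c0, c1 };
        var pointer = 0;
        while (cells[pointer] != 0)  // [
        {
            cells[pointer]--;        // -
            pointer++;               // >
            cells[pointer]++;        // +
            pointer--;               // <
        }                            // ]
        return cells;
    }

    // [<+>-] with the pointer starting on cell 1:
    // cell 1 is the loop counter, and is drained into cell 0.
    public static byte[] DrainRightIntoLeft(byte c0, byte c1)
    {
        var cells = new[] { c0, c1 };
        var pointer = 1;
        while (cells[pointer] != 0)  // [
        {
            pointer--;               // <
            cells[pointer]++;        // +
            pointer++;               // >
            cells[pointer]--;        // -
        }                            // ]
        return cells;
    }
}
```

<p>
Starting from cells holding 2 and 5, the first method leaves 0 and 7, while the second leaves 7 and 0. Either way, one cell ends up holding the sum, and the cell that served as the loop counter ends at zero.
</p>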
<p>
Once you understand something, it can be difficult to recall why you ever found it confusing. Now that I get this, I'm having trouble explaining what I was originally thinking, and why it confused me.
</p>
<p>
This experience does, however, drive home a point for educators: When you introduce a concept and then provide examples, the first example should be a direct continuation of the introduction, and not some variation. Variations are fine, too, but should follow later and be clearly labelled.
</p>
<p>
After 45 minutes I had finally cracked the code and was ready to get programming.
</p>
<h3 id="4903839fcc6a4d75917b97026359cdc6">
Getting started <a href="#4903839fcc6a4d75917b97026359cdc6">#</a>
</h3>
<p>
The <a href="https://codingdojo.org/kata/Brainfuck/">kata description</a> suggests starting with the <code>+</code>, <code>-</code>, <code>></code>, and <code><</code> instructions to manage memory. I briefly considered that, but on the other hand, I wanted to have some test coverage. Usually, I take advantage of test-driven development, and while I wasn't sure how to proceed, I wanted to have some tests.
</p>
<p>
If I were to exclusively start with memory management, I would need some way to inspect the memory in order to write assertions. This struck me as violating <a href="/encapsulation-and-solid">encapsulation</a>.
</p>
<p>
Instead, I thought that I'd write the simplest program that would produce some output, because if I had output, I would have something to verify.
</p>
<p>
That, on the other hand, meant that I had to consider how to model input and output. The Wikipedia article describes these as
</p>
<blockquote>
<p>
"two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding)."
</p>
<footer><cite><a href="https://en.wikipedia.org/wiki/Brainfuck">Wikipedia</a></cite></footer>
</blockquote>
<p>
Knowing that <a href="https://learn.microsoft.com/archive/blogs/ploeh/console-unit-testing">you can model the console's input and output streams as polymorphic objects</a>, I decided to model the output as a <a href="https://learn.microsoft.com/dotnet/api/system.io.textwriter">TextWriter</a>. The lowest-valued printable <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> character is space, which has the byte value <code>32</code>, so I wrote this test:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments; ASCII 32 is space</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(output);
sut.Run(program);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
As you can see, I wrote the test as a <code>[Theory]</code> (parametrised test) from the outset, since I predicted that I'd add more test cases soon. Strictly speaking, when following the <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor checklist</a>, you shouldn't write more code than absolutely necessary. According to <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">YAGNI</a>, you should avoid <a href="https://wiki.c2.com/?SpeculativeGenerality">speculative generality</a>.
</p>
<p>
Sometimes, however, you've gone through a process so many times that you know, with near certainty, what happens next. I've done test-driven development for decades, so I occasionally allow my experience to trump the rules.
</p>
<p>
The Brainfuck program in the <code>[InlineData]</code> attribute increments the same data cell 32 times (you can count the plusses) and then outputs its value. The <code>expected</code> output is the space character, since it has the ASCII code <code>32</code>.
</p>
<p>
What's the simplest thing that could possibly work? Something like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">public</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>(StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>)
{
output.Write(<span style="color:#a31515;">' '</span>);
}
}</pre>
</p>
<p>
As is typical with test-driven development (TDD), the first few tests help you design the API, but not the implementation, which, here, is deliberately naive.
</p>
<p>
Since I felt pressed for time, having already spent 45 minutes of my two-hour time limit getting to grips with the problem, I suppose I lingered less on the <em>refactoring</em> phase than perhaps I should have. You'll notice, at least, that the <code>BrainfuckInterpreter</code> class depends on <a href="https://learn.microsoft.com/dotnet/api/system.io.stringwriter">StringWriter</a> rather than its abstract parent class <code>TextWriter</code>, which was the original plan.
</p>
<p>
It's not a disastrous mistake, so when I later discovered it, I easily rectified it.
</p>
<h3 id="301838c8419640358d52babb6d8e04b8">
Implementation outline <a href="#301838c8419640358d52babb6d8e04b8">#</a>
</h3>
<p>
To move on, I added another test case:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments; ASCII 32 is space</span>
[InlineData(<span style="color:#a31515;">"+++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">"!"</span>)] <span style="color:green;">// 33 increments; ASCII 32 is !</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(output);
sut.Run(program);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The only change is the addition of the second <code>[InlineData]</code> attribute, which supplies a slightly different Brainfuck program. This one has 33 increments, which corresponds to the ASCII character code for an exclamation mark.
</p>
<p>
Notice that I clearly copied and pasted the comment, but forgot to change the last <code>32</code> to <code>33</code>.
</p>
<p>
In my eagerness to pass both tests, and because I felt the clock ticking, I made another classic TDD mistake: I took too big a step. At this point, it would have been enough to iterate over the program's characters, count the number of plusses, and convert that number to a character. What I did instead was this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">public</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>(StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">imp</span> = <span style="color:blue;">new</span> InterpreterImp(program, output);
imp.Run();
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InterpreterImp</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">int</span> programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">byte</span>[] data;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> program;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">internal</span> <span style="color:#2b91af;">InterpreterImp</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
data = <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[30_000];
<span style="color:blue;">this</span>.program = program;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>()
{
<span style="font-weight:bold;color:#8f08c4;">while</span> (!IsDone)
InterpretInstruction();
}
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> IsDone => program.Length <= programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InterpretInstruction</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">instruction</span> = program[programPointer];
<span style="font-weight:bold;color:#8f08c4;">switch</span> (instruction)
{
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'+'</span>:
data[0]++;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'.'</span>:
output.Write((<span style="color:blue;">char</span>)data[0]);
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">default</span>:
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
}
}</pre>
</p>
<p>
With only two test cases, all that code isn't warranted, but I was more focused on implementing an interpreter than on moving in small steps. Even with decades of TDD experience, discipline sometimes slips. Or maybe exactly because of it.
</p>
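<p>
For comparison, here's a minimal sketch of what the smaller step could have looked like. The class name <code>CountingInterpreter</code> is made up for this illustration, and the sketch takes a <code>TextWriter</code> rather than the <code>StringWriter</code> used above. It's not code from the kata session.
</p>
<p>
<pre>using System.IO;
using System.Linq;

// Hypothetical smaller step: just enough to pass the two test cases.
public sealed class CountingInterpreter
{
    private readonly TextWriter output;

    public CountingInterpreter(TextWriter output)
    {
        this.output = output;
    }

    public void Run(string program)
    {
        // Count the plusses and write the character with that code.
        var count = program.Count(c => c == '+');
        output.Write((char)count);
    }
}</pre>
</p>
<p>
Such a version writes a character whether or not the program contains a <code>.</code> instruction, and knows nothing about data cells. The following test cases would have to drive those details out.
</p>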
<p>
Once again, I was fortunate enough that this implementation structure turned out to work all the way, but the point of the TDD process is that you can't always know that.
</p>
<p>
You may wonder why I decided to delegate the work to an inner class. I did that because I expected to have to maintain a <code>programPointer</code> over the actual <code>program</code>, and having a class that interprets <em>one</em> program has better encapsulation. I'll remind the reader that when I use the word <em>encapsulation</em>, I don't necessarily mean <em>information hiding</em>. Usually, I think in terms of <em>contracts</em>: Invariants, pre-, and postconditions.
</p>
<p>
With this design, the <code>program</code> is guaranteed to be present as a class field, since it's <code>readonly</code> and assigned upon initialisation. No <a href="/2013/07/08/defensive-coding">defensive coding</a> is required.
</p>
<h3 id="630304784a59460e836bd9d773e56b60">
Remaining memory-management instructions <a href="#630304784a59460e836bd9d773e56b60">#</a>
</h3>
<p>
While I wasn't planning on making use of the <a href="/2019/10/07/devils-advocate">Devil's advocate</a> technique, I did leave one little deliberate mistake in the above implementation: I'd hardcoded the data pointer as <code>0</code>.
</p>
<p>
This made it easy to choose the next test case, and the next one after that, and so on.
</p>
<p>
At the two-hour mark, I had these test cases:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments; ASCII 32 is space</span>
[InlineData(<span style="color:#a31515;">"+++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">"!"</span>)] <span style="color:green;">// 33 increments; ASCII 32 is !</span>
[InlineData(<span style="color:#a31515;">"+>++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments after >; ASCII 32 is space</span>
[InlineData(<span style="color:#a31515;">"+++++++++++++++++++++++++++++++++-."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 33 increments and 1 decrement; ASCII 32</span>
[InlineData(<span style="color:#a31515;">">+<++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments after movement; ASCII 32</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(output);
sut.Run(program);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
And this implementation:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InterpreterImp</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">int</span> programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">int</span> dataPointer;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">byte</span>[] data;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> program;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">internal</span> <span style="color:#2b91af;">InterpreterImp</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
data = <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[30_000];
<span style="color:blue;">this</span>.program = program;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>()
{
<span style="font-weight:bold;color:#8f08c4;">while</span> (!IsDone)
InterpretInstruction();
}
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> IsDone => program.Length <= programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InterpretInstruction</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">instruction</span> = program[programPointer];
<span style="font-weight:bold;color:#8f08c4;">switch</span> (instruction)
{
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'>'</span>:
dataPointer++;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'<'</span>:
dataPointer--;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'+'</span>:
data[dataPointer]++;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'-'</span>:
data[dataPointer]--;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'.'</span>:
output.Write((<span style="color:blue;">char</span>)data[dataPointer]);
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">default</span>:
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
}
}</pre>
</p>
<p>
I'm only showing the inner <code>InterpreterImp</code> class, since I didn't change the outer <code>BrainfuckInterpreter</code> class.
</p>
<p>
At this point, I had used my two hours, but I think that I managed to leave my imaginary audience with a sketch of a possible solution.
</p>
<h3 id="ff7d802a-14bb-4c1d-bb39-1ecaa9059f03">
Jumps <a href="#ff7d802a-14bb-4c1d-bb39-1ecaa9059f03">#</a>
</h3>
<p>
What remained was the jumping instructions <code>[</code> and <code>]</code>, as well as input.
</p>
<p>
Perhaps I could have kept adding small <code>[InlineData]</code> test cases to my single test method, but I thought I was ready to take on some of the small example programs on the Wikipedia page. I started with the addition example in this manner:
</p>
<p>
<pre> <span style="color:green;">// Copied from https://en.wikipedia.org/wiki/Brainfuck</span>
<span style="color:blue;">const</span> <span style="color:blue;">string</span> addTwoProgram = <span style="color:maroon;">@"
++ Cell c0 = 2
> +++++ Cell c1 = 5
[ Start your loops with your cell pointer on the loop counter (c1 in our case)
< + Add 1 to c0
> - Subtract 1 from c1
] End your loops with the cell pointer on the loop counter
At this point our program has added 5 to 2 leaving 7 in c0 and 0 in c1
but we cannot output this value to the terminal since it is not ASCII encoded
To display the ASCII character ""7"" we must add 48 to the value 7
We use a loop to compute 48 = 6 * 8
++++ ++++ c1 = 8 and this will be our loop counter again
[
< +++ +++ Add 6 to c0
> - Subtract 1 from c1
]
< . Print out c0 which has the value 55 which translates to ""7""!"</span>;
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AddTwoValues</span>()
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">input</span> = <span style="color:blue;">new</span> StringReader(<span style="color:#a31515;">""</span>);
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(input, output);
sut.Run(addTwoProgram);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(<span style="color:#a31515;">"7"</span>, actual);
}</pre>
</p>
<p>
I got that test passing, added the next example, got that passing, and so on. My final implementation looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextReader input;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextWriter output;
<span style="color:blue;">public</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>(TextReader <span style="font-weight:bold;color:#1f377f;">input</span>, TextWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
<span style="color:blue;">this</span>.input = input;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">imp</span> = <span style="color:blue;">new</span> InterpreterImp(program, input, output);
imp.Run();
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InterpreterImp</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">int</span> instructionPointer;
<span style="color:blue;">private</span> <span style="color:blue;">int</span> dataPointer;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">byte</span>[] data;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> program;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextReader input;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextWriter output;
<span style="color:blue;">internal</span> <span style="color:#2b91af;">InterpreterImp</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, TextReader <span style="font-weight:bold;color:#1f377f;">input</span>, TextWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
data = <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[30_000];
<span style="color:blue;">this</span>.program = program;
<span style="color:blue;">this</span>.input = input;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>()
{
<span style="font-weight:bold;color:#8f08c4;">while</span> (!IsDone)
InterpretInstruction();
}
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> IsDone => program.Length <= instructionPointer;
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InterpretInstruction</span>()
{
WrapDataPointer();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">instruction</span> = program[instructionPointer];
<span style="font-weight:bold;color:#8f08c4;">switch</span> (instruction)
{
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'>'</span>:
dataPointer++;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'<'</span>:
dataPointer--;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'+'</span>:
data[dataPointer]++;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'-'</span>:
data[dataPointer]--;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'.'</span>:
output.Write((<span style="color:blue;">char</span>)data[dataPointer]);
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">','</span>:
data[dataPointer] = (<span style="color:blue;">byte</span>)input.Read();
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'['</span>:
<span style="font-weight:bold;color:#8f08c4;">if</span> (data[dataPointer] == 0)
MoveToMatchingClose();
<span style="font-weight:bold;color:#8f08c4;">else</span>
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">']'</span>:
<span style="font-weight:bold;color:#8f08c4;">if</span> (data[dataPointer] != 0)
MoveToMatchingOpen();
<span style="font-weight:bold;color:#8f08c4;">else</span>
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">default</span>:
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">WrapDataPointer</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (dataPointer == -1)
dataPointer = data.Length - 1;
<span style="font-weight:bold;color:#8f08c4;">if</span> (dataPointer == data.Length)
dataPointer = 0;
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">MoveToMatchingClose</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">nestingLevel</span> = 1;
<span style="font-weight:bold;color:#8f08c4;">while</span> (0 < nestingLevel)
{
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">'['</span>)
nestingLevel++;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">']'</span>)
nestingLevel--;
}
instructionPointer++;
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">MoveToMatchingOpen</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">nestingLevel</span> = 1;
<span style="font-weight:bold;color:#8f08c4;">while</span> (0 < nestingLevel)
{
instructionPointer--;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">']'</span>)
nestingLevel++;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">'['</span>)
nestingLevel--;
}
instructionPointer++;
}
}
}</pre>
</p>
<p>
As you can see, I finally discovered that I'd been too concrete when using <code>StringWriter</code>. Now, <code>input</code> is defined as a <a href="https://learn.microsoft.com/dotnet/api/system.io.textreader">TextReader</a>, and <code>output</code> as a <code>TextWriter</code>.
</p>
<p>
When <a href="https://learn.microsoft.com/dotnet/api/system.io.textreader.read">TextReader.Read</a> encounters the end of the input stream, it returns <code>-1</code>, and when you cast that to <code>byte</code>, it becomes <code>255</code>. I admit that I haven't read through the Wikipedia article's <em>ROT13</em> example code to a degree that I understand how it decides to stop processing, but the test passes.
</p>
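<p>
The conversion can be checked in isolation. This small sketch assumes the default <em>unchecked</em> arithmetic context. If a project is compiled with overflow checking turned on, the same cast throws an <code>OverflowException</code> instead.
</p>
<p>
<pre>using System;
using System.IO;

using var input = new StringReader("");

int raw = input.Read();   // -1 signals the end of the input
byte cell = (byte)raw;    // narrowing conversion keeps the low eight bits

Console.WriteLine(raw);   // -1
Console.WriteLine(cell);  // 255</pre>
</p>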
<p>
I also realised that the Wikipedia article used the term <em>instruction pointer</em>, so I renamed <code>programPointer</code> to <code>instructionPointer</code>.
</p>
<h3 id="868485959d2a4adfbdaa5e34de5c0484">
Assessment <a href="#868485959d2a4adfbdaa5e34de5c0484">#</a>
</h3>
<p>
Due to the <code>switch/case</code> structure, the <code>InterpretInstruction</code> method has a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of <em>12</em>, which is more than I recommend in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
It's not uncommon that <code>switch/case</code> code has high cyclomatic complexity, and this is also a common criticism of the measure. When each <code>case</code> block is as simple as it is here, or delegates to helper methods such as <code>MoveToMatchingClose</code>, you could reasonably argue that the code is still maintainable.
</p>
<p>
<a href="/ref/refactoring">Refactoring</a> lists switch statements as a code smell and suggests better alternatives. Had I followed the kata description's <em>additional constraints</em> to the letter, I should also have made it easy to add new instructions, or rename existing ones. This suggests that one of <a href="https://martinfowler.com/">Martin Fowler</a>'s refactorings might be in order.
</p>
<p>
That is, however, an entirely different kind of exercise, and I thought that I'd already gotten what I wanted out of the kata.
</p>
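<p>
As a hint of what such an exercise could involve, here's a sketch of one possible alternative to the <code>switch</code> statement: a lookup table from instruction character to action. The class name <code>TableDrivenInterpreter</code> is hypothetical, and the sketch only covers the memory-management and output instructions. Jumps would need access to the instruction pointer, so the table's delegate type would have to change to support them.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch; not the kata's final implementation.
public sealed class TableDrivenInterpreter
{
    private readonly byte[] data = new byte[30_000];
    private readonly TextWriter output;
    private readonly Dictionary<char, Action> instructions;
    private int dataPointer;

    public TableDrivenInterpreter(TextWriter output)
    {
        this.output = output;
        // Adding or renaming an instruction only touches this table.
        instructions = new Dictionary<char, Action>
        {
            ['>'] = () => dataPointer++,
            ['<'] = () => dataPointer--,
            ['+'] = () => data[dataPointer]++,
            ['-'] = () => data[dataPointer]--,
            ['.'] = () => this.output.Write((char)data[dataPointer]),
        };
    }

    public void Run(string program)
    {
        // Characters without an entry are treated as comments and skipped.
        foreach (var c in program)
            if (instructions.TryGetValue(c, out var execute))
                execute();
    }
}</pre>
</p>
<p>
Unlike the implementation above, this sketch keeps its memory between calls to <code>Run</code> and doesn't wrap the data pointer, so treat it as a starting point rather than a drop-in replacement.
</p>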
<h3 id="9ed50e79927541a18a53a37d3c810442">
Conclusion <a href="#9ed50e79927541a18a53a37d3c810442">#</a>
</h3>
<p>
At first glance, the Brainfuck language isn't difficult to understand (but onerous to read). Even so, it took me such a long time to understand the example code that I almost gave up more than once. Still, once I understood how it worked, the interpreter actually wasn't that hard to write.
</p>
<p>
In retrospect, perhaps I should have structured my code differently. Perhaps I should have used polymorphism instead of a switch statement. Perhaps I should have written the code in a more functional style. Regular readers will at least recognise that the code shown here is uncharacteristically imperative for me. I do, however, try to vary my approach to fit the problem at hand (<em>use the right tool for the job</em>, as the old saw goes), and the Brainfuck language is described in such imperative terms that imperative code seemed like the most fitting style.
</p>
<p>
Now that I understand how Brainfuck works, I might later try to <a href="/2020/01/13/on-doing-katas">do the kata with some other constraints</a>. It might prove interesting.
</p>
</div><hr>
Decomposing CTFiYH's sample code base
https://blog.ploeh.dk/2023/09/04/decomposing-ctfiyhs-sample-code-base
2023-09-04T06:00:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An experience report.</em>
</p>
<p>
In my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> (CTFiYH) I write in the last chapter:
</p>
<blockquote>
<p>
If you've looked at the book's sample code base, you may have noticed that it looks disconcertingly monolithic. If you consider the full code base that includes the integration tests, as [the following figure] illustrates, there are all of three packages[...]. Of those, only one is production code.
</p>
<p>
<img src="/content/binary/ctfiyh-monolith-architecture.png" alt="Three boxes labelled unit tests, integration tests, and REST API.">
</p>
<p>
[Figure caption:] The packages that make up the sample code base. With only a single production package, it reeks of a monolith.
</p>
<footer><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 16.2.1.</footer>
</blockquote>
<p>
Later, after discussing dependency cycles, I conclude:
</p>
<blockquote>
<p>
I've been writing F# and Haskell for enough years that I naturally follow the beneficial rules that they enforce. I'm confident that the sample code is nicely decoupled, even though it's packaged as a monolith. But unless you have a similar experience, I recommend that you separate your code base into multiple packages.
</p>
<footer><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 16.2.2.</footer>
</blockquote>
<p>
Usually, you can't write something that cocksure without it backfiring, but I was really, really confident that this was the case. Still, it's always nagged me, because I believe that I should walk the walk rather than talk the talk. And I do admit that this was one of the few claims in the book that I actually didn't have code to back up.
</p>
<p>
So I decided to spend part of a weekend to verify that what I wrote was true. You won't believe what happened next.
</p>
<h3 id="850effbb523248579dd4c382ed75f923">
Decomposition <a href="#850effbb523248579dd4c382ed75f923">#</a>
</h3>
<p>
Reader, I was right all along.
</p>
<p>
I stopped my experiment when my package graph looked like this:
</p>
<p>
<img src="/content/binary/ctfiyh-decomposed-architecture.png" alt="Ports-and-adapters architecture diagram.">
</p>
<p>
Does that look familiar? It should; it's a poster example of <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">Ports and Adapters</a>, or, if you will, <a href="/ref/clean-architecture">Clean Architecture</a>. Notice how all dependencies point inward, following the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>.
</p>
<p>
The Domain Model has no dependencies, while the HTTP Model (Application Layer) only depends on the Domain Model. The outer layer contains the ports and adapters, as well as the <a href="/2011/07/28/CompositionRoot">Composition Root</a>. The Web Host is a small web project that composes everything else. In order to do that, it must reference everything else, either directly (SMTP, SQL, HTTP Model) or transitively (Domain Model).
</p>
<p>
The <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapters</a>, on the other hand, depend on the HTTP Model and (not shown) the SDKs that they adapt. The <code>SqlReservationsRepository</code> class, for example, is implemented in the SQL library, adapting the <code>System.Data.SqlClient</code> SDK to look like an <code>IReservationsRepository</code>, which is defined in the HTTP Model.
</p>
<p>
The SMTP library is similar. It contains a concrete implementation called <code>SmtpPostOffice</code> that adapts the <code>System.Net.Mail</code> API to look like an <code>IPostOffice</code> object. Once again, the <code>IPostOffice</code> interface is defined in the HTTP Model.
</p>
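<p>
To illustrate the direction of these dependencies, here's a minimal sketch. It isn't code from the book: the <code>Send</code> signature, the <code>Sketch</code> namespaces, and the <code>ReservationNotifier</code> and <code>RecordingPostOffice</code> classes are hypothetical stand-ins, and namespaces in a single file only imitate what separate packages would enforce at compile time.
</p>
<p>
<pre>using System.Collections.Generic;

namespace Sketch.HttpModel
{
    // The port. It's owned by the application layer, not by the adapter.
    public interface IPostOffice
    {
        // Hypothetical signature; the book's interface is different.
        void Send(string recipient, string subject);
    }

    // Application code depends only on the interface.
    public sealed class ReservationNotifier
    {
        private readonly IPostOffice postOffice;

        public ReservationNotifier(IPostOffice postOffice)
        {
            this.postOffice = postOffice;
        }

        public void NotifyCreated(string email)
        {
            postOffice.Send(email, "Reservation created");
        }
    }
}

namespace Sketch.Smtp
{
    using Sketch.HttpModel; // The adapter package references the HTTP Model.

    // Stand-in adapter. A real one would wrap System.Net.Mail instead
    // of recording messages in memory.
    public sealed class RecordingPostOffice : IPostOffice
    {
        public List<string> Sent { get; } = new List<string>();

        public void Send(string recipient, string subject)
        {
            Sent.Add($"{recipient}: {subject}");
        }
    }
}</pre>
</p>
<p>
Nothing in <code>Sketch.HttpModel</code> mentions <code>Sketch.Smtp</code>. Only a composition root would need to know about both.
</p>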
<p>
The above figure is not to scale. In reality, the outer ring is quite thin. The SQL library contains only <code>SqlReservationsRepository</code> and some supporting text files with SQL <a href="https://en.wikipedia.org/wiki/Data_definition_language">DDL</a> definitions. The SMTP library contains only the <code>SmtpPostOffice</code> class. And the Web Host contains <code>Program</code>, <code>Startup</code>, and a few configuration file <a href="https://en.wikipedia.org/wiki/Data_transfer_object">DTOs</a> (<em>options</em>, in ASP.NET parlance).
</p>
<h3 id="813610f5bf6446cfb1daaa22423df14c">
Application layer <a href="#813610f5bf6446cfb1daaa22423df14c">#</a>
</h3>
<p>
The majority of code, at least if I use a rough proxy measure like number of files, is in the HTTP Model. I often think of this as the <em>application layer</em>, because it's all the logic that's specific to the application, in contrast to the Domain Model, which ought to contain code that can be used in a variety of application contexts (REST API, web site, batch job, etc.).
</p>
<p>
In this particular case, the application is a REST API, and it turns out that while the Domain Model isn't trivial, more work goes into making sure that the REST API behaves correctly: That it returns correctly formatted data, that it <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">validates input</a>, that it <a href="/2020/11/09/checking-signed-urls-with-aspnet">detects attempts at tampering with URLs</a>, that it <a href="/2020/11/16/redirect-legacy-urls">redirects legacy URLs</a>, etc.
</p>
<p>
This layer also contains the <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">interfaces for the application's real dependencies</a>: <code>IReservationsRepository</code>, <code>IPostOffice</code>, <code>IRestaurantDatabase</code>, and <code>IClock</code>. This explains why the SQL and SMTP packages need to reference the HTTP Model.
</p>
<p>
If you have bought the book, you have access to its example code base, and while it's a Git repository, this particular experiment isn't included. After all, I just made it, two years after finishing the book. Thus, if you want to compare with the code that comes with the book, here's a list of all the files I moved to the HTTP Model package:
</p>
<ul>
<li>AccessControlList.cs</li>
<li>CalendarController.cs</li>
<li>CalendarDto.cs</li>
<li>Day.cs</li>
<li>DayDto.cs</li>
<li>DtoConversions.cs</li>
<li>EmailingReservationsRepository.cs</li>
<li>Grandfather.cs</li>
<li>HomeController.cs</li>
<li>HomeDto.cs</li>
<li>Hypertext.cs</li>
<li>IClock.cs</li>
<li>InMemoryRestaurantDatabase.cs</li>
<li>IPeriod.cs</li>
<li>IPeriodVisitor.cs</li>
<li>IPostOffice.cs</li>
<li>IReservationsRepository.cs</li>
<li>IRestaurantDatabase.cs</li>
<li>Iso8601.cs</li>
<li>LinkDto.cs</li>
<li>LinksFilter.cs</li>
<li>LoggingClock.cs</li>
<li>LoggingPostOffice.cs</li>
<li>LoggingReservationsRepository.cs</li>
<li>Month.cs</li>
<li>NullPostOffice.cs</li>
<li>Period.cs</li>
<li>ReservationDto.cs</li>
<li>ReservationsController.cs</li>
<li>ReservationsRepository.cs</li>
<li>RestaurantDto.cs</li>
<li>RestaurantsController.cs</li>
<li>ScheduleController.cs</li>
<li>SigningUrlHelper.cs</li>
<li>SigningUrlHelperFactory.cs</li>
<li>SystemClock.cs</li>
<li>TimeDto.cs</li>
<li>UrlBuilder.cs</li>
<li>UrlIntegrityFilter.cs</li>
<li>Year.cs</li>
</ul>
<p>
As you can see, this package contains the Controllers, the DTOs, the interfaces, and some REST- and HTTP-specific code such as <a href="/2020/11/02/signing-urls-with-aspnet">SigningUrlHelper</a>, <a href="/2020/11/09/checking-signed-urls-with-aspnet">UrlIntegrityFilter</a>, <a href="/2020/08/24/adding-rest-links-as-a-cross-cutting-concern">LinksFilter</a>, security, <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a> formatters, etc.
</p>
<h3 id="eace4360f69240c0b8077a3c1e236473">
Domain Model <a href="#eace4360f69240c0b8077a3c1e236473">#</a>
</h3>
<p>
The Domain Model is small, but not insignificant. Perhaps the most striking quality of it is that (with a single, inconsequential exception) it contains no interfaces. There are no polymorphic types that model application dependencies such as databases, web services, messaging systems, or the system clock. Those are all the purview of the application layer.
</p>
<p>
As the book describes, the architecture is <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">functional core, imperative shell</a>, and since <a href="/2017/01/27/from-dependency-injection-to-dependency-rejection">dependencies make everything impure</a>, you can't have those in your functional core. While, with a language like C#, <a href="/2020/02/24/discerning-and-maintaining-purity">you can never be sure that a function truly is pure</a>, I believe that the entire Domain Model is <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a>.
</p>
<p>
For those readers who have the book's sample code base, here's a list of the files I moved to the Domain Model:
</p>
<ul>
<li>Email.cs</li>
<li>ITableVisitor.cs</li>
<li>MaitreD.cs</li>
<li>Name.cs</li>
<li>Reservation.cs</li>
<li>ReservationsVisitor.cs</li>
<li>Restaurant.cs</li>
<li>Seating.cs</li>
<li>Table.cs</li>
<li>TimeOfDay.cs</li>
<li>TimeSlot.cs</li>
</ul>
<p>
If the entire Domain Model consists of immutable values and <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>, and if impure dependencies make everything impure, what's the <code>ITableVisitor</code> interface doing there?
</p>
<p>
This interface doesn't model any external application dependency, but rather represents <a href="/2018/06/25/visitor-as-a-sum-type">a sum type with the Visitor pattern</a>. The interface looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">ITableVisitor</span><<span style="color:#2b91af;">T</span>>
{
    T <span style="font-weight:bold;color:#74531f;">VisitStandard</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>, Reservation? <span style="font-weight:bold;color:#1f377f;">reservation</span>);
    T <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>, IReadOnlyCollection<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>);
}</pre>
</p>
<p>
Restaurant tables are modelled this way because the Domain Model distinguishes between two fundamentally different kinds of tables: Normal restaurant tables, and communal or shared tables. In <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a> such a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> would be a one-liner, but in C# you need to use either the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor pattern</a> or a <a href="/2018/05/22/church-encoding">Church encoding</a>. For the book, I chose the Visitor pattern in order to keep the code base as object-oriented as possible.
</p>
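<p>
To show how such an interface turns a class into a sum type, here's a simplified sketch. It isn't the book's implementation: the <code>Reservation</code> record, the nested <code>StandardTable</code> and <code>CommunalTable</code> classes, the factory methods, and the <code>RemainingSeatsVisitor</code> (including its rule that a reserved standard table has no remaining seats) are assumptions made for the example.
</p>
<p>
<pre>#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical, pared-down stand-in for the book's Reservation class.
public sealed record Reservation(string Name, int Quantity);

public interface ITableVisitor<T>
{
    T VisitStandard(int seats, Reservation? reservation);
    T VisitCommunal(int seats, IReadOnlyCollection<Reservation> reservations);
}

// A closed set of two cases: the Visitor-based equivalent of a sum type.
public abstract class Table
{
    // Private constructor: only the nested classes can derive from Table.
    private Table() { }

    public static Table Standard(int seats) =>
        new StandardTable(seats, null);

    public static Table Communal(int seats, params Reservation[] reservations) =>
        new CommunalTable(seats, reservations);

    public abstract T Accept<T>(ITableVisitor<T> visitor);

    private sealed class StandardTable : Table
    {
        private readonly int seats;
        private readonly Reservation? reservation;

        public StandardTable(int seats, Reservation? reservation)
        {
            this.seats = seats;
            this.reservation = reservation;
        }

        public override T Accept<T>(ITableVisitor<T> visitor) =>
            visitor.VisitStandard(seats, reservation);
    }

    private sealed class CommunalTable : Table
    {
        private readonly int seats;
        private readonly IReadOnlyCollection<Reservation> reservations;

        public CommunalTable(int seats, IReadOnlyCollection<Reservation> reservations)
        {
            this.seats = seats;
            this.reservations = reservations;
        }

        public override T Accept<T>(ITableVisitor<T> visitor) =>
            visitor.VisitCommunal(seats, reservations);
    }
}

// A visitor plays the role of a pattern match over the two cases.
public sealed class RemainingSeatsVisitor : ITableVisitor<int>
{
    public int VisitStandard(int seats, Reservation? reservation) =>
        reservation is null ? seats : 0;

    public int VisitCommunal(int seats, IReadOnlyCollection<Reservation> reservations) =>
        seats - reservations.Sum(r => r.Quantity);
}</pre>
</p>
<p>
With these definitions, <code>Table.Communal(10, new Reservation("Ana", 4)).Accept(new RemainingSeatsVisitor())</code> evaluates to <code>6</code>. Adding a third kind of table would mean adding a method to the interface, at which point every visitor stops compiling until it handles the new case.
</p>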
<h3 id="c76f92f5413246958cfd31a6edc44845">
Circular dependencies <a href="#c76f92f5413246958cfd31a6edc44845">#</a>
</h3>
<p>
In the book I wrote:
</p>
<blockquote>
<p>
The passive prevention of cycles [that comes from separate packages] is worth the extra complexity. Unless team members have extensive experience with a language that prevents cycles, I recommend this style of architecture.
</p>
<p>
Such languages do exist, though. F# famously prevents cycles. In it, you can't use a piece of code unless it's already defined above. Newcomers to the language see this as a terrible flaw, but it's actually one of its <a href="https://fsharpforfunandprofit.com/posts/cycles-and-modularity-in-the-wild/">best</a> <a href="http://evelinag.com/blog/2014/06-09-comparing-dependency-networks/">features</a>.
</p>
<p>
Haskell takes a different approach, but ultimately, its explicit treatment of side effects at the type level <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">steers you towards a ports-and-adapters-style architecture</a>. Your code simply doesn't compile otherwise!
</p>
<p>
I've been writing F# and Haskell for enough years that I naturally follow the beneficial rules that they enforce. I'm confident that the sample code is nicely decoupled, even though it's packaged as a monolith. But unless you have a similar experience, I recommend that you separate your code base into multiple packages.
</p>
<footer><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 16.2.2.</footer>
</blockquote>
<p>
As demonstrated so far in this article, I'm now sufficiently conditioned to be aware of side effects and non-determinism that I know to avoid them and push them to the boundaries of the application. Even so, it turns out that it's insidiously easy to introduce small cycles when the language doesn't stop you.
</p>
<p>
This wasn't much of a problem in the Domain Model, but one small example may still illustrate how easy it is to let your guard down. In the Domain Model, I'd added a class called <code>TimeOfDay</code> (since this code base predates <a href="https://learn.microsoft.com/dotnet/api/system.timeonly">TimeOnly</a>), but without thinking much of it, I'd added this method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">ToIso8601TimeString</span>()
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> durationSinceMidnight.ToIso8601TimeString();
}</pre>
</p>
<p>
While this doesn't look like much, formatting a time value as an ISO 8601 value isn't a Domain concern. It's an application boundary concern, so belongs in the HTTP Model. And sure enough, once I moved the file that contained the ISO 8601 conversions to the HTTP Model, the <code>TimeOfDay</code> class no longer compiled.
</p>
<p>
In this case, the fix was easy. I removed the method from the <code>TimeOfDay</code> class, but added an extension method to the other ISO 8601 conversions:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">ToIso8601TimeString</span>(<span style="color:blue;">this</span> TimeOfDay <span style="font-weight:bold;color:#1f377f;">timeOfDay</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> ((TimeSpan)timeOfDay).ToIso8601TimeString();
}</pre>
</p>
<p>
While I had little trouble moving the files to the Domain Model one-by-one, once I started moving files to the HTTP Model, it turned out that this part of the code base contained more coupling.
</p>
<p>
Since I had made many classes and interfaces <a href="/2021/03/01/pendulum-swing-internal-by-default">internal by default</a>, once I started moving types to the HTTP Model, I was faced with either making them public, or moving them en bloc. Ultimately, I decided to move eighteen files that were transitively linked to each other in one go. I could have moved them in smaller chunks, but that would have made the <code>internal</code> types invisible to the code that (temporarily) stayed behind. I decided to move them all at once. After all, while I prefer to <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">move in small, controlled steps</a>, even moving eighteen files isn't that big an operation.
</p>
<p>
In the end, I still had to make <code>LinksFilter</code>, <code>UrlIntegrityFilter</code>, <code>SigningUrlHelperFactory</code>, and <code>AccessControlList.FromUser</code> public, because I needed to reference them from the Web Host.
</p>
<h3 id="f3da2544f60b4e7f8abc441e3ddaaa7f">
Test dependencies <a href="#f3da2544f60b4e7f8abc441e3ddaaa7f">#</a>
</h3>
<p>
You may have noticed that in the above diagram, it doesn't look as though I separated the two test packages into more packages, and that is, indeed, the case. I've recently described <a href="/2023/07/31/test-driving-the-pyramids-top">how I think about distinguishing kinds of tests</a>, and I don't really care too much whether an automated test exercises only a single function, or a whole bundle of objects. What I do care about is whether a test is simple or complex, fast or slow. That kind of thing.
</p>
<p>
The package labelled "Integration tests" on the diagram is really a small test library that exercises some SQL Server-specific behaviour. Some of the tests in it verify that certain race conditions don't occur. They do that by repeatedly trying to make the race condition occur, until they time out. Since the timeout is 30 seconds per test, this test suite is <em>slow</em>. That's the reason it's a separate library, even though it contains only eight tests. The book contains more details.
</p>
<p>
The "Unit tests" package contains the bulk of the tests: 176 tests, <a href="/2021/02/15/when-properties-are-easier-than-examples">some of which</a> are <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> properties that each run a hundred test cases.
</p>
<p>
Some tests are <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">self-hosted integration tests</a> that rely on the Web Host, and some of them are more 'traditional' unit tests. Dependencies are transitive, so I drew an arrow from the "Unit tests" package to the Web Host. Some unit tests exercise objects in the HTTP Model, and some exercise the Domain Model.
</p>
<p>
You may have another question: If the Integration tests reference the SQL package, then why not the SMTP package? Why is it okay that the unit tests reference the SMTP package?
</p>
<p>
To reiterate, the reason I distinguished between these two test packages was execution speed rather than architecture. The few SMTP tests are fast enough, so there's no reason to keep them in a separate package.
</p>
<p>
In fact, the SMTP tests don't exercise that the <code>SmtpPostOffice</code> can send email. Rather, I treat that class as a <a href="http://xunitpatterns.com/Humble%20Object.html">Humble Object</a>. The few tests that I do have only verify that the system can parse configuration settings:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"m.example.net"</span>, 587, <span style="color:#a31515;">"grault"</span>, <span style="color:#a31515;">"garply"</span>, <span style="color:#a31515;">"g@example.org"</span>)]
[InlineData(<span style="color:#a31515;">"n.example.net"</span>, 465, <span style="color:#a31515;">"corge"</span>, <span style="color:#a31515;">"waldo"</span>, <span style="color:#a31515;">"c@example.org"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ToPostOfficeReturnsSmtpOffice</span>(
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">host</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">port</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">userName</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">password</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">from</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = Create.SmtpOptions(host, port, userName, password, from);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> InMemoryRestaurantDatabase();

    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.ToPostOffice(db);

    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> SmtpPostOffice(host, port, userName, password, from, db);
    Assert.Equal(expected, actual);
}</pre>
</p>
<p>
Notice, by the way, the <a href="/2021/05/03/structural-equality-for-better-tests">use of structural equality on a service</a>. Consider doing that more often.
</p>
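<p>
For readers unfamiliar with the idea, structural equality on a service might be sketched as follows. The <code>MailGateway</code> class is hypothetical and much simpler than <code>SmtpPostOffice</code>; it only shows the general technique of overriding <code>Equals</code> and <code>GetHashCode</code> so that two instances with the same configuration compare equal.
</p>
<p>
<pre>#nullable enable
using System;

// Hypothetical service with structural equality; not the book's SmtpPostOffice.
public sealed class MailGateway
{
    public MailGateway(string host, int port)
    {
        Host = host;
        Port = port;
    }

    public string Host { get; }
    public int Port { get; }

    // Two gateways are equal when all their constituent values are equal.
    public override bool Equals(object? obj)
    {
        return obj is MailGateway other
            && Host == other.Host
            && Port == other.Port;
    }

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode()
    {
        return HashCode.Combine(Host, Port);
    }
}</pre>
</p>
<p>
With such overrides in place, an assertion like <code>Assert.Equal(new MailGateway("m.example.net", 587), new MailGateway("m.example.net", 587))</code> passes, even though the two objects are distinct instances.
</p>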
<p>
In any case, the separation of automated tests into two packages may not be the final iteration. It's worked well so far, in this context, but it's possible that had things been different, I would have chosen to have more test packages.
</p>
<h3 id="d61ec0b4390b470ba9b3340be8f902f4">
Conclusion <a href="#d61ec0b4390b470ba9b3340be8f902f4">#</a>
</h3>
<p>
In the book, I made a bold claim: Although I had developed the example code as a monolith, I asserted that I'd been so careful that I could easily tease it apart into multiple packages should I choose to do so.
</p>
<p>
This sounded so much like <a href="https://en.wikipedia.org/wiki/Hubris">hubris</a> that I was trepidatious writing it. I wrote it anyway, because, while I hadn't tried, I was convinced that I was right.
</p>
<p>
Still, it's nagged me ever since. What if, after all, I was wrong? I've been wrong before.
</p>
<p>
So I decided to finally make the experiment, and to my relief it turned out that I'd been right.
</p>
<p>
Don't try this at home, kids.
</p>
</div><hr>
A first crack at the Args kata
https://blog.ploeh.dk/2023/08/28/a-first-crack-at-the-args-kata
2023-08-28T07:28:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Test-driven development in C#.</em>
</p>
<p>
A customer hired me to swing by to demonstrate test-driven development and <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">tactical Git</a>. To make things interesting, we agreed that they'd give me a <a href="https://en.wikipedia.org/wiki/Kata_(programming)">kata</a> at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.
</p>
<p>
The demonstration session was supposed to be two hours in front of a participating audience. In order to make my preparation aligned to that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, so preferably proceeding in an order that would be explainable to an audience.
</p>
<p>
Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.
</p>
<p>
The first kata I decided to try was the <a href="https://codingdojo.org/kata/Args/">Args kata</a>. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is <a href="https://github.com/ploeh/ArgsCSharp">available on GitHub</a>.
</p>
<h3 id="e51a29225774493ca6b20b6dde4c0f3e">
Boolean parser <a href="#e51a29225774493ca6b20b6dde4c0f3e">#</a>
</h3>
<p>
In short, the goal of the Args kata is to develop an API for parsing command-line arguments.
</p>
<p>
When you encounter a new problem, it's common to have a few false starts until you develop a promising plan. This happened to me as well, but after a few attempts that I quickly stashed, I realised that this is really a validation problem - as in <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parse, don't validate</a>.
</p>
<p>
The first thing I did after that realisation was to copy verbatim the <code>Validated</code> code from <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a>. I consider it fair game to reuse general-purpose code like this for a kata.
</p>
<p>
With that basic building block available, I decided to start with a parser that would handle Boolean flags. My reasoning was that this might be the simplest parser, since it doesn't have many failure modes. If the flag is present, the value should be interpreted to be <code>true</code>; otherwise, <code>false</code>.
</p>
<p>
Over a series of iterations, I developed this parametrised <a href="https://xunit.net/">xUnit.net</a> test:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">" -l "</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l -p 8080 -d /usr/logs"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-p 8080 -l -d /usr/logs"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-p 8080 -d /usr/logs"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l true"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l false"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"nonsense"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'f'</span>, <span style="color:#a31515;">"-f"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'f'</span>, <span style="color:#a31515;">"foo"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'f'</span>, <span style="color:#a31515;">""</span>, <span style="color:blue;">false</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">TryParseSuccess</span>(<span style="color:blue;">char</span> <span style="color:#1f377f;">flagName</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>, <span style="color:blue;">bool</span> <span style="color:#1f377f;">expected</span>)
{
    <span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BoolParser(flagName);
    <span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(candidate);
    Assert.Equal(Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(expected), actual);
}</pre>
</p>
<p>
To be clear, this test started as a <code>[Fact]</code> (single, non-parametrised test) that I subsequently converted to a parametrised test, and then added more and more test cases to.
</p>
<p>
The final implementation of <code>BoolParser</code> looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BoolParser</span> : IParser<<span style="color:blue;">bool</span>>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">char</span> flagName;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">BoolParser</span>(<span style="color:blue;">char</span> <span style="color:#1f377f;">flagName</span>)
    {
        <span style="color:blue;">this</span>.flagName = flagName;
    }

    <span style="color:blue;">public</span> Validated<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>> <span style="color:#74531f;">TryParse</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>)
    {
        <span style="color:blue;">var</span> <span style="color:#1f377f;">idx</span> = candidate.IndexOf(<span style="color:#a31515;">$"-</span>{flagName}<span style="color:#a31515;">"</span>);
        <span style="color:#8f08c4;">if</span> (idx < 0)
            <span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(<span style="color:blue;">false</span>);

        <span style="color:blue;">var</span> <span style="color:#1f377f;">nextFlagIdx</span> = candidate[(idx + 2)..].IndexOf(<span style="color:#a31515;">'-'</span>);
        <span style="color:blue;">var</span> <span style="color:#1f377f;">bFlag</span> = nextFlagIdx < 0
            ? candidate[(idx + 2)..]
            : candidate.Substring(idx + 2, nextFlagIdx);
        <span style="color:#8f08c4;">if</span> (<span style="color:blue;">bool</span>.TryParse(bFlag, <span style="color:blue;">out</span> var <span style="color:#1f377f;">b</span>))
            <span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(b);

        <span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(<span style="color:blue;">true</span>);
    }
}</pre>
</p>
<p>
This may not be the most elegant solution, but it passes all tests. Since I was under time pressure, I didn't want to spend too much time polishing the implementation details. As long as I'm comfortable with the API design and the test cases, I can always refactor later. (I usually say that <em>later is never</em>, which also turned out to be true this time. On the other hand, it's not that the implementation code is <em>awful</em> in any way. It has a cyclomatic complexity of <em>4</em> and fits within an <a href="/2019/11/04/the-80-24-rule">80 x 20 box</a>. It could be much worse.)
</p>
<p>
The <code>IParser</code> interface came afterwards. It wasn't driven by the above test, but by later developments.
</p>
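<p>
Judging only from how <code>BoolParser</code> implements it, the interface presumably looks something like the following sketch. This is an inference, not a copy of the code on GitHub. The <code>Validated</code> type shown here is a minimal stand-in for the one taken from the earlier article, and <code>PresenceParser</code> is a hypothetical implementation that only exists to show the interface in use.
</p>
<p>
<pre>// Minimal stand-in for Validated; the real type offers more operations.
public abstract record Validated<F, S>
{
    public sealed record Failure(F Error) : Validated<F, S>;
    public sealed record Success(S Value) : Validated<F, S>;
}

// Assumed shape, inferred from the signature of BoolParser.TryParse.
public interface IParser<T>
{
    Validated<string, T> TryParse(string candidate);
}

// Hypothetical parser that only checks whether the flag occurs at all.
public sealed class PresenceParser : IParser<bool>
{
    private readonly char flagName;

    public PresenceParser(char flagName)
    {
        this.flagName = flagName;
    }

    public Validated<string, bool> TryParse(string candidate)
    {
        var present = candidate.Contains($"-{flagName}");
        return new Validated<string, bool>.Success(present);
    }
}</pre>
</p>
<p>
The point of such an interface is that parsers of different result types share one shape, which is what makes it possible to combine them generically.
</p>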
<h3 id="2d94e2103d6f405f89e4bd0b4e1f2ff3">
Rough proof of concept <a href="#2d94e2103d6f405f89e4bd0b4e1f2ff3">#</a>
</h3>
<p>
Once I had a passable implementation of <code>BoolParser</code>, I developed a similar <code>IntParser</code> to a degree where it supported a happy path. With two parsers, I had enough building blocks to demonstrate how to combine them. At that point, I also had some 40 minutes left, so it was time to produce something that might look useful.
</p>
<p>
At first, I wanted to demonstrate that it's possible to combine the two parsers, so I wrote this test:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ParseBoolAndIntProofOfConceptRaw</span>()
{
    <span style="color:blue;">var</span> <span style="color:#1f377f;">args</span> = <span style="color:#a31515;">"-l -p 8080"</span>;
    <span style="color:blue;">var</span> <span style="color:#1f377f;">l</span> = <span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>).TryParse(args).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
    <span style="color:blue;">var</span> <span style="color:#1f377f;">p</span> = <span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>).TryParse(args).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
    Func<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)> <span style="color:#1f377f;">createTuple</span> = (<span style="color:#1f377f;">b</span>, <span style="color:#1f377f;">i</span>) => (b, i);
    <span style="color:blue;">static</span> <span style="color:blue;">string</span>[] <span style="color:#74531f;">combineErrors</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">s1</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">s2</span>) => s1.Concat(s2).ToArray();

    <span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = createTuple.Apply(l, combineErrors).Apply(p, combineErrors);

    Assert.Equal(Validated.Succeed<<span style="color:blue;">string</span>[], (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)>((<span style="color:blue;">true</span>, 8080)), actual);
}</pre>
</p>
<p>
That's really not pretty, and I wouldn't expect an unsuspecting audience to understand what's going on. It doesn't help that C# is inadequate for <a href="/2018/10/01/applicative-functors">applicative functors</a>. While it's possible to implement <a href="/2018/11/05/applicative-validation">applicative validation</a>, the C# API is awkward. (There are ways to make it better than what's on display here, but keep in mind that I came into this exercise unprepared, and had to grab what was closest at hand.)
</p>
<p>
The main point of the above test was only to demonstrate that it's possible to combine two parsers into one. That took me roughly 15 minutes.
</p>
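<p>
The idea behind combining two validated values can be sketched without the <code>Apply</code> machinery. The following is an illustration under stated assumptions, not the code from the kata: the <code>Validated</code> record is a simplified stand-in, and <code>Validation.Combine</code> is a hypothetical helper that does in one step what the two chained <code>Apply</code> calls achieve. If both inputs succeed, it calls the function; otherwise it collects every error message.
</p>
<p>
<pre>using System;
using System.Linq;

// Simplified stand-in for the Validated type used in the article.
public abstract record Validated<F, S>
{
    public abstract T Match<T>(Func<F, T> onFailure, Func<S, T> onSuccess);

    public sealed record Failure(F Error) : Validated<F, S>
    {
        public override T Match<T>(Func<F, T> onFailure, Func<S, T> onSuccess) =>
            onFailure(Error);
    }

    public sealed record Success(S Value) : Validated<F, S>
    {
        public override T Match<T>(Func<F, T> onFailure, Func<S, T> onSuccess) =>
            onSuccess(Value);
    }
}

public static class Validation
{
    // Hypothetical helper: combines two results, accumulating errors.
    public static Validated<string[], T> Combine<T1, T2, T>(
        Validated<string[], T1> first,
        Validated<string[], T2> second,
        Func<T1, T2, T> create)
    {
        return first.Match(
            // First failed: keep its errors, and add the second's, if any.
            e1 => second.Match<Validated<string[], T>>(
                e2 => new Validated<string[], T>.Failure(e1.Concat(e2).ToArray()),
                _ => new Validated<string[], T>.Failure(e1)),
            // First succeeded: the outcome depends on the second.
            x1 => second.Match<Validated<string[], T>>(
                e2 => new Validated<string[], T>.Failure(e2),
                x2 => new Validated<string[], T>.Success(create(x1, x2))));
    }
}</pre>
</p>
<p>
Unlike short-circuiting error handling, this reports all problems at once: if both the flag and the port number fail to parse, the combined result carries both error messages.
</p>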
<p>
Armed with that knowledge, I then proceeded to define this base class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ArgsParser</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">T</span>>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T1> parser1;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T2> parser2;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">ArgsParser</span>(IParser<T1> <span style="color:#1f377f;">parser1</span>, IParser<T2> <span style="color:#1f377f;">parser2</span>)
    {
        <span style="color:blue;">this</span>.parser1 = parser1;
        <span style="color:blue;">this</span>.parser2 = parser2;
    }

    <span style="color:blue;">public</span> Validated<<span style="color:blue;">string</span>[], T> <span style="color:#74531f;">TryParse</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>)
    {
        <span style="color:blue;">var</span> <span style="color:#1f377f;">l</span> = parser1.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
        <span style="color:blue;">var</span> <span style="color:#1f377f;">p</span> = parser2.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });

        Func<T1, T2, T> <span style="color:#1f377f;">create</span> = Create;
        <span style="color:#8f08c4;">return</span> create.Apply(l, CombineErrors).Apply(p, CombineErrors);
    }

    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> T <span style="color:#74531f;">Create</span>(T1 <span style="color:#1f377f;">x1</span>, T2 <span style="color:#1f377f;">x2</span>);

    <span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span>[] <span style="color:#74531f;">CombineErrors</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">s1</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">s2</span>)
    {
        <span style="color:#8f08c4;">return</span> s1.Concat(s2).ToArray();
    }
}</pre>
</p>
<p>
While I'm not a fan of inheritance, this seemed the fastest way to expand on the proof of concept. The class encapsulates the ugly details of the <code>ParseBoolAndIntProofOfConceptRaw</code> test, while leaving just enough room for a derived class:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ProofOfConceptParser</span> : ArgsParser<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">ProofOfConceptParser</span>() : <span style="color:blue;">base</span>(<span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>), <span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>))
{
}
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>) <span style="color:#74531f;">Create</span>(<span style="color:blue;">bool</span> <span style="color:#1f377f;">x1</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">x2</span>)
{
<span style="color:#8f08c4;">return</span> (x1, x2);
}
}</pre>
</p>
<p>
This class only defines which parsers to use and how to translate successful results to a single object. Here, because this is still a proof of concept, the resulting object is just a tuple.
</p>
<p>
The corresponding test looks like this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ParseBoolAndIntProofOfConcept</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ProofOfConceptParser();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(<span style="color:#a31515;">"-l -p 8080"</span>);
Assert.Equal(Validated.Succeed<<span style="color:blue;">string</span>[], (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)>((<span style="color:blue;">true</span>, 8080)), actual);
}</pre>
</p>
<p>
At this point, I hit the two-hour mark, but I think I managed to produce enough code to convince a hypothetical audience that a complete solution is within grasp.
</p>
<p>
What remained was to
</p>
<ul>
<li>add proper error handling to <code>IntParser</code></li>
<li>add a corresponding <code>StringParser</code></li>
<li>improve the <code>ArgsParser</code> API</li>
<li>add better demo examples of the improved <code>ArgsParser</code> API</li>
</ul>
<p>
While I could have left the rest as an exercise for the reader, I didn't want to abandon the code in that state.
</p>
<h3 id="0a1a987c02f8421785382a6973eccd47">
Finishing the kata <a href="#0a1a987c02f8421785382a6973eccd47">#</a>
</h3>
<p>
For my own satisfaction, I decided to complete the kata, which I did in another hour.
</p>
<p>
Although I had started with an abstract base class, I know <a href="/2018/02/19/abstract-class-isomorphism">how to refactor it to a <code>sealed</code> class with an injected Strategy</a>. I did that for the existing class, and also added one that supports three parsers instead of two:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ArgsParser</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">T3</span>, <span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T1> parser1;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T2> parser2;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T3> parser3;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T1, T2, T3, T> create;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ArgsParser</span>(
IParser<T1> <span style="color:#1f377f;">parser1</span>,
IParser<T2> <span style="color:#1f377f;">parser2</span>,
IParser<T3> <span style="color:#1f377f;">parser3</span>,
Func<T1, T2, T3, T> <span style="color:#1f377f;">create</span>)
{
<span style="color:blue;">this</span>.parser1 = parser1;
<span style="color:blue;">this</span>.parser2 = parser2;
<span style="color:blue;">this</span>.parser3 = parser3;
<span style="color:blue;">this</span>.create = create;
}
<span style="color:blue;">public</span> Validated<<span style="color:blue;">string</span>[], T> <span style="color:#74531f;">TryParse</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">x1</span> = parser1.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:blue;">var</span> <span style="color:#1f377f;">x2</span> = parser2.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:blue;">var</span> <span style="color:#1f377f;">x3</span> = parser3.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:#8f08c4;">return</span> create
.Apply(x1, CombineErrors)
.Apply(x2, CombineErrors)
.Apply(x3, CombineErrors);
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span>[] <span style="color:#74531f;">CombineErrors</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">s1</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">s2</span>)
{
<span style="color:#8f08c4;">return</span> s1.Concat(s2).ToArray();
}
}</pre>
</p>
<p>
Granted, that's a bit of boilerplate, but if you imagine this as supplied by a reusable library, you only have to write this once.
</p>
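<p>
The refactoring from an abstract base class to a <code>sealed</code> class with an injected Strategy can be sketched on a smaller scale. The following is only an illustration under stated assumptions: <code>PairFormatterBase</code>, <code>DerivedPairFormatter</code>, and <code>PairFormatter</code> are hypothetical names that aren't part of the kata code. The point is that the abstract <code>Create</code> method turns into a <code>Func</code> passed to the constructor.
</p>
<p>
<pre>using System;

// Before: the varying behaviour is an abstract method that a subclass overrides.
public abstract class PairFormatterBase
{
    public string Format(bool flag, int number)
    {
        return "[" + Create(flag, number) + "]";
    }

    protected abstract string Create(bool flag, int number);
}

public sealed class DerivedPairFormatter : PairFormatterBase
{
    protected override string Create(bool flag, int number)
    {
        return flag + ":" + number;
    }
}

// After: the class is sealed, and the same behaviour is injected as a delegate.
public sealed class PairFormatter
{
    private readonly Func<bool, int, string> create;

    public PairFormatter(Func<bool, int, string> create)
    {
        this.create = create;
    }

    public string Format(bool flag, int number)
    {
        return "[" + create(flag, number) + "]";
    }
}</pre>
</p>
<p>
With this hypothetical pair of classes, <code>new DerivedPairFormatter().Format(true, 8080)</code> and <code>new PairFormatter((b, i) => b + ":" + i).Format(true, 8080)</code> should produce the same string, which is what makes the two designs interchangeable.
</p>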
<p>
I was now ready to parse the kata's central example, <code>"-l -p 8080 -d /usr/logs"</code>, to a strongly typed value:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">TestConfig</span>(<span style="color:blue;">bool</span> <span style="color:#1f377f;">DoLog</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">Port</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">Directory</span>);
[Theory]
[InlineData(<span style="color:#a31515;">"-l -p 8080 -d /usr/logs"</span>)]
[InlineData(<span style="color:#a31515;">"-p 8080 -l -d /usr/logs"</span>)]
[InlineData(<span style="color:#a31515;">"-d /usr/logs -l -p 8080"</span>)]
[InlineData(<span style="color:#a31515;">" -d /usr/logs -l -p 8080 "</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ParseConfig</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">args</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ArgsParser<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, <span style="color:blue;">string</span>, TestConfig>(
<span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>),
<span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>),
<span style="color:blue;">new</span> StringParser(<span style="color:#a31515;">'d'</span>),
(<span style="color:#1f377f;">b</span>, <span style="color:#1f377f;">i</span>, <span style="color:#1f377f;">s</span>) => <span style="color:blue;">new</span> TestConfig(b, i, s));
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(args);
Assert.Equal(
Validated.Succeed<<span style="color:blue;">string</span>[], TestConfig>(
<span style="color:blue;">new</span> TestConfig(<span style="color:blue;">true</span>, 8080, <span style="color:#a31515;">"/usr/logs"</span>)),
actual);
}</pre>
</p>
<p>
This test parses some variations of the example input into an immutable <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a>.
</p>
<p>
What happens if the input is malformed? Here's an example of that:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">FailToParseConfig</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ArgsParser<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, <span style="color:blue;">string</span>, TestConfig>(
<span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>),
<span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>),
<span style="color:blue;">new</span> StringParser(<span style="color:#a31515;">'d'</span>),
(<span style="color:#1f377f;">b</span>, <span style="color:#1f377f;">i</span>, <span style="color:#1f377f;">s</span>) => <span style="color:blue;">new</span> TestConfig(b, i, s));
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(<span style="color:#a31515;">"-p aityaity"</span>);
Assert.True(actual.Match(
onFailure: <span style="color:#1f377f;">ss</span> => ss.Contains(<span style="color:#a31515;">"Expected integer for flag '-p', but got \"aityaity\"."</span>),
onSuccess: <span style="color:#1f377f;">_</span> => <span style="color:blue;">false</span>));
Assert.True(actual.Match(
onFailure: <span style="color:#1f377f;">ss</span> => ss.Contains(<span style="color:#a31515;">"Missing value for flag '-d'."</span>),
onSuccess: <span style="color:#1f377f;">_</span> => <span style="color:blue;">false</span>));
}</pre>
</p>
<p>
Of particular interest is that, as promised by applicative validation, parsing failures don't short-circuit. The input value <code>"-p aityaity"</code> has two problems: the value for <code>-p</code> isn't an integer, and the <code>-d</code> flag is missing. Both are reported by <code>TryParse</code>.
</p>
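<p>
To see why both messages survive, it may help to look at error accumulation in isolation. The following is a minimal sketch, not the <code>Validated</code> type used in this article: <code>Outcome</code>, <code>Combine</code>, and <code>AccumulationExample</code> are hypothetical names chosen for illustration. It only shows the core idea, which is that both inputs are inspected before deciding between success and failure.
</p>
<p>
<pre>using System;
using System.Linq;

// Hypothetical, simplified stand-in for the article's Validated type:
// either a value or an array of error messages.
public sealed class Outcome<T>
{
    private Outcome(bool isSuccess, T value, string[] errors)
    {
        IsSuccess = isSuccess;
        Value = value;
        Errors = errors;
    }

    public bool IsSuccess { get; }
    public T Value { get; }
    public string[] Errors { get; }

    public static Outcome<T> Succeed(T value)
    {
        return new Outcome<T>(true, value, new string[0]);
    }

    public static Outcome<T> Fail(params string[] errors)
    {
        return new Outcome<T>(false, default(T), errors);
    }
}

public static class Outcome
{
    // Both inputs are always inspected. If either one failed, the error
    // messages from both sides are concatenated instead of stopping at
    // the first failure.
    public static Outcome<TResult> Combine<T1, T2, TResult>(
        Outcome<T1> first,
        Outcome<T2> second,
        Func<T1, T2, TResult> create)
    {
        if (!first.IsSuccess || !second.IsSuccess)
            return Outcome<TResult>.Fail(first.Errors.Concat(second.Errors).ToArray());

        return Outcome<TResult>.Succeed(create(first.Value, second.Value));
    }
}

public static class AccumulationExample
{
    // Combines two failed results and returns the accumulated messages.
    public static string[] Run()
    {
        var port = Outcome<int>.Fail("Expected integer for flag '-p', but got \"aityaity\".");
        var directory = Outcome<string>.Fail("Missing value for flag '-d'.");
        var combined = Outcome.Combine(port, directory, (p, d) => d + ":" + p);
        return combined.Errors;
    }
}</pre>
</p>
<p>
In this sketch, <code>AccumulationExample.Run()</code> returns both error messages, in the order in which the results were combined. A short-circuiting design would instead stop at the first failure and never report the missing <code>-d</code> value.
</p>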
<p>
At this point I was happy that I had sufficiently demonstrated the viability of the design. I decided to call it a day.
</p>
<h3 id="0d715ed2918d48c3840fe2f3ac7a6966">
Conclusion <a href="#0d715ed2918d48c3840fe2f3ac7a6966">#</a>
</h3>
<p>
As I did the Args kata, I found it interesting enough to warrant an article. Once I realised that I could use applicative parsing as the basis for the API, the rest followed.
</p>
<p>
There's room for improvement, but while <a href="/2020/01/13/on-doing-katas">doing katas</a> is valuable, there are diminishing returns in perfecting the code. Get the main functionality working, learn from it, and move on to another exercise.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Compile-time type-checked truth tables
https://blog.ploeh.dk/2023/08/21/compile-time-type-checked-truth-tables
2023-08-21T08:07:00+00:00
Mark Seemann
<div id="post">
<p>
<em>With simple and easy-to-understand examples in F# and Haskell.</em>
</p>
<p>
<a href="https://blog.testdouble.com/authors/eve-ragins/">Eve Ragins</a> recently published an article called <a href="https://blog.testdouble.com/posts/2023-08-14-using-truth-tables/">Why you should use truth tables in your job</a>. It's a good article. You should read it.
</p>
<p>
In it, she outlines how creating a <a href="https://en.wikipedia.org/wiki/Truth_table">Truth Table</a> can help you smoke out edge cases or unclear requirements.
</p>
<p>
I agree, and it also beautifully explains why I find <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a> so useful.
</p>
<p>
With languages like <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a>, this kind of modelling is part of the language, and you even get statically-typed compile-time checking that tells you whether you've handled all combinations.
</p>
<p>
Eve Ragins points out that there are other, socio-technical benefits from drawing up a truth table that you can, perhaps, print out, or otherwise share with non-technical stakeholders. Thus, the following is in no way meant as a full replacement, but rather as examples of how certain languages have affordances that enable you to think like this while programming.
</p>
<h3 id="9e4397fd77a043db8043dac6aff81c4a">
F# <a href="#9e4397fd77a043db8043dac6aff81c4a">#</a>
</h3>
<p>
I'm not going to go through Eve Ragins' blow-by-blow walkthrough, explaining how you construct a truth table. Rather, I'm just briefly going to show how simple it is to do the same in F#.
</p>
<p>
Most of the inputs in her example are Boolean values, which already exist in the language, but we need a type for the item status:
</p>
<p>
<pre><span style="color:blue;">type</span> ItemStatus = NotAvailable | Available | InUse</pre>
</p>
<p>
As is typical in F#, a type declaration is just a one-liner.
</p>
<p>
Now for something a little more interesting. In Eve Ragins' final table, there's a footnote that says that the dash/minus symbol indicates that the value is irrelevant. If you look a little closer, it turns out that the <code>should_field_be_editable</code> value is irrelevant whenever the <code>should_field_show</code> value is <code>FALSE</code>.
</p>
<p>
So instead of a <code>bool * bool</code> tuple, you really have a three-state type like this:
</p>
<p>
<pre><span style="color:blue;">type</span> FieldState = Hidden | ReadOnly | ReadWrite</pre>
</p>
<p>
It would probably have taken a few iterations to learn this if you'd jumped straight into pattern matching in F#, but since F# requires you to define types and functions before you can use them, I list the type now.
</p>
<p>
That's all you need to produce a truth table in F#:
</p>
<p>
<pre><span style="color:blue;">let</span> decide requiresApproval canUserApprove itemStatus =
<span style="color:blue;">match</span> requiresApproval, canUserApprove, itemStatus <span style="color:blue;">with</span>
| <span style="color:blue;">true</span>, <span style="color:blue;">true</span>, NotAvailable <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">false</span>, <span style="color:blue;">true</span>, NotAvailable <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">true</span>, <span style="color:blue;">false</span>, NotAvailable <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">false</span>, <span style="color:blue;">false</span>, NotAvailable <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">true</span>, <span style="color:blue;">true</span>, Available <span style="color:blue;">-></span> ReadWrite
| <span style="color:blue;">false</span>, <span style="color:blue;">true</span>, Available <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">true</span>, <span style="color:blue;">false</span>, Available <span style="color:blue;">-></span> ReadOnly
| <span style="color:blue;">false</span>, <span style="color:blue;">false</span>, Available <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">true</span>, <span style="color:blue;">true</span>, InUse <span style="color:blue;">-></span> ReadOnly
| <span style="color:blue;">false</span>, <span style="color:blue;">true</span>, InUse <span style="color:blue;">-></span> Hidden
| <span style="color:blue;">true</span>, <span style="color:blue;">false</span>, InUse <span style="color:blue;">-></span> ReadOnly
| <span style="color:blue;">false</span>, <span style="color:blue;">false</span>, InUse <span style="color:blue;">-></span> Hidden
</pre>
</p>
<p>
I've called the function <code>decide</code> because it wasn't clear to me what else to call it.
</p>
<p>
What's so nice about F# pattern matching is that the compiler can tell if you've missed a combination. If you forget a combination, you get a helpful <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/compiler-messages/fs0025">Incomplete pattern match</a> compiler warning that points out the combination that you missed.
</p>
<p>
And as I argue in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, you should turn warnings into errors. This would also be helpful in a case like this, since you'd be prevented from forgetting an edge case.
</p>
<h3 id="e8bce005200f4f3394eed71cffe955d0">
Haskell <a href="#e8bce005200f4f3394eed71cffe955d0">#</a>
</h3>
<p>
You can do the same exercise in Haskell, and the result is strikingly similar:
</p>
<p>
<pre><span style="color:blue;">data</span> ItemStatus = NotAvailable | Available | InUse <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:blue;">data</span> FieldState = Hidden | ReadOnly | ReadWrite <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:#2b91af;">decide</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Bool</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span> <span style="color:blue;">-></span> <span style="color:blue;">ItemStatus</span> <span style="color:blue;">-></span> <span style="color:blue;">FieldState</span>
decide True True NotAvailable = Hidden
decide False True NotAvailable = Hidden
decide True False NotAvailable = Hidden
decide False False NotAvailable = Hidden
decide True True Available = ReadWrite
decide False True Available = Hidden
decide True False Available = ReadOnly
decide False False Available = Hidden
decide True True InUse = ReadOnly
decide False True InUse = Hidden
decide True False InUse = ReadOnly
decide False False InUse = Hidden</pre>
</p>
<p>
Just like in F#, if you forget a combination, the compiler will tell you:
</p>
<p>
<pre>LibrarySystem.hs:8:1: <span style="color:red;">warning:</span> [<span style="color:red;">-Wincomplete-patterns</span>]
Pattern match(es) are non-exhaustive
In an equation for `decide':
Patterns of type `Bool', `Bool', `ItemStatus' not matched:
False False NotAvailable
<span style="color:blue;">|</span>
<span style="color:blue;">8 |</span> <span style="color:red;">decide True True NotAvailable = Hidden</span>
<span style="color:blue;">|</span> <span style="color:red;">^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...</span></pre>
</p>
<p>
To be clear, that combination is <em>not</em> missing from the above code example. This compiler warning was one I subsequently caused by commenting out a line.
</p>
<p>
It's also possible to turn warnings into errors in Haskell, for example by passing the <code>-Werror</code> flag to GHC.
</p>
<h3 id="52ba967712374cd9aa91b21bc61aa8a1">
Conclusion <a href="#52ba967712374cd9aa91b21bc61aa8a1">#</a>
</h3>
<p>
I love languages with algebraic data types because they don't just enable modelling like this, they <em>encourage</em> it. This makes it much easier to write code that handles various special cases that I'd easily overlook in other languages. In languages like F# and Haskell, the compiler will tell you if you forgot to deal with a combination.
</p>
</div><hr>
Replacing Mock and Stub with a Fake
https://blog.ploeh.dk/2023/08/14/replacing-mock-and-stub-with-a-fake
2023-08-14T07:23:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A simple C# example.</em>
</p>
<p>
A reader recently wrote me about my 2013 article <a href="/2013/10/23/mocks-for-commands-stubs-for-queries">Mocks for Commands, Stubs for Queries</a>, commenting that the 'final' code looks suspect. Since it looks like the following, that's hardly an overstatement.
</p>
<p>
<pre><span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">GetUser</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">u</span> = <span style="color:blue;">this</span>.userRepository.Read(userId);
<span style="font-weight:bold;color:#8f08c4;">if</span> (u.Id == 0)
<span style="color:blue;">this</span>.userRepository.Create(1234);
<span style="font-weight:bold;color:#8f08c4;">return</span> u;
}</pre>
</p>
<p>
Can you spot what's wrong?
</p>
<h3 id="a1c674ca5ff049ddac08a7618f95e57b">
Missing test cases <a href="#a1c674ca5ff049ddac08a7618f95e57b">#</a>
</h3>
<p>
You might point out that this example seems to violate <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Command Query Separation</a>, and probably other design principles as well. I agree that the example is a bit odd, but that's not what I have in mind.
</p>
<p>
The problem with the above example is that while it correctly calls the <code>Read</code> method with the <code>userId</code> parameter, it calls <code>Create</code> with the hardcoded constant <code>1234</code>. It really ought to call <code>Create</code> with <code>userId</code>.
</p>
<p>
Does this mean that the technique that I described in 2013 is wrong? I don't think so. Rather, I left the code in a rather unhelpful state. What I had in mind with that article was the technique I called <em>data flow verification</em>. As soon as I had delivered that message, I was, according to my own goals, done. I wrapped up the article, leaving the code as shown above.
</p>
<p>
As the reader remarked, it's noteworthy that an article about better unit testing leaves the System Under Test (SUT) in an obviously defective state.
</p>
<p>
The short response is that at least one test case is missing. Since this was only demo code to show an example, the entire test suite is this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SomeControllerTests</span>
{
[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetUserReturnsCorrectValue</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> User();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(userId)).Returns(expected);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetUser(userId);
Assert.Equal(expected, actual);
}
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UserIsSavedIfItDoesNotExist</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(1234)).Returns(<span style="color:blue;">new</span> User { Id = 0 });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
sut.GetUser(1234);
td.Verify(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Create(1234));
}
}</pre>
</p>
<p>
There are three test cases: Two for the parametrised <code>GetUserReturnsCorrectValue</code> method and one test case for the <code>UserIsSavedIfItDoesNotExist</code> test. Since the latter only verifies the hardcoded value <code>1234</code>, the <a href="/2019/10/07/devils-advocate">Devil's advocate</a> can get by with using that hardcoded value as well.
</p>
<h3 id="f0bc9ac185d345c29e61efec2ee7e6b9">
Adding a test case <a href="#f0bc9ac185d345c29e61efec2ee7e6b9">#</a>
</h3>
<p>
The solution to that problem is simple enough. Add another test case by converting <code>UserIsSavedIfItDoesNotExist</code> to a parametrised test:
</p>
<p>
<pre>[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UserIsSavedIfItDoesNotExist</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(userId)).Returns(<span style="color:blue;">new</span> User { Id = 0 });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
sut.GetUser(userId);
td.Verify(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Create(userId));
}</pre>
</p>
<p>
There's no reason to edit the other test method; this should be enough to elicit a change to the SUT:
</p>
<p>
<pre><span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">GetUser</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">u</span> = <span style="color:blue;">this</span>.userRepository.Read(userId);
<span style="font-weight:bold;color:#8f08c4;">if</span> (u.Id == 0)
<span style="color:blue;">this</span>.userRepository.Create(userId);
<span style="font-weight:bold;color:#8f08c4;">return</span> u;
}</pre>
</p>
<p>
When you use <a href="http://xunitpatterns.com/Mock%20Object.html">Mocks</a> (or, rather, <a href="http://xunitpatterns.com/Test%20Spy.html">Spies</a>) and <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a>, the Data Flow Verification technique is useful.
</p>
<p>
On the other hand, I no longer use Spies or Stubs since <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">they tend to break encapsulation</a>.
</p>
<h3 id="0864ba24b8a841828c3aaba0bc84877c">
Fake <a href="#0864ba24b8a841828c3aaba0bc84877c">#</a>
</h3>
<p>
These days, I tend to <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">only model real application dependencies as Test Doubles</a>, and when I do, I use <a href="http://xunitpatterns.com/Fake%20Object.html">Fakes</a>.
</p>
<p>
<img src="/content/binary/dos-equis-fakes.jpg" alt="Dos Equis meme with the text: I don't always use Test Doubles, but when I do, I use Fakes.">
</p>
<p>
While the article series <a href="/2019/02/18/from-interaction-based-to-state-based-testing">From interaction-based to state-based testing</a> goes into more detail, I think that this small example is a good opportunity to demonstrate the technique.
</p>
<p>
The <code>IUserRepository</code> interface is defined like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IUserRepository</span>
{
User <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>);
<span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>);
}</pre>
</p>
<p>
A typical Fake is an in-memory collection:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeUserRepository</span> : Collection<User>, IUserRepository
{
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
Add(<span style="color:blue;">new</span> User { Id = userId });
}
<span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">user</span> = <span style="color:blue;">this</span>.SingleOrDefault(<span style="font-weight:bold;color:#1f377f;">u</span> => u.Id == userId);
<span style="font-weight:bold;color:#8f08c4;">if</span> (user == <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> User { Id = 0 };
<span style="font-weight:bold;color:#8f08c4;">return</span> user;
}
}</pre>
</p>
<p>
In my experience, they're typically easy to implement by inheriting from a collection base class. Such an object exhibits typical traits of a Fake object: It fulfils the implied contract, but it lacks some of the 'ilities'.
</p>
<p>
The contract of a Repository is typically that if you add an Entity, you'd expect to be able to retrieve it later. If the Repository offers a <code>Delete</code> method (this one doesn't), you'd expect the deleted Entity to be gone, so that you <em>can't</em> retrieve it. And so on. The <code>FakeUserRepository</code> class fulfils such a contract.
</p>
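<p>
One way to make such a contract explicit is to write it down as checks against the interface rather than against a particular implementation. The following is a sketch under stated assumptions: the <code>RepositoryContract</code> class and its method names are hypothetical, and the <code>User</code> class is assumed to have a settable <code>Id</code> property, as the object initialisers above suggest. The interface and the Fake are repeated so that the example compiles on its own.
</p>
<p>
<pre>using System.Collections.ObjectModel;
using System.Linq;

// Assumed shape of User; only the Id property is implied by the article.
public sealed class User
{
    public int Id { get; set; }
}

public interface IUserRepository
{
    User Read(int userId);
    void Create(int userId);
}

// Repeated from above so that the sketch is self-contained.
public sealed class FakeUserRepository : Collection<User>, IUserRepository
{
    public void Create(int userId)
    {
        Add(new User { Id = userId });
    }

    public User Read(int userId)
    {
        var user = this.SingleOrDefault(u => u.Id == userId);
        if (user == null)
            return new User { Id = 0 };
        return user;
    }
}

// Hypothetical contract checks, written against the interface only.
public static class RepositoryContract
{
    // If you add an Entity, you expect to be able to retrieve it later.
    public static bool CreatedUserCanBeRead(IUserRepository repository, int userId)
    {
        repository.Create(userId);
        return repository.Read(userId).Id == userId;
    }

    // A user that was never added is reported with Id 0, which is the
    // convention that GetUser relies on.
    public static bool MissingUserHasIdZero(IUserRepository repository, int userId)
    {
        return repository.Read(userId).Id == 0;
    }
}</pre>
</p>
<p>
Because the checks only depend on <code>IUserRepository</code>, the same methods could in principle be run against a production implementation as well, which is one way to gain confidence that the Fake and the real dependency agree on the contract.
</p>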
<p>
On the other hand, you'd also expect a proper Repository implementation to support more than that:
</p>
<ul>
<li>You'd expect a proper implementation to persist data so that you can reboot or change computers without losing data.</li>
<li>You'd expect a proper implementation to correctly handle multiple threads.</li>
<li>You <em>may</em> expect a proper implementation to support <a href="https://en.wikipedia.org/wiki/ACID">ACID</a> transactions.</li>
</ul>
<p>
The <code>FakeUserRepository</code> does none of that, but in the context of a unit test, it doesn't matter. The data exists as long as the object exists, and that's until it goes out of scope. As long as a test needs the Repository, it remains in scope, and the data is there.
</p>
<p>
Likewise, each test runs in a single thread. Even when tests run in parallel, each test has its own Fake object, so there's no shared state. Therefore, even though <code>FakeUserRepository</code> isn't thread-safe, it doesn't have to be.
</p>
<h3 id="a479805231124c1e8d74856a2cce2762">
Testing with the Fake <a href="#a479805231124c1e8d74856a2cce2762">#</a>
</h3>
<p>
You can now rewrite the tests to use <code>FakeUserRepository</code>:
</p>
<p>
<pre>[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetUserReturnsCorrectValue</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> User { Id = userId };
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeUserRepository { expected };
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(db);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetUser(userId);
    Assert.Equal(expected, actual);
}

[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UserIsSavedIfItDoesNotExist</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeUserRepository();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(db);
    sut.GetUser(userId);
    Assert.Single(db, <span style="font-weight:bold;color:#1f377f;">u</span> => u.Id == userId);
}</pre>
</p>
<p>
Instead of asking a Spy whether or not a particular method was called (which is an implementation detail), the <code>UserIsSavedIfItDoesNotExist</code> test verifies the posterior state of the database.
</p>
<h3 id="5ce6d826519d44ec905096a2098513ca">
Conclusion <a href="#5ce6d826519d44ec905096a2098513ca">#</a>
</h3>
<p>
In my experience, using Fakes simplifies unit tests. While you may have to edit the Fake implementation from time to time, you edit that code in a single place. The alternative is to edit <em>all</em> affected tests, every time you change something about a dependency. This is also known as <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun Surgery</a> and considered an antipattern.
</p>
<p>
The code base that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> has more realistic examples of this technique, and much else.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
NonEmpty catamorphism
https://blog.ploeh.dk/2023/08/07/nonempty-catamorphism
2023-08-07T11:40:00+00:00
Mark Seemann
<div id="post">
<p>
<em>The universal API for generic non-empty collections, with examples in C# and Haskell.</em>
</p>
<p>
This article is part of an <a href="/2019/04/29/catamorphisms">article series about catamorphisms</a>. A catamorphism is a <a href="/2017/10/04/from-design-patterns-to-category-theory">universal abstraction</a> that describes how to digest a data structure into a potentially more compact value.
</p>
<p>
I was recently doing some work that required a data structure like a collection, but with the additional constraint that it should be guaranteed to have at least one element. I've known about <a href="https://www.haskell.org/">Haskell</a>'s <a href="https://hackage.haskell.org/package/base/docs/Data-List-NonEmpty.html">NonEmpty</a> type, and <a href="/2017/12/11/semigroups-accumulate">how to port it to C#</a> for years. This time I needed to implement it in a third language, and since I had a little extra time available, I thought it'd be interesting to pursue a conjecture of mine: It seems as though you can implement most (all?) of a generic data structure's API based on its catamorphism.
</p>
<p>
While I could make a guess as to how a catamorphism might look for a non-empty collection, I wasn't sure. A quick web search revealed nothing conclusive, so I decided to deduce it from first principles. As this article series demonstrates, you can derive the catamorphism from a type's isomorphic <a href="https://bartoszmilewski.com/2017/02/28/f-algebras/">F-algebra</a>.
</p>
<p>
The beginning of this article presents the catamorphism in C#, with an example. The rest of the article describes how to deduce the catamorphism, and presents that work in Haskell. Readers not comfortable with Haskell can just read the first part, and consider the rest of the article as an optional appendix.
</p>
<h3 id="1e7c622eea7d4a7bad2edc9958d865ce">
C# catamorphism <a href="#1e7c622eea7d4a7bad2edc9958d865ce">#</a>
</h3>
<p>
This article will use a custom C# class called <code><span style="color:#2b91af;">NonEmptyCollection</span><<span style="color:#2b91af;">T</span>></code>, which is near-identical to the <code><span style="color:#2b91af;">NotEmptyCollection</span><<span style="color:#2b91af;">T</span>></code> originally introduced in the article <a href="/2017/12/11/semigroups-accumulate">Semigroups accumulate</a>.
</p>
<p>
I don't know why I originally chose to name the class <code>NotEmptyCollection</code> instead of <code>NonEmptyCollection</code>, but it's annoyed me ever since. I've finally decided to rectify that mistake, so from now on, the name is <code>NonEmptyCollection</code>.
</p>
<p>
The catamorphism for <code>NonEmptyCollection</code> is this instance method:
</p>
<p>
<pre><span style="color:blue;">public</span> TResult Aggregate<<span style="color:#2b91af;">TResult</span>>(Func<T, IReadOnlyCollection<T>, TResult> algebra)
{
    <span style="color:blue;">return</span> algebra(Head, Tail);
}</pre>
</p>
<p>
Because the <code>NonEmptyCollection</code> class is really just a glorified tuple, the <code>algebra</code> is any function which produces a single value from the two constituent values.
</p>
<p>
It's easy to fall into the trap of thinking of the catamorphism as 'reducing' the data structure to a more compact form. While this is a common kind of operation, loss of data is not inevitable. You can, for example, return a new collection, essentially doing nothing:
</p>
<p>
<pre><span style="color:blue;">var</span> nec = <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(42, 1337, 2112, 666);
<span style="color:blue;">var</span> same = nec.Aggregate((x, xs) => <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(x, xs.ToArray()));</pre>
</p>
<p>
This <code>Aggregate</code> method enables you to safely find a maximum value:
</p>
<p>
<pre><span style="color:blue;">var</span> nec = <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(42, 1337, 2112, 666);
<span style="color:blue;">var</span> max = nec.Aggregate((x, xs) => xs.Aggregate(x, Math.Max));</pre>
</p>
<p>
or to <a href="/2020/02/03/non-exceptional-averages">safely calculate an average</a>:
</p>
<p>
<pre><span style="color:blue;">var</span> nec = <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(42, 1337, 2112, 666);
<span style="color:blue;">var</span> average = nec.Aggregate((x, xs) => xs.Aggregate(x, (a, b) => a + b) / (xs.Count + 1.0));</pre>
</p>
<p>
Both of the last two examples use the built-in <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.aggregate">Aggregate</a> function to accumulate the <code>xs</code>. They use the overload that takes a seed, for which they supply <code>x</code>. This means that there's guaranteed to be at least that one value.
</p>
<p>
The catamorphism given here is not unique. You can create a trivial variation by swapping the two function arguments, so that <code>x</code> comes after <code>xs</code>.
</p>
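<p>
As a sketch of that variation, the following defines a flipped version as an extension method on top of <code>Aggregate</code>. The <code>NonEmptyCollection</code> class shown here is only a minimal stand-in, written under the assumption that the class exposes <code>Head</code> and <code>Tail</code> as used above, and the name <code>AggregateFlipped</code> is hypothetical.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Minimal stand-in for NonEmptyCollection<T>; an assumption for illustration.
public sealed class NonEmptyCollection<T>
{
    public NonEmptyCollection(T head, params T[] tail)
    {
        Head = head;
        Tail = tail;
    }

    public T Head { get; }
    public IReadOnlyCollection<T> Tail { get; }

    // The catamorphism, as shown above.
    public TResult Aggregate<TResult>(
        Func<T, IReadOnlyCollection<T>, TResult> algebra)
    {
        return algebra(Head, Tail);
    }
}

public static class NonEmptyCollectionExtensions
{
    // Hypothetical variation: the algebra takes the tail before the head.
    public static TResult AggregateFlipped<T, TResult>(
        this NonEmptyCollection<T> source,
        Func<IReadOnlyCollection<T>, T, TResult> algebra)
    {
        return source.Aggregate((x, xs) => algebra(xs, x));
    }
}</pre>
</p>
<p>
With <code>using System.Linq;</code> in scope, the maximum example then only changes in the order of the lambda parameters:
</p>
<p>
<pre>var nec = new NonEmptyCollection<int>(42, 1337, 2112, 666);
var max = nec.AggregateFlipped((xs, x) => xs.Aggregate(x, Math.Max));</pre>
</p>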
<h3 id="3b188484128d4cef935a13426d8fb51b">
NonEmpty F-algebra <a href="#3b188484128d4cef935a13426d8fb51b">#</a>
</h3>
<p>
As in the <a href="/2019/05/27/list-catamorphism">previous article</a>, I'll use <code>Fix</code> and <code>cata</code> as explained in <a href="https://bartoszmilewski.com">Bartosz Milewski</a>'s excellent <a href="https://bartoszmilewski.com/2017/02/28/f-algebras/">article on F-algebras</a>.
</p>
<p>
As always, start with the underlying endofunctor:
</p>
<p>
<pre><span style="color:blue;">data</span> NonEmptyF a c = NonEmptyF { <span style="color:blue;">head</span> :: a, <span style="color:blue;">tail</span> :: ListFix a }
  <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Read</span>)

<span style="color:blue;">instance</span> <span style="color:blue;">Functor</span> (<span style="color:blue;">NonEmptyF</span> a) <span style="color:blue;">where</span>
  <span style="color:blue;">fmap</span> _ (NonEmptyF x xs) = NonEmptyF x xs</pre>
</p>
<p>
Instead of using Haskell's standard list (<code>[]</code>) for the tail, I've used <code>ListFix</code> from <a href="/2019/05/27/list-catamorphism">the article on list catamorphism</a>. This should, hopefully, demonstrate how you can build on already established definitions derived from first principles.
</p>
<p>
Since a non-empty collection is really just a glorified tuple of <em>head</em> and <em>tail</em>, there's no recursion, and thus, the carrier type <code>c</code> is not used. You could argue that going through all of these motions is overkill, but it still provides some insights. This is similar to the <a href="/2019/05/06/boolean-catamorphism">Boolean catamorphism</a> and <a href="/2019/05/20/maybe-catamorphism">Maybe catamorphism</a>.
</p>
<p>
The <code>fmap</code> function ignores the mapping argument (often called <code>f</code>), since the <code>Functor</code> instance maps <code>NonEmptyF a c</code> to <code>NonEmptyF a c1</code>, but the <code>c</code> or <code>c1</code> type is not used.
</p>
<p>
As was the case when deducing the recent catamorphisms, Haskell isn't too happy about defining instances for a type like <code>Fix (NonEmptyF a)</code>. To address that problem, you can introduce a <code>newtype</code> wrapper:
</p>
<p>
<pre><span style="color:blue;">newtype</span> NonEmptyFix a =
  NonEmptyFix { unNonEmptyFix :: Fix (NonEmptyF a) } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Read</span>)</pre>
</p>
<p>
You can define <code>Functor</code>, <code>Applicative</code>, <code>Monad</code>, etc. instances for this type without resorting to any funky GHC extensions. Keep in mind that, ultimately, the purpose of all this code is just to figure out what the catamorphism looks like. This code isn't intended for actual use.
</p>
<p>
A helper function makes it easier to define <code>NonEmptyFix</code> values:
</p>
<p>
<pre><span style="color:#2b91af;">createNonEmptyF</span> <span style="color:blue;">::</span> a <span style="color:blue;">-></span> <span style="color:blue;">ListFix</span> a <span style="color:blue;">-></span> <span style="color:blue;">NonEmptyFix</span> a
createNonEmptyF x xs = NonEmptyFix $ Fix $ NonEmptyF x xs</pre>
</p>
<p>
Here's how to use it:
</p>
<p>
<pre>ghci> createNonEmptyF 42 $ consF 1337 $ consF 2112 nilF
NonEmptyFix {
unNonEmptyFix = Fix (NonEmptyF 42 (ListFix (Fix (ConsF 1337 (Fix (ConsF 2112 (Fix NilF)))))))}</pre>
</p>
<p>
While this is quite verbose, keep in mind that the code shown here isn't meant to be used in practice. The goal is only to deduce catamorphisms from more basic universal abstractions, and you now have all you need to do that.
</p>
<h3 id="ccc6b13fcf794ec39f98fdd3e0c61460">
Haskell catamorphism <a href="#ccc6b13fcf794ec39f98fdd3e0c61460">#</a>
</h3>
<p>
At this point, you have two out of three elements of an F-algebra. You have an endofunctor (<code>NonEmptyF a</code>), and an object <code>c</code>, but you still need to find a morphism <code>NonEmptyF a c -> c</code>. Notice that the algebra you have to find is the function that reduces the functor to its <em>carrier type</em> <code>c</code>, not the 'data type' <code>a</code>. This takes some time to get used to, but that's how catamorphisms work. This doesn't mean, however, that you get to ignore <code>a</code>, as you'll see.
</p>
<p>
As in the previous articles, start by writing a function that will become the catamorphism, based on <code>cata</code>:
</p>
<p>
<pre>nonEmptyF = cata alg . unNonEmptyFix
  <span style="color:blue;">where</span> alg (NonEmptyF x xs) = <span style="color:blue;">undefined</span></pre>
</p>
<p>
While this compiles, with its <code>undefined</code> implementation of <code>alg</code>, it obviously doesn't do anything useful. I find, however, that it helps me think. How can you return a value of the type <code>c</code> from <code>alg</code>? You could pass a function argument to the <code>nonEmptyF</code> function and use it with <code>x</code> and <code>xs</code>:
</p>
<p>
<pre><span style="color:#2b91af;">nonEmptyF</span> <span style="color:blue;">::</span> (a <span style="color:blue;">-></span> <span style="color:blue;">ListFix</span> a <span style="color:blue;">-></span> c) <span style="color:blue;">-></span> <span style="color:blue;">NonEmptyFix</span> a <span style="color:blue;">-></span> c
nonEmptyF f = cata alg . unNonEmptyFix
  <span style="color:blue;">where</span> alg (NonEmptyF x xs) = f x xs</pre>
</p>
<p>
This works. Since <code>cata</code> has the type <code>Functor f => (f a -> a) -> Fix f -> a</code>, that means that <code>alg</code> has the type <code>f a -> a</code>. In the case of <code>NonEmptyF</code>, the compiler infers that the <code>alg</code> function has the type <code>NonEmptyF a c -> c1</code>, which fits the bill, since <code>c</code> may be the same type as <code>c1</code>.
</p>
<p>
This, then, is the catamorphism for a non-empty collection. This one is just a single function. It's still not the only possible catamorphism, since you could trivially flip the arguments to <code>f</code>.
</p>
<p>
I've chosen this representation because the arguments <code>x</code> and <code>xs</code> are defined in the same order as the order of <code>head</code> before <code>tail</code>. Notice how this is the same order as the above C# <code>Aggregate</code> method.
</p>
<h3 id="2ecc4634c63e40e4a9d47be4bffa4d5f">
Basis <a href="#2ecc4634c63e40e4a9d47be4bffa4d5f">#</a>
</h3>
<p>
You can implement most other useful functionality with <code>nonEmptyF</code>. Here's the <code>Semigroup</code> instance and a useful helper function:
</p>
<p>
<pre><span style="color:#2b91af;">toListFix</span> <span style="color:blue;">::</span> <span style="color:blue;">NonEmptyFix</span> a <span style="color:blue;">-></span> <span style="color:blue;">ListFix</span> a
toListFix = nonEmptyF consF

<span style="color:blue;">instance</span> <span style="color:blue;">Semigroup</span> (<span style="color:blue;">NonEmptyFix</span> a) <span style="color:blue;">where</span>
  xs <> ys =
    nonEmptyF (\x xs' -> createNonEmptyF x $ xs' <> toListFix ys) xs</pre>
</p>
<p>
The implementation uses <code>nonEmptyF</code> to operate on <code>xs</code>. Inside the lambda expression, it converts <code>ys</code> to a list, and uses <a href="/2019/05/27/list-catamorphism">the <code>ListFix</code> <code>Semigroup</code> instance</a> to concatenate the tail <code>xs'</code> with it.
</p>
<p>
Here's the <code>Functor</code> instance:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Functor</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
  <span style="color:blue;">fmap</span> f = nonEmptyF (\x xs -> createNonEmptyF (f x) $ <span style="color:blue;">fmap</span> f xs)</pre>
</p>
<p>
Like the <code>Semigroup</code> instance, this <code>fmap</code> implementation uses <code>fmap</code> on <code>xs</code>, which is the <code>ListFix</code> <code>Functor</code> instance.
</p>
<p>
The <code>Applicative</code> instance is much harder to write from scratch (or, at least, I couldn't come up with a simpler way):
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Applicative</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
  pure x = createNonEmptyF x nilF
  liftA2 f xs ys =
    nonEmptyF
      (\x xs' ->
        nonEmptyF
          (\y ys' ->
            createNonEmptyF
              (f x y)
              (liftA2 f (consF x nilF) ys' <> liftA2 f xs' (consF y ys')))
          ys)
      xs</pre>
</p>
<p>
While that looks complicated, it's not <em>that</em> bad. It uses <code>nonEmptyF</code> to 'loop' over the <code>xs</code>, and then a nested call to <code>nonEmptyF</code> to 'loop' over the <code>ys</code>. The inner lambda expression uses <code>f x y</code> to calculate the head, but it also needs to calculate all other combinations of values in <code>xs</code> and <code>ys</code>.
</p>
<p>
<img src="/content/binary/non-empty-applicative-x-y.png" alt="Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x and y box are connected by an arrow labelled f.">
</p>
<p>
First, it keeps <code>x</code> fixed and 'loops' through all the remaining <code>ys'</code>; that's the <code>liftA2 f (consF x nilF) ys'</code> part:
</p>
<p>
<img src="/content/binary/non-empty-applicative-x-ys.png" alt="Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x and y1, y2, y3 boxes are connected by three arrows labelled with a single f.">
</p>
<p>
Then it 'loops' over all the remaining <code>xs'</code> and all the <code>ys</code>; that is, <code>liftA2 f xs' (consF y ys')</code>.
</p>
<p>
<img src="/content/binary/non-empty-applicative-xs-ys.png" alt="Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x1, x2, x3 boxes are connected to the y, y1, y2, y3 boxes by arrows labelled with a single f.">
</p>
<p>
Both <code>liftA2</code> calls use the <code>ListFix</code> <code>Applicative</code> instance.
</p>
<p>
You'll be happy to see, I think, that the <code>Monad</code> instance is simpler:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Monad</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
  xs >>= f =
    nonEmptyF (\x xs' ->
      nonEmptyF
        (\y ys -> createNonEmptyF y $ ys <> (xs' >>= toListFix . f)) (f x)) xs</pre>
</p>
<p>
And fortunately, <code>Foldable</code> and <code>Traversable</code> are even simpler:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Foldable</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
  <span style="color:blue;">foldr</span> f seed = nonEmptyF (\x xs -> f x $ <span style="color:blue;">foldr</span> f seed xs)

<span style="color:blue;">instance</span> <span style="color:blue;">Traversable</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
  traverse f = nonEmptyF (\x xs -> liftA2 createNonEmptyF (f x) (traverse f xs))</pre>
</p>
<p>
Finally, you can implement conversions to and from the <code>NonEmpty</code> type from <code>Data.List.NonEmpty</code>:
</p>
<p>
<pre><span style="color:#2b91af;">toNonEmpty</span> <span style="color:blue;">::</span> <span style="color:blue;">NonEmptyFix</span> a <span style="color:blue;">-></span> <span style="color:blue;">NonEmpty</span> a
toNonEmpty = nonEmptyF (\x xs -> x :| toList xs)
<span style="color:#2b91af;">fromNonEmpty</span> <span style="color:blue;">::</span> <span style="color:blue;">NonEmpty</span> a <span style="color:blue;">-></span> <span style="color:blue;">NonEmptyFix</span> a
fromNonEmpty (x :| xs) = createNonEmptyF x $ fromList xs</pre>
</p>
<p>
This demonstrates that <code>NonEmptyFix</code> is isomorphic to <code>NonEmpty</code>.
</p>
<h3 id="f57599f8fcbb4b02af85816dff99a790">
Conclusion <a href="#f57599f8fcbb4b02af85816dff99a790">#</a>
</h3>
<p>
The catamorphism for a non-empty collection is a single function that produces a single value from the head and the tail of the collection. While it's possible to implement a 'standard fold' (<code>foldr</code> in Haskell), the non-empty catamorphism doesn't require a seed to get started. The data structure guarantees that there's always at least one value available, and this value can then be used to 'kick off' a fold.
</p>
<p>
In C# one can define the catamorphism as the above <code>Aggregate</code> method. You could then define all other instance functions based on <code>Aggregate</code>.
</p>
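<p>
As an illustration of that last point, here's a sketch of a <code>Select</code> method implemented only in terms of <code>Aggregate</code>, mirroring the Haskell <code>Functor</code> instance above. The <code>NonEmptyCollection</code> class is a minimal stand-in (an assumption, not the full class), and the extension class name is hypothetical.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for NonEmptyCollection<T>; an assumption for illustration.
public sealed class NonEmptyCollection<T>
{
    public NonEmptyCollection(T head, params T[] tail)
    {
        Head = head;
        Tail = tail;
    }

    public T Head { get; }
    public IReadOnlyCollection<T> Tail { get; }

    public TResult Aggregate<TResult>(
        Func<T, IReadOnlyCollection<T>, TResult> algebra)
    {
        return algebra(Head, Tail);
    }
}

public static class NonEmptyCollectionFunctor
{
    // Hypothetical Select: maps the head directly, and maps the tail
    // with the standard LINQ Select on IReadOnlyCollection<T>.
    public static NonEmptyCollection<TResult> Select<T, TResult>(
        this NonEmptyCollection<T> source,
        Func<T, TResult> selector)
    {
        return source.Aggregate((x, xs) =>
            new NonEmptyCollection<TResult>(
                selector(x),
                xs.Select(selector).ToArray()));
    }
}</pre>
</p>
<p>
Given such a method, <code>nec.Select(i => i.ToString())</code> would turn a <code>NonEmptyCollection<int></code> into a <code>NonEmptyCollection<string></code>, still guaranteed to be non-empty.
</p>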
<p>
<strong>Next:</strong> <a href="/2019/06/03/either-catamorphism">Either catamorphism</a>.
</p>
</div><hr>
Test-driving the pyramid's top
https://blog.ploeh.dk/2023/07/31/test-driving-the-pyramids-top
2023-07-31T07:00:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Some thoughts on TDD related to integration and systems testing.</em>
</p>
<p>
My recent article <a href="/2023/07/17/works-on-most-machines">Works on most machines</a> elicited some responses. Upon reflection, it seems that most of the responses relate to the top of the <a href="https://martinfowler.com/bliki/TestPyramid.html">Test Pyramid</a>.
</p>
<p>
While I don't have a one-shot solution that addresses all concerns, I hope that I can nonetheless suggest some ideas and inspire a reader or two. That's all. I intend nothing of the following to be prescriptive. I describe my own professional experience: What has worked for me. Perhaps it could also work for you. Use the ideas if they inspire you. Ignore them if you find them impractical.
</p>
<h3 id="5031910a0b37420bbcea753fc9a31dd0">
The Test Pyramid <a href="#5031910a0b37420bbcea753fc9a31dd0">#</a>
</h3>
<p>
The Test Pyramid is often depicted like this:
</p>
<p>
<img src="/content/binary/standard-test-pyramid.png" alt="Standard Test Pyramid, which is really a triangle with three layers: Unit tests, integration tests, and UI tests.">
</p>
<p>
This seems to indicate that while the majority of tests should be unit tests, you should also have a substantial number of integration tests, and quite a few UI tests.
</p>
<p>
Perhaps the following is obvious, but the Test Pyramid is an idea; it's a way to communicate a concept in a compelling way. What one should take away from it, I think, is only this: The number of tests in each category should form a <a href="https://en.wikipedia.org/wiki/Total_order">total order</a>, where the <em>unit test</em> category is the maximum. In other words, you should have more unit tests than you have tests in the next category, and so on.
</p>
<p>
No-one says that you can only have three levels, or that they have to have the same height. Finally, the above figure isn't even a <a href="https://en.wikipedia.org/wiki/Pyramid_(geometry)">pyramid</a>, but rather a <a href="https://en.wikipedia.org/wiki/Triangle">triangle</a>.
</p>
<p>
I sometimes think of the Test Pyramid like this:
</p>
<p>
<img src="/content/binary/test-pyramid-perspective.png" alt="Test pyramid in perspective.">
</p>
<p>
To be honest, it's not so much whether or not the pyramid is shown in perspective, but rather that the <em>unit test</em> base is significantly more voluminous than the other levels, and that the top is quite small.
</p>
<h3 id="7b385ee20e7e4054afc625a15525285f">
Levels <a href="#7b385ee20e7e4054afc625a15525285f">#</a>
</h3>
<p>
In order to keep the above discussion as recognisable as possible, I've used the labels <em>unit tests</em>, <em>integration tests</em>, and <em>UI tests</em>. It's easy to get caught up in a discussion about how these terms are defined. Exactly what is a <em>unit test?</em> How does it differ from an <em>integration test?</em>
</p>
<p>
There's no universally accepted definition of a <em>unit test</em>, so it tends to be counter-productive to spend too much time debating the finer points of what to call the tests in each layer.
</p>
<p>
Instead, I find the following criteria useful:
</p>
<ol>
<li>In-process tests</li>
<li>Tests that involve more than one process</li>
<li>Tests that can only be performed in production</li>
</ol>
<p>
I'll describe each in a little more detail. Along the way, I'll address some of the reactions to <a href="/2023/07/17/works-on-most-machines">Works on most machines</a>.
</p>
<h3 id="7501caecf9d74c26aa15661fdc7982d7">
In-process tests <a href="#7501caecf9d74c26aa15661fdc7982d7">#</a>
</h3>
<p>
The <em>in-process</em> category corresponds roughly to the Test Pyramid's <em>unit test</em> level. It includes 'traditional' unit tests such as tests of stand-alone functions or methods on objects, but also <a href="/2012/06/27/FacadeTest">Facade Tests</a>. The latter may involve multiple modules or objects, perhaps even from multiple libraries. Many people may call them <em>integration tests</em> because they integrate more than one module.
</p>
<p>
As long as an automated test runs in a single process, in memory, it tends to be fast and leave no persistent state behind. This is almost exclusively the kind of test I tend to test-drive. I often follow an <a href="/outside-in-tdd">outside-in TDD</a> process, an example of which is shown in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
Consider an example from the source code that accompanies the book:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task ReserveTableAtTheVaticanCellar()
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> api = <span style="color:blue;">new</span> SelfHostedApi();
    <span style="color:blue;">var</span> client = api.CreateClient();
    <span style="color:blue;">var</span> timeOfDayLaterThanLastSeatingAtTheOtherRestaurants =
        TimeSpan.FromHours(21.5);
    <span style="color:blue;">var</span> at = DateTime.Today.AddDays(433).Add(
        timeOfDayLaterThanLastSeatingAtTheOtherRestaurants);
    <span style="color:blue;">var</span> dto = Some.Reservation.WithDate(at).ToDto();
    <span style="color:blue;">var</span> response =
        <span style="color:blue;">await</span> client.PostReservation(<span style="color:#a31515;">"The Vatican Cellar"</span>, dto);
    response.EnsureSuccessStatusCode();
}</pre>
</p>
<p>
I think of a test like this as an automated acceptance test. It uses an internal test-specific domain-specific language (<a href="http://xunitpatterns.com/Test%20Utility%20Method.html">test utilities</a>) to exercise the REST service's API. It uses <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">ASP.NET self-hosting</a> to run both the service and the HTTP client in the same process.
</p>
<p>
Since it cuts across both HTTP layer and domain model, some readers may, at first glance, think of it as an integration test. It is, however, an artefact of test-driven development, and since it uses a <a href="/2019/02/18/from-interaction-based-to-state-based-testing">stateful in-memory data store</a>, it doesn't involve more than a single process.
</p>
<h3 id="2f036c4726d74c6285b1b0a759a54269">
Tests that span processes <a href="#2f036c4726d74c6285b1b0a759a54269">#</a>
</h3>
<p>
There are aspects of software that you can't easily drive with tests. I'll return to some really gnarly examples in the third category, but in between, we find concerns that are hard, but still possible to test. The reason that they are hard is often because they involve more than one process.
</p>
<p>
The most common example is data access. Many software systems save or retrieve data. With test-driven development, you're supposed to let the tests inform your API design decisions in such a way that everything that involves difficult, error-prone code is factored out of the data access layer, and into another part of the code that <em>can</em> be tested in process. This development technique ought to drain the hard-to-test components of logic, leaving behind a <a href="http://xunitpatterns.com/Humble%20Object.html">Humble Object</a>.
</p>
<p>
One reaction to <a href="/2023/07/17/works-on-most-machines">Works on most machines</a> concerns exactly that idea:
</p>
<blockquote>
<p>
"As a developer, you need to test HumbleObject's behavior."
</p>
<footer><cite><a href="https://twitter.com/ladeak87/status/1680915766764351489">ladeak</a></cite></footer>
</blockquote>
<p>
It's almost tautologically part of the definition of a Humble Object that you're <em>not</em> supposed to test it. Still, realistically, ladeak has a point.
</p>
<p>
When I wrote the example code for <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, I applied the Humble Object pattern to the data access component. For a good while, I had a <code>SqlReservationsRepository</code> class that was so simple, so drained of logic, that it couldn't possibly fail.
</p>
<p>
Until, of course, the inevitable happened: There was a bug in the <code>SqlReservationsRepository</code> code. Not to make a long story out of it, but even with a really low <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>, I'd accidentally swapped two database columns when reading from a table.
</p>
<p>
Whenever possible, when I discover a bug, I first write an automated test that exposes that bug, and only then do I fix the problem. This is congruent with <a href="/2023/01/23/agilean">my lean bias</a>. If a defect can occur once, it can occur again in the future, so it's better to have a regression test.
</p>
<p>
The problem with this bug is that it was in a Humble Object. So, ladeak is right. Sooner or later, you'll have to test the Humble Object, too.
</p>
<p>
That's when I had to bite the bullet and add a test library that tests against the database.
</p>
<p>
One such test looks like this:
</p>
<p>
<pre>[Theory]
[InlineData(Grandfather.Id, <span style="color:#a31515;">"2022-06-29 12:00"</span>, <span style="color:#a31515;">"e@example.gov"</span>, <span style="color:#a31515;">"Enigma"</span>, 1)]
[InlineData(Grandfather.Id, <span style="color:#a31515;">"2022-07-27 11:40"</span>, <span style="color:#a31515;">"c@example.com"</span>, <span style="color:#a31515;">"Carlie"</span>, 2)]
[InlineData(2, <span style="color:#a31515;">"2021-09-03 14:32"</span>, <span style="color:#a31515;">"bon@example.edu"</span>, <span style="color:#a31515;">"Jovi"</span>, 4)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task CreateAndReadRoundTrip(
    <span style="color:blue;">int</span> restaurantId,
    <span style="color:blue;">string</span> at,
    <span style="color:blue;">string</span> email,
    <span style="color:blue;">string</span> name,
    <span style="color:blue;">int</span> quantity)
{
    <span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
        Guid.NewGuid(),
        DateTime.Parse(at, CultureInfo.InvariantCulture),
        <span style="color:blue;">new</span> Email(email),
        <span style="color:blue;">new</span> Name(name),
        quantity);
    <span style="color:blue;">var</span> connectionString = ConnectionStrings.Reservations;
    <span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> SqlReservationsRepository(connectionString);
    <span style="color:blue;">await</span> sut.Create(restaurantId, expected);
    <span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
    Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The entire test runs in a special context where a database is automatically created before the test runs, and torn down once the test has completed.
</p>
<blockquote>
<p>
"When building such behavior, you can test against a shared instance of the service in your dev team or run that service on your dev machine in a container."
</p>
<footer><cite><a href="https://twitter.com/ladeak87/status/1680915837782224905">ladeak</a></cite></footer>
</blockquote>
<p>
Yes, those are two options. A third, in the spirit of <a href="/ref/goos">GOOS</a>, is to strongly favour technologies that support automation. Believe it or not, you can automate <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>. You don't need a Docker container for it. That's what I did in the above test.
</p>
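<p>
To give an idea of what automating a database for a test might involve, here's a rough sketch based on xUnit.net 2's <code>BeforeAfterTestAttribute</code>. It is not the code from the book's code base. The attribute name, the database name, the connection string, and the SQL are all assumptions, and a real fixture would also have to run a schema script after creating the database.
</p>
<p>
<pre>using System.Reflection;
using Microsoft.Data.SqlClient;
using Xunit.Sdk;

// Hypothetical attribute; names, connection string, and SQL are assumptions.
public sealed class UseDatabaseAttribute : BeforeAfterTestAttribute
{
    // Assumes a LocalDB instance reachable with integrated security.
    private const string Master =
        @"Server=(LocalDB)\MSSQLLocalDB;Database=master;Integrated Security=true";

    public override void Before(MethodInfo methodUnderTest)
    {
        // Create an empty database before the test runs.
        // A real fixture would also create tables here.
        Execute("CREATE DATABASE [RepositoryTests]");
    }

    public override void After(MethodInfo methodUnderTest)
    {
        // Close any lingering connections, then drop the database.
        Execute(
            "ALTER DATABASE [RepositoryTests] " +
            "SET SINGLE_USER WITH ROLLBACK IMMEDIATE; " +
            "DROP DATABASE [RepositoryTests]");
    }

    private static void Execute(string sql)
    {
        using var conn = new SqlConnection(Master);
        using var cmd = new SqlCommand(sql, conn);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}</pre>
</p>
<p>
Adorning a test method or test class with <code>[UseDatabase]</code> would then cause xUnit.net to call <code>Before</code> and <code>After</code> around each test, without any container involved.
</p>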
<p>
I can see how a Docker container with an external dependency can be useful too, so I'm not trying to dismiss that technology. The point is, however, that simpler alternatives may exist. I, and others, did test-driven development for more than a decade before Docker existed.
</p>
<h3 id="4884dac317d84c70ac4824a5ee3fe922">
Tests that can only be performed in production <a href="#4884dac317d84c70ac4824a5ee3fe922">#</a>
</h3>
<p>
The last category of tests comprises those that you can only perform on a production system. What might be examples of that?
</p>
<p>
I've run into a few over the years. One such test is what I call a <a href="https://en.wikipedia.org/wiki/Smoke_testing_(software)">Smoke Test</a>: Metaphorically speaking, turn it on and see if it develops smoke. These kinds of tests are good at catching configuration errors. Does the web server have the right connection string to the database? A test can verify whether that's the case, but it makes no sense to run such a test on a development machine, or against a test system, or a staging environment. You want to verify that the production system is correctly configured. Only a test against the production system can do that.
</p>
<p>
For every configuration value, you may want to consider a Smoke Test.
</p>
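<p>
As a minimal sketch of such a 'ping', assuming that the production system exposes some HTTP resource that only responds successfully when the configuration is right, a Smoke Test could be as small as the following. The <code>SmokeTest</code> class and the <code>healthUrl</code> parameter are hypothetical names chosen for this illustration.
</p>
<p>
<pre>using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class SmokeTest
{
    // Returns true if the resource at healthUrl answers with a 2xx status code.
    // healthUrl is a hypothetical address; substitute one your system exposes.
    public static async Task&lt;bool&gt; Ping(HttpClient client, Uri healthUrl)
    {
        try
        {
            using var response = await client.GetAsync(healthUrl);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            // The server could not be reached at all.
            return false;
        }
    }
}</pre>
</p>
<p>
Notice that the sketch asserts nothing about the response body. It only answers the question: does it develop smoke?
</p>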
<p>
There are other kinds of tests you can only perform in production. Sometimes, it's not technical concerns, but rather legal or financial constraints, that dictate circumstances.
</p>
<p>
A few years ago I worked with a software organisation that, among other things, integrated with the Danish <a href="https://en.wikipedia.org/wiki/Personal_identification_number_(Denmark)">personal identification number system (CPR)</a>. Things may have changed since, but back then, an organisation had to have a legal agreement with CPR before being granted access to its integration services. It's an old system (originally from 1968) with a proprietary data integration protocol.
</p>
<p>
We test-drove a parser of the data format, but that still left behind a Humble Object that would actually perform the data transfers. How do we test that Humble Object?
</p>
<p>
Back then, at least, there was no test system for the CPR service, and it was illegal to query the live system unless you had a business reason. And software testing did not constitute a legal reason.
</p>
<p>
The only <em>legal</em> option was to make the Humble Object as simple and foolproof as possible, and then observe how it worked in actual production situations. Containers wouldn't help in such a situation.
</p>
<p>
It's possible to write automated tests against production systems, but unless you're careful, they're difficult to write and maintain. At least, go easy on the assertions, since you can't assume much about the run-time data and behaviour of a live system. Smoke tests are mostly just 'pings', so can be written to be fairly maintenance-free, but you shouldn't need many of them.
</p>
<p>
Other kinds of tests against production are likely to be fragile, so it pays to minimise their number. That's the top of the pyramid.
</p>
<h3 id="32214fa81f2f4dc188b0990d4308bded">
User interfaces <a href="#32214fa81f2f4dc188b0990d4308bded">#</a>
</h3>
<p>
I no longer develop user interfaces, so take the following with a pinch of salt.
</p>
<p>
The 'original' Test Pyramid that I've depicted above has <em>UI tests</em> at the pyramid's top. That doesn't necessarily match the categories I've outlined here; don't assume parity.
</p>
<p>
UI tests may or may not involve more than one process, but they are often difficult to maintain for other reasons. Perhaps this is where the pyramid metaphor starts to break down. <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">All models are wrong, but some are useful</a>.
</p>
<p>
Back when I still programmed user interfaces, I'd usually test-drive them via a <a href="https://martinfowler.com/bliki/SubcutaneousTest.html">subcutaneous API</a>, and rely on some variation of <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC</a> to keep the rendered controls in sync. Still, once in a while, you need to verify that the user interface looks as it's supposed to. Often, the best tool for that job is the good old Mark I Eyeball.
</p>
<p>
This still means that you need to run the application from time to time.
</p>
<blockquote>
<p>
"Docker is also very useful for enabling others to run your software on their machines. Recently, we've been exploring some apps that consisted of ~4 services (web servers) and a database. All of them written in different technologies (PHP, Java, C#). You don't have to setup environment variables. You don't need to have relevant SDKs to build projects etc. Just run docker command, and spin them instantly on your PC."
</p>
<footer><cite><a href="/2023/07/17/works-on-most-machines#4012c2cddcb64a068c0b06b7989a676e">qfilip</a></cite></footer>
</blockquote>
<p>
That sounds like a reasonable use case. I've never found myself in such circumstances, but I can imagine the utility that containers offer in a situation like that. Here's how I envision the scenario:
</p>
<p>
<img src="/content/binary/app-with-services-in-containers.png" alt="A box with arrows to three other boxes, which again have arrows to a database symbol.">
</p>
<p>
The boxes with rounded corners symbolise containers.
</p>
<p>
Again, my original goal with the <a href="/2023/07/17/works-on-most-machines">previous article</a> wasn't to convince you that container technologies are unequivocally bad. Rather, it was to suggest that test-driven development (TDD) solves many of the problems that people seem to think can only be solved with containers. Since TDD has many other beneficial side effects, it's worth considering instead of mindlessly reaching for containers, which may represent only a local maximum.
</p>
<p>
How could TDD address qfilip's concern?
</p>
<p>
When I test-drive software, I <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">favour real dependencies</a>, and I <a href="/2019/02/18/from-interaction-based-to-state-based-testing">favour Fake objects over Mocks and Stubs</a>. Were I to return to user-interface programming today, I'd define its external dependencies as one or more interfaces, and implement a <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Object</a> for each.
</p>
<p>
Not only will this enable me to simulate the external dependencies with the Fakes; if I implement the Fakes as part of the production code, I'll even be able to spin up the system, using the Fakes instead of the real dependencies.
</p>
<p>
<img src="/content/binary/app-with-fake-dependencies.png" alt="App box with arrows pointing to itself.">
</p>
<p>
A Fake is an implementation that 'almost works'. A common example is an in-memory collection instead of a database. It's neither persistent nor thread-safe, but it's internally consistent. What you add, you can retrieve, until you delete it again. For the purposes of starting the app in order to verify that the user interface looks correct, that should be good enough.
</p>
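<p>
To make the idea concrete, here is a minimal sketch of such a Fake. The <code>Reservation</code> record and the <code>IReservationsRepository</code> interface are simplified, synchronous stand-ins invented for this illustration; they are assumptions, not the types exercised by the test shown earlier.
</p>
<p>
<pre>#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical, simplified types for illustration only.
public sealed record Reservation(
    Guid Id, DateTime At, string Email, string Name, int Quantity);

public interface IReservationsRepository
{
    void Create(int restaurantId, Reservation reservation);
    Reservation? ReadReservation(int restaurantId, Guid id);
    void Delete(int restaurantId, Guid id);
}

// A Fake: neither persistent nor thread-safe, but internally consistent.
public sealed class FakeReservationsRepository : IReservationsRepository
{
    private readonly Dictionary&lt;(int, Guid), Reservation&gt; store = new();

    public void Create(int restaurantId, Reservation reservation)
    {
        store[(restaurantId, reservation.Id)] = reservation;
    }

    public Reservation? ReadReservation(int restaurantId, Guid id)
    {
        // Returns null when no such reservation has been added.
        return store.TryGetValue((restaurantId, id), out var found)
            ? found
            : null;
    }

    public void Delete(int restaurantId, Guid id)
    {
        store.Remove((restaurantId, id));
    }
}</pre>
</p>
<p>
Because the Fake implements the same interface as the real repository would, the application can be composed with either one.
</p>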
<p>
Another related example is <a href="https://particular.net/nservicebus">NServiceBus</a>, which comes with a <a href="https://docs.particular.net/transports/learning/">file transport that is clearly labeled as not for production use</a>. While it's called the <em>Learning Transport</em>, it's also useful for exploratory testing on a development machine. While this example clearly makes use of an external resource (the file system), it illustrates how a Fake implementation can alleviate the need for a container.
</p>
<h3 id="687363d13cf24a569b1dea6a45f8771e">
Uses for containers <a href="#687363d13cf24a569b1dea6a45f8771e">#</a>
</h3>
<p>
Ultimately, it's still useful to be able to stand up an entire system, as qfilip suggests, and if containers are a good way to do that, it doesn't bother me. At the risk of sounding like a broken record, I never intended to say that containers are useless.
</p>
<p>
When I worked as a Software Development Engineer at Microsoft, I had two computers: A laptop and a rather beefy <a href="https://en.wikipedia.org/wiki/Computer_tower">tower PC</a>. I've always done all programming on laptops, so I repurposed the tower as a <a href="https://en.wikipedia.org/wiki/Microsoft_Virtual_Server">virtual server</a> with all my system's components on separate virtual machines (VMs). The database in one VM, the application server in another, and so on. I no longer remember what all the components were, but I seem to recall that I had four VMs running on that one box.
</p>
<p>
While I didn't use it much, I found it valuable to occasionally verify that all components could talk to each other on a realistic network topology. This was in 2008, and <a href="https://en.wikipedia.org/wiki/Docker_(software)">Docker wasn't around then</a>, but I could imagine it would have made that task easier.
</p>
<p>
I don't dispute that Docker and <a href="https://en.wikipedia.org/wiki/Kubernetes">Kubernetes</a> are useful, but the job of a software architect is to carefully identify the technologies on which a system should be based. The more technology dependencies you take on, the more rigid the design.
</p>
<p>
After a few decades of programming, my experience is that as a programmer and architect, I can find better alternatives than depending on container technologies. If testers and IT operators find containers useful to do their jobs, then that's fine by me. Since my code <a href="/2023/07/17/works-on-most-machines">works on most machines</a>, it works in containers, too.
</p>
<h3 id="be02c64c095e474eaa54ab1750a2d471">
Truly Humble Objects <a href="#be02c64c095e474eaa54ab1750a2d471">#</a>
</h3>
<p>
One last response, and I'll wrap this up.
</p>
<blockquote>
<p>
"As a developer, you need to test HumbleObject's behavior. What if a DatabaseConnection or a TCP conn to a message queue is down?"
</p>
<footer><cite><a href="https://twitter.com/ladeak87/status/1680915766764351489">ladeak</a></cite></footer>
</blockquote>
<p>
How should such situations be handled? There may always be special cases, but in general, I can think of two reactions:
</p>
<ul>
<li>Log the error</li>
<li>Retry the operation</li>
</ul>
<p>
Assuming that the Humble Object is a polymorphic type (i.e. inherits a base class or implements an interface), you should be able to extract each of these behaviours to general-purpose components.
</p>
<p>
In order to log errors, you can either use a <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator</a> or a global exception handler. Most frameworks provide a way to catch (otherwise) unhandled exceptions, exactly for this purpose, so you don't have to add such functionality to a Humble Object.
</p>
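<p>
A logging Decorator could, as a sketch, look like the following. The <code>IMessageSender</code> interface is a hypothetical stand-in for whatever abstraction the Humble Object implements, and the log is just an injected <code>Action&lt;string&gt;</code> rather than any particular logging library.
</p>
<p>
<pre>using System;

// Hypothetical interface that the Humble Object implements.
public interface IMessageSender
{
    void Send(string message);
}

// Decorator that logs errors and otherwise delegates to the inner object.
public sealed class LoggingMessageSender : IMessageSender
{
    private readonly IMessageSender inner;
    private readonly Action&lt;string&gt; log;

    public LoggingMessageSender(IMessageSender inner, Action&lt;string&gt; log)
    {
        this.inner = inner;
        this.log = log;
    }

    public void Send(string message)
    {
        try
        {
            inner.Send(message);
        }
        catch (Exception e)
        {
            log($"Send failed: {e.Message}");
            throw; // Rethrow so that callers still see the failure.
        }
    }
}</pre>
</p>
<p>
The Decorator can be tested with a Test Double that throws, while the Humble Object itself stays free of logging code.
</p>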
<p>
Retry logic can also be delegated to a third-party component. For .NET I'd start looking at <a href="https://www.thepollyproject.org/">Polly</a>, but I'd be surprised if other platforms don't have similar libraries that implement the stability patterns from <a href="/ref/release-it">Release It</a>.
</p>
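<p>
To illustrate only the shape of such a component, here is a hand-rolled sketch. It is not Polly's API, and the <code>Retry</code> class is a hypothetical name. A real library additionally offers things like back-off and circuit breakers.
</p>
<p>
<pre>using System;

public static class Retry
{
    // Runs the action, retrying on any exception,
    // up to maxAttempts attempts in total.
    public static T Execute&lt;T&gt;(Func&lt;T&gt; action, int maxAttempts)
    {
        if (maxAttempts &lt; 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception) when (attempt &lt; maxAttempts)
            {
                // Swallow and try again; the last failure propagates.
            }
        }
    }
}</pre>
</p>
<p>
Since <code>Execute</code> takes any function, it knows nothing about databases or message queues, and you can test it with an in-memory function that fails a fixed number of times.
</p>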
<p>
Something more specialised, like a fail-over mechanism, sounds like a good reason to wheel out the <a href="https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern">Chain of Responsibility</a> pattern.
</p>
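<p>
As a sketch of what one link in such a chain might look like, assuming a hypothetical <code>IMessageSender</code> interface, consider a fail-over that tries one implementation and hands the message on to the next if that fails:
</p>
<p>
<pre>using System;

// Hypothetical interface that the Humble Object implements.
public interface IMessageSender
{
    void Send(string message);
}

// One link in the chain: try the primary sender and,
// if it throws, pass the message on to the next link.
public sealed class FailoverMessageSender : IMessageSender
{
    private readonly IMessageSender primary;
    private readonly IMessageSender next;

    public FailoverMessageSender(IMessageSender primary, IMessageSender next)
    {
        this.primary = primary;
        this.next = next;
    }

    public void Send(string message)
    {
        try
        {
            primary.Send(message);
        }
        catch (Exception)
        {
            next.Send(message);
        }
    }
}</pre>
</p>
<p>
Since <code>next</code> can itself be a <code>FailoverMessageSender</code>, the chain can be as long as required.
</p>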
<p>
All of these can be tested independently of any Humble Object.
</p>
<h3 id="ea544f519e0b4114a21bb094d9798c6c">
Conclusion <a href="#ea544f519e0b4114a21bb094d9798c6c">#</a>
</h3>
<p>
In a recent article I reflected on my experience with TDD and speculated that a side effect of that process is code flexible enough to work on most machines. Thus, I've never encountered the need for containers.
</p>
<p>
Readers responded with comments that struck me as mostly related to the upper levels of the Test Pyramid. In this article, I've attempted to address some of those concerns. I still get by without containers.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Is software getting worse?
https://blog.ploeh.dk/2023/07/24/is-software-getting-worse
2023-07-24T06:02:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A rant, with some examples.</em>
</p>
<p>
I've been a software user for thirty years.
</p>
<p>
My first PC was <a href="https://en.wikipedia.org/wiki/DOS">DOS</a>-based. In my first job, I used <a href="https://en.wikipedia.org/wiki/OS/2">OS/2</a>, in the next, <a href="https://en.wikipedia.org/wiki/Windows_3.1x">Windows 3.11</a>, <a href="https://en.wikipedia.org/wiki/Windows_NT">NT</a>, and later incarnations of Windows.
</p>
<p>
I wrote my first web sites in <a href="https://en.wikipedia.org/wiki/Arachnophilia">Arachnophilia</a>, and my first professional software in Visual Basic, Visual C++, and <a href="https://en.wikipedia.org/wiki/Visual_InterDev">Visual InterDev</a>.
</p>
<p>
I used <a href="https://en.wikipedia.org/wiki/Terminate_(software)">Terminate</a> with my first modem. If I recall correctly, it had a built-in email downloader and offline reader. Later, I switched to Outlook for email. I've used <a href="https://en.wikipedia.org/wiki/Netscape_Navigator">Netscape Navigator</a>, <a href="https://en.wikipedia.org/wiki/Internet_Explorer">Internet Explorer</a>, Firefox, and Chrome to surf the web.
</p>
<p>
I've written theses, articles, reports, etc. in <a href="https://en.wikipedia.org/wiki/WordPerfect">Word Perfect</a> for DOS and MS Word for Windows. I wrote my new book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits In your Head</a> in <a href="https://www.texstudio.org/">TexStudio</a>. Yes, it was written entirely in <a href="https://en.wikipedia.org/wiki/LaTeX">LaTeX</a>.
</p>
<h3 id="c3ee042288d94295bec33308840d075f">
Updates <a href="#c3ee042288d94295bec33308840d075f">#</a>
</h3>
<p>
For the first fifteen years, new software versions were rare. You'd get a version of AutoCAD, Windows, or Visual C++, and you'd use it for years. After a few years, a new version would come out, and that would be a big deal.
</p>
<p>
Interim service releases were rare, too, since there was no network-based delivery mechanism. Software came on <a href="https://en.wikipedia.org/wiki/Floppy_disk">floppy disks</a>, and later on <a href="https://en.wikipedia.org/wiki/Compact_disc">CD</a>s.
</p>
<p>
Even if a bug fix was easy to make, it was difficult for a software vendor to <em>distribute</em> it, so most software releases were well-tested. Granted, software had bugs back then, and some of them you learned to work around.
</p>
<p>
When a new version came out, the same forces were at work. The new version had to be as solid and stable as the previous one. Again, I grant that even in those days, this wasn't always the case. Usually, a bad release spelled the demise of a company, because release cycles were so long that competitors could take advantage of a bad software release.
</p>
<p>
Usually, however, software updates were <em>improvements</em>, and you looked forward to them.
</p>
<h3 id="865a6ab3b3c64982bbeb31ccf64db6b5">
Decay <a href="#865a6ab3b3c64982bbeb31ccf64db6b5">#</a>
</h3>
<p>
I no longer look forward to updates. These days, software is delivered over the internet, and some applications update automatically.
</p>
<p>
From a security perspective it can be a good idea to stay up-to-date, and for years, I diligently did that. Lately, however, I've become more conservative. Particularly when it comes to Windows, I ignore all suggestions to update it until it literally forces the update on me.
</p>
<p>
Just like <a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/StarTrekMovieCurse">even-numbered Star Trek movies don't suck</a>, the same pattern seems to be true for Windows: <a href="https://en.wikipedia.org/wiki/Windows_XP">Windows XP</a> was good, <a href="https://en.wikipedia.org/wiki/Windows_7">Windows 7</a> was good, and <a href="https://en.wikipedia.org/wiki/Windows_10">Windows 10</a> wasn't bad either. I kept putting off Windows 11 for as long as possible, but now I use it, and I can't say that I'm surprised that I don't like it.
</p>
<p>
This article, however, isn't a rant about Windows in particular. This seems to be a general trend, and it's been noticeable for years.
</p>
<h3 id="f3bd0d4e750b4d9aa53ee9a6c0f59ddc">
Examples <a href="#f3bd0d4e750b4d9aa53ee9a6c0f59ddc">#</a>
</h3>
<p>
I think that the first time I noticed a particular application degrading was <a href="https://en.wikipedia.org/wiki/Vivino">Vivino</a>. It started out as a local company here in Copenhagen, and I was a fairly early adopter. Initially, it was great: If you like wine, but don't know that much about it, you could photograph a bottle's label, and it'd automatically recognise the wine and register it in your 'wine library'. I found it useful that I could look up my notes about a wine I'd had a year ago to remind me what I thought of it. As time went on, however, I started to notice errors in my wine library. It might be double entries, or wines that were silently changed to another vintage, etc. Eventually it got so bad that I lost trust in the application and uninstalled it.
</p>
<p>
Another example is <a href="https://www.sublimetext.com/">Sublime Text</a>, which I used for writing articles for this blog. I even bought a licence for it. Version 3 was great, but version 4 was weird from the outset. One thing was that they changed how they indicated which parts of a file I'd edited after opening it, and I never understood the idea behind the visuals. Worse was that auto-closing of HTML stopped working. Since I'm that <a href="https://rakhim.org/honestly-undefined/19/">weird dude who writes raw HTML</a>, such a feature is quite important to me. If I write an HTML tag, I expect the editor to automatically add the closing tag, and place my cursor between the two. Sublime Text stopped doing that consistently, and eventually it became annoying enough that I thought: <em>Why bother?</em> Now I write in <a href="https://code.visualstudio.com/">Visual Studio Code</a>.
</p>
<p>
Microsoft is almost a chapter in itself, but to be fair, I don't consider Microsoft products <em>worse</em> than others. There's just so much of it, and since I've always been working in the Microsoft tech stack, I use a lot of it. Thus, <a href="https://en.wikipedia.org/wiki/Selection_bias">selection bias</a> clearly is at work here. Still, while I don't think Microsoft is worse than the competition, it seems to be part of the trend.
</p>
<p>
For years, my login screen was stuck on the same mountain lake, even though I tried every remedy suggested on the internet. Eventually, however, a new version of Windows fixed the issue. So, granted, sometimes new versions improve things.
</p>
<p>
Now, however, I have another problem with <a href="https://en.wikipedia.org/wiki/Windows_spotlight">Windows Spotlight</a>. It shows nice pictures, and there used to be an option to see where the picture was taken. Since I repaved my machine, this option is gone. Again, I've scoured the internet for resolutions to this problem, but neither rebooting, regedit changes, nor anything else has so far solved the problem.
</p>
<p>
Those sound like small problems, so let's consider something more serious. Half a year ago, Outlook used to be able to detect whether I was writing an email in English or Danish. It could even handle the hybrid scenario where parts of an email were in English, and parts in Danish. Since I repaved my machine, this feature no longer works. Outlook doesn't recognise Danish when I write it. One thing is the red squiggly lines under most words, but that's not even the worst. The worst part of this is that even though I'm writing in Danish, Outlook thinks I'm writing in English, so it silently auto-corrects Danish words to whatever looks adjacent in English.
</p>
<p>
<img src="/content/binary/outlook-language-bug.png" alt="Screen shot of Outlook language bug.">
</p>
<p>
This became so annoying that I contacted Microsoft support about it, but while they had me try a number of things, nothing worked. They eventually had to give up and suggested that I reinstall my machine - which, at that point, I'd done two weeks before.
</p>
<p>
This used to work, but now it doesn't.
</p>
<h3 id="b314b021b97f455184e44c52ab584afa">
It's not all bad <a href="#b314b021b97f455184e44c52ab584afa">#</a>
</h3>
<p>
I could go on with other examples, but I think that this suffices. After all, I don't think it makes for a compelling read.
</p>
<p>
Of course, not everything is bad. While it looks as though I'm particularly harping on Microsoft, I rarely detect problems with <a href="https://visualstudio.microsoft.com/">Visual Studio</a> or Code, and I usually install updates as soon as they are available. The same is true for much other software I use. <a href="https://www.getpaint.net/">Paint.NET</a> is awesome, <a href="https://www.getmusicbee.com/">MusicBee</a> is solid, and even the <a href="https://www.sonos.com/">Sonos</a> Windows app, while horrific, is at least consistently so.
</p>
<h3 id="708ec0a2be1d42f083c1c2d37c24221d">
Conclusion <a href="#708ec0a2be1d42f083c1c2d37c24221d">#</a>
</h3>
<p>
It seems to me that some software is actually getting worse, and that this is a more recent trend.
</p>
<p>
The point isn't that some software is bad. This has always been the case. What seems new to me is that software that <em>used to be good</em> deteriorates. While this wasn't entirely unheard of in the nineties (I'm looking at you, WordPerfect), this is becoming much more noticeable.
</p>
<p>
Perhaps it's just <a href="https://en.wikipedia.org/wiki/Frequency_illusion">frequency illusion</a>, or perhaps it's because I use software much more than I did in the nineties. Still, I can't shake the feeling that some software is deteriorating.
</p>
<p>
Why does this happen? I don't know, but my own bias suggests that it's because there's less focus on regression testing. Many of the problems I see look like regression bugs to me. A good engineering team could have caught them with automated regression tests, but these days, it seems as though many teams rely on releasing often and then letting users do the testing.
</p>
<p>
The problem with that approach, however, is that if you don't have good automated tests, fixing one regression may resurrect another.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="16748f443dbe46f6b9f039e3de5165bf">
<div class="comment-author"><a href="https://carlosschults.net">Carlos Schults</a> <a href="#16748f443dbe46f6b9f039e3de5165bf">#</a></div>
<div class="comment-content">
<p>I don't think your perception that software is getting worse is wrong, Mark. I've been an Evernote user since 2011. And, for a good portion of those years, I've been a paid customer.</p>
<p>
I'm certainly not alone in my perception that the application is becoming worse with each new release, to the point of, at times, becoming
unusable. Syncing problems with the mobile version, weird changes in the UI that don't accomplish anything, general sluggishness, and, above all,
not listening to the users regarding long-needed features.
</p>
</div>
<div class="comment-date">2023-07-29 18:05 UTC</div>
</div>
</div>
<hr>
Works on most machines
https://blog.ploeh.dk/2023/07/17/works-on-most-machines
2023-07-17T08:01:00+00:00
Mark Seemann
<div id="post">
<p>
<em>TDD encourages deployment flexibility. Functional programming also helps.</em>
</p>
<p>
Recently several of the podcasts I subscribe to have had episodes about various container technologies, of which <a href="https://en.wikipedia.org/wiki/Kubernetes">Kubernetes</a> dominates. I tune out of such content, since it has nothing to do with me.
</p>
<p>
I've never found containerisation relevant. I remember being fascinated when I first heard of <a href="https://en.wikipedia.org/wiki/Docker_(software)">Docker</a>, and for a while, I awaited a reason to use it. It never materialised.
</p>
<p>
I'd test-drive whatever system I was working on, and deploy it to production. Usually, it'd just work.
</p>
<p>
Since my process already produced good results, why make it more complicated?
</p>
<p>
Occasionally, I would become briefly aware of the lack of containers in my life, but then I'd forget about it again. Until now, I haven't thought much about it, and it's probably only the random coincidence of a few podcast episodes back-to-back that made me think more about it.
</p>
<h3 id="98b5a360a5ba413ca4dbccce86cbe331">
Be liberal with what system you run on <a href="#98b5a360a5ba413ca4dbccce86cbe331">#</a>
</h3>
<p>
When I was a beginner programmer a few years ago, things were different. I'd write code that <a href="https://blog.codinghorror.com/the-works-on-my-machine-certification-program/">worked on my machine</a>, but not always on the test server.
</p>
<p>
As I gained experience, this tended to happen less often. This doubtless has multiple causes, and increased experience is likely one of them, but I also think that my interest in loose coupling and test-driven development plays a role.
</p>
<p>
Increasingly I developed an ethos of writing software that would work on most machines, instead of only my own. It seems reminiscent of <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>: Be liberal with what system you run on.
</p>
<p>
Test-driven development helps in that regard, because you write code that must be able to execute in at least two contexts: The test context, and the actual system context. These two contexts both exist on your machine.
</p>
<p>
A colleague once taught me: <em>The most difficult generalisation step is going from one to two</em>. Once you've generalised to two cases, it's much easier to generalise to three, four, or <em>n</em> cases.
</p>
<p>
It seems to me that such from-one-to-two-cases generalisation is an inadvertent by-product of test-driven development. Once your code already matches two different contexts, making it even more flexible isn't that much extra work. It's not even <a href="https://wiki.c2.com/?SpeculativeGenerality">speculative generality</a> because you also need to make it work on the production system and (one hopes) on a build server or continuous delivery pipeline. That's 3-4 contexts. Odds are that software that runs successfully in four separate contexts runs successfully on many more systems.
</p>
<h3 id="a0315ee333ff454fb1bd6814f5806121">
General-purpose modules <a href="#a0315ee333ff454fb1bd6814f5806121">#</a>
</h3>
<p>
In <a href="/ref/a-philosophy-of-software-design">A Philosophy of Software Design</a> John Ousterhout argues that one should aim for designing general-purpose objects or modules, rather than specialised APIs. He calls them <em>deep modules</em> and their counterparts <em>shallow modules</em>. On the surface, this seems to go against the grain of <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">YAGNI</a>, but the way I understand the book, the point is rather that general-purpose solutions also solve special cases, and, when done right, the code doesn't have to be more complicated than the one that handles the special case.
</p>
<p>
As I write in <a href="https://www.goodreads.com/review/show/5498011140">my review of the book</a>, I think that there's a connection with test-driven development. General-purpose code is code that works in more than one situation, including automated testing environments. This is almost tautological. If it doesn't work in an automated test, an argument could be made that it's insufficiently general.
</p>
<p>
Likewise, general-purpose software should be able to work when deployed to more than one machine. It should even work on machines where other versions of that software already exist.
</p>
<p>
When you have general-purpose software, though, do you really need containers?
</p>
<h3 id="960f22bd15fb48b1a3d7dfddc9f60408">
Isolation <a href="#960f22bd15fb48b1a3d7dfddc9f60408">#</a>
</h3>
<p>
While I've routinely made use of test-driven development since 2003, I started my shift towards functional programming around ten years later. I think that this has amplified my code's flexibility.
</p>
<p>
As <a href="https://jessitron.com/">Jessica Kerr</a> <a href="http://www.functionalgeekery.com/episode-8-jessica-kerr">pointed out years ago</a>, a corollary of <a href="https://en.wikipedia.org/wiki/Referential_transparency">referential transparency</a> is that <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> are <em>isolated</em> from their environment. Only input arguments affect the output of a pure function.
</p>
<p>
Ultimately, you may need to query the environment about various things, but in functional programming, querying the environment is impure, so you <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">push it to the boundary of the system</a>. Functional programming encourages you to <a href="/2018/11/19/functional-architecture-a-definition">explicitly consider and separate impure actions from pure functions</a>. This implies that the environment-specific code is small, cohesive, and easy to review.
</p>
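<p>
As a small sketch of that separation, consider the following. All names are hypothetical and chosen only for illustration: the pure function receives everything it needs as input arguments, while the code that queries the environment lives in a thin boundary.
</p>
<p>
<pre>using System;

public static class Greeter
{
    // Pure: the output depends only on the input arguments.
    public static string Greet(DateTime now, string name)
    {
        var salutation = now.Hour &lt; 12 ? "Good morning" : "Good day";
        return $"{salutation}, {name}.";
    }
}

public static class Boundary
{
    // Impure: query the environment, then call the pure function.
    public static void Run()
    {
        var now = DateTime.Now;
        var name = Environment.UserName;
        Console.WriteLine(Greeter.Greet(now, name));
    }
}</pre>
</p>
<p>
Only <code>Boundary.Run</code> depends on the machine's clock and user account. <code>Greeter.Greet</code> behaves the same on any machine, and in any test.
</p>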
<h3 id="4775b8ab484c4833a6a5a86bac3b8b8e">
Conclusion <a href="#4775b8ab484c4833a6a5a86bac3b8b8e">#</a>
</h3>
<p>
For a while, when Docker was new, I expected it to be a technology that I'd eventually pick up and make part of my tool belt. As the years went by, that never happened. As a programmer, I've never had the need.
</p>
<p>
I think that a major contributor to that is that since I mostly develop software with test-driven development, the resulting software is already robust or flexible enough to run in multiple environments. Adding functional programming to the mix helps to achieve isolation from the run-time environment.
</p>
<p>
All of this seems to collaborate to enable code to work not just on my machine, but on most machines. Including containers.
</p>
<p>
Perhaps there are other reasons to use containers and Kubernetes. In a devops context, I could imagine that it makes deployment and operations easier. I don't know much about that, but I also don't mind. If someone wants to take the code I've written and run it in a container, that's fine. It's going to run there too.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="4012c2cddcb64a068c0b06b7989a676e">
<div class="comment-author">qfilip <a href="#4012c2cddcb64a068c0b06b7989a676e">#</a></div>
<div class="comment-content">
<p>
Commenting for the first time. I hope I made these changes in proper manner. Anyway...
</p>
<p>
Kubernetes usually also means the usage of cloud infrastructure, and as such, it can be automated (and change-tracked)
in various interesting ways. Is it worth it? Well, that depends as always... Docker isn't the only container technology supported by
k8s, but since it's the most popular one... they go hand in hand.
</p>
<p>
Docker is also very useful for enabling others to run your software on their machines. Recently,
we've been exploring some apps that consisted of ~4 services (web servers) and a database. All of them written
in different technologies (PHP, Java, C#). You don't have to setup environment variables. You don't need to have relevant SDKs
to build projects etc. Just run docker command, and spin them instantly on your PC.
</p>
<p>So there's that...</p>
<p>
Unrelated to the topic above, I'd like to ask you, if you could write an article on the specific subject. Or, if
the answer is short, comment me back. As an F# enthusiast, I find yours and <a href="https://fsharpforfunandprofit.com">Scott's</a>
blog very valuable. One thing I've failed to find here is why you don't like ORMs. I think the words were
<i>they solve a problem that we shouldn't have in the first place</i>. Since F# doesn't play too well with
Entity Framework, and I pretty much can't live without it... I'm curious if I'm missing something.
A different approach, way of thinking. I can work with raw SQL ofcourse... but the mapping... oh the mapping...
</p>
</div>
<div class="comment-date">2023-07-18 22:30 UTC</div>
</div>
<div class="comment" id="04d9d2a2e9884b0ba2a0049898b98e5f">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#04d9d2a2e9884b0ba2a0049898b98e5f">#</a></div>
<div class="comment-content">
<p>
I'm contemplating turning my response into a new article, but it may take some time before I get to it. I'll post here once I have a more thorough response.
</p>
</div>
<div class="comment-date">2023-07-23 13:56 UTC</div>
</div>
<div class="comment" id="2adbe12cf0e541e7b76cf39037c6a96c">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2adbe12cf0e541e7b76cf39037c6a96c">#</a></div>
<div class="comment-content">
<p>
qfilip, thank you for writing. I've now published <a href="/2023/07/31/test-driving-the-pyramids-top">the article</a> that, among many other things, responds to your comment about containers.
</p>
<p>
I'll get back to your question about <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">ORMs</a> as soon as possible.
</p>
</div>
<div class="comment-date">2023-07-31 07:01 UTC</div>
</div>
<div class="comment" id="1702d8d84d024d3b83b683e2589460f5">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#1702d8d84d024d3b83b683e2589460f5">#</a></div>
<div class="comment-content">
<p>
I'm still considering how to best address the question about ORMs, but in the meanwhile, I'd like to point interested readers to Ted Neward's famous article <a href="https://blogs.newardassociates.com/blog/2006/the-vietnam-of-computer-science.html">The Vietnam of Computer Science</a>.
</p>
</div>
<div class="comment-date">2023-08-14 20:01 UTC</div>
</div>
<div class="comment" id="74654595140446239d66bcb85fc51234">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#74654595140446239d66bcb85fc51234">#</a></div>
<div class="comment-content">
<p>
Finally, I'm happy to announce that I've written an article trying to explain my position: <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a>.
</p>
</div>
<div class="comment-date">2023-09-18 14:50 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
AI for doc comments
https://blog.ploeh.dk/2023/07/10/ai-for-doc-comments
2023-07-10T06:02:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A solution in search of a problem?</em>
</p>
<p>
I was recently listening to <a href="https://www.dotnetrocks.com/details/1850">a podcast episode</a> where the guest (among other things) enthused about how advances in <a href="https://en.wikipedia.org/wiki/Large_language_model">large language models</a> mean that you can now get these systems to write <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/xmldoc/">XML doc comments</a>.
</p>
<p>
You know, these things:
</p>
<p>
<pre><span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> Scorbles a dybliad.</span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"></</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">param</span> <span style="color:gray;">name</span><span style="color:gray;">=</span><span style="color:gray;">"</span>dybliad<span style="color:gray;">"</span><span style="color:gray;">></span><span style="color:green;">The dybliad to scorble.</span><span style="color:gray;"></</span><span style="color:gray;">param</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">param</span> <span style="color:gray;">name</span><span style="color:gray;">=</span><span style="color:gray;">"</span>flag<span style="color:gray;">"</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> A flag that controls whether scorbling is done pre- or postvotraid.</span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"></</span><span style="color:gray;">param</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">returns</span><span style="color:gray;">></span><span style="color:green;">The scorbled dybliad.</span><span style="color:gray;"></</span><span style="color:gray;">returns</span><span style="color:gray;">></span>
<span style="color:blue;">public</span> <span style="color:blue;">string</span> Scorble(<span style="color:blue;">string</span> dybliad, <span style="color:blue;">bool</span> flag)</pre>
</p>
<p>
And it struck me that this isn't the first time I've encountered that notion. Finally, you no longer need to write those tedious documentation comments in your code. Instead, you can get <a href="https://github.com/features/copilot">GitHub Copilot</a> or <a href="https://en.wikipedia.org/wiki/ChatGPT">ChatGPT</a> to write them for you.
</p>
<p>
When was the last time you wrote such comments?
</p>
<p>
I'm sure that there are readers who wrote some just yesterday, but generally, I rarely encounter them in the wild.
</p>
<p>
As a rule, I only write them when my modelling skills fail me so badly that I need to <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">apologise in code</a>. Whenever I run into such a situation, I may as well take advantage of the format already in place for such things, but it's not taking up a big chunk of my time.
</p>
<p>
It's been a decade since I ran into a code base where doc comments were mandatory. When I had to write comments, I'd use <a href="https://submain.com/ghostdoc/">GhostDoc</a>, which used heuristics to produce 'documentation' on par with modern AI tools.
</p>
<p>
Whether you use GhostDoc, GitHub Copilot, or write the comments yourself, most of them tend to be equally inane and vacuous. Good design only amplifies this quality. The better names you use, and the more you leverage the type system to <a href="https://blog.janestreet.com/effective-ml-video">make illegal states unrepresentable</a>, the less you need the kind of documentation furnished by doc comments.
</p>
<p>
I find it striking that more than one person waxes poetic about AI's ability to produce doc comments.
</p>
<p>
Is that, ultimately, the only thing we'll entrust to large language models?
</p>
<p>
I <a href="/2022/12/05/github-copilot-preliminary-experience-report">know that they can do more than that</a>, but are we going to let them? Or are automatic doc comments a solution in search of a problem?
</p>
</div><hr>
Validating or verifying emails
https://blog.ploeh.dk/2023/07/03/validating-or-verifying-emails
2023-07-03T05:41:00+00:00
Mark Seemann
<div id="post">
<p>
<em>On separating preconditions from business rules.</em>
</p>
<p>
My recent article <a href="/2023/06/26/validation-and-business-rules">Validation and business rules</a> elicited this question:
</p>
<blockquote>
<p>
"Regarding validation should be pure function, lets have user registration as an example, is checking the email address uniqueness a validation or a business rule? It may not be pure since the check involves persistence mechanism."
</p>
<footer><cite><a href="https://twitter.com/Cherif_b/status/1673906245172969473">Cherif BOUCHELAGHEM</a></cite></footer>
</blockquote>
<p>
This is a great opportunity to examine some heuristics in greater detail. As always, this mostly presents how I think about problems like this, and so doesn't represent any rigid universal truth.
</p>
<p>
The specific question is easily answered, but when the topic is email addresses and validation, I foresee several follow-up questions that I also find interesting.
</p>
<h3 id="579eacb264e94f3bb80aa6d5020df26f">
Uniqueness constraint <a href="#579eacb264e94f3bb80aa6d5020df26f">#</a>
</h3>
<p>
A new user signs up for a system, and as part of the registration process, you want to verify that the email address is unique. Is that validation or a business rule?
</p>
<p>
Again, I'm going to put the cart before the horse and first use the definition to answer the question.
</p>
<blockquote>
<p>
Validation is a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> that decides whether data is acceptable.
</p>
<footer><cite><a href="/2023/06/26/validation-and-business-rules">Validation and business rules</a></cite></footer>
</blockquote>
<p>
Can you implement the uniqueness constraint with a pure function? Not easily. What most systems would do, I'm assuming, is to keep track of users in some sort of data store. This means that in order to check whether or not an email address is unique, you'd have to query that database.
</p>
<p>
Querying a database is non-deterministic because you could be making multiple subsequent queries with the same input, yet receive differing responses. In this particular example, imagine that you ask the database whether <em>ann.siebel@example.com</em> is already registered, and the answer is <em>no, that address is new to us</em>.
</p>
<p>
Database queries are snapshots in time. All that answer tells you is that at the time of the query, the address would be unique in your database. While that answer travels over the network back to your code, a concurrent process might add that very address to the database. Thus, the next time you ask the same question: <em>Is ann.siebel@example.com already registered?</em> the answer would be: <em>Yes, we already know of that address</em>.
</p>
<p>
Verifying that the address is unique (most likely) involves an impure action, and so according to the above definition isn't a validation step. By the <a href="https://en.wikipedia.org/wiki/Law_of_excluded_middle">law of the excluded middle</a>, then, it must be a business rule.
</p>
<p>
Using a different rule of thumb, <a href="https://en.wikipedia.org/wiki/Robert_C._Martin">Robert C. Martin</a> arrives at the same conclusion:
</p>
<blockquote>
<p>
"Uniqueness is semantic not syntactic, so I vote that uniqueness is a business rule not a validation rule."
</p>
<footer><cite><a href="https://twitter.com/unclebobmartin/status/1674023070611263493">Robert C. Martin</a></cite></footer>
</blockquote>
<p>
This highlights a point about this kind of analysis. Using functional purity is a heuristic shortcut to sorting verification problems. Those that are deterministic and have no side effects are validation problems, and those that are either non-deterministic or have side effects are not.
</p>
<p>
Being able to sort problems in this way is useful because it enables you to choose the right tool for the job, and to avoid the wrong tool. In this case, trying to address the uniqueness constraint with validation is likely to cause trouble.
</p>
<p>
Why is that? Because of what I already described. A database query is a snapshot in time. If you make a decision based on that snapshot, it may be the wrong decision once you reach a conclusion. Granted, when discussing user registration, the risk of several processes concurrently trying to register the same email address probably isn't that big, but in other domains, contention may be a substantial problem.
</p>
<p>
Being able to identify a uniqueness constraint as something that <em>isn't</em> validation enables you to avoid that kind of attempted solution. Instead, you may contemplate other designs. If you keep users in a relational database, the easiest solution is to put a uniqueness constraint on the <code>Email</code> column and let the database deal with the problem. Just be prepared to handle the exception that the <code>INSERT</code> statement may generate.
</p>
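<p>
To illustrate the difference between asking first and simply attempting the insert, here's a minimal sketch. The <code>UserStore</code> class is hypothetical: an in-memory stand-in for a table with a uniqueness constraint on <code>Email</code>, not code from any real system. A relational database would signal the violation with an exception from the <code>INSERT</code> rather than a Boolean return value.
</p>

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical in-memory stand-in for a table with a uniqueness
// constraint on the Email column. Names are assumptions for illustration.
public sealed class UserStore
{
    private readonly ConcurrentDictionary<string, Guid> users =
        new ConcurrentDictionary<string, Guid>();

    // Snapshot in time: the answer may be stale by the time the
    // caller acts on it.
    public bool IsRegistered(string email)
    {
        return users.ContainsKey(email);
    }

    // Check and insert happen as one atomic operation. Returns false
    // if the address is already taken.
    public bool TryRegister(string email, Guid userId)
    {
        return users.TryAdd(email, userId);
    }
}
```

<p>
With this sketch, two attempts to register <em>ann.siebel@example.com</em> can't both succeed, regardless of how the calls interleave: the first <code>TryRegister</code> call returns <code>true</code>, and the second returns <code>false</code>. A client that calls <code>IsRegistered</code> and then inserts has no such guarantee.
</p>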
<p>
If you have another kind of data store, there are other ways to model the constraint. You can even do so using lock-free architectures, but that's out of scope for this article.
</p>
<h3 id="4ebc57fb6a4d40d4a168b454f63804fb">
Validation checks preconditions <a href="#4ebc57fb6a4d40d4a168b454f63804fb">#</a>
</h3>
<p>
<a href="/encapsulation-and-solid">Encapsulation</a> is an important part of object-oriented programming (<a href="/2022/10/24/encapsulation-in-functional-programming">and functional programming as well</a>). As I've often outlined, I base my understanding of encapsulation on <a href="/ref/oosc">Object-Oriented Software Construction</a>. I consider <em>contract</em> (preconditions, invariants, and postconditions) essential to encapsulation.
</p>
<p>
I'll borrow a figure from my article <a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a>:
</p>
<p>
<img src="/content/binary/validation-as-a-function-from-data-to-type.png" alt="An arrow labelled 'validation' pointing from a document to the left labelled 'Data' to a box to the right labelled 'Type'.">
</p>
<p>
The role of validation is to answer the question: <em>Does the data make sense?</em>
</p>
<p>
This question, and its answer, is typically context-dependent. What 'makes sense' means may differ. This is even true for email addresses.
</p>
<p>
When I wrote the example code for my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I had to contemplate how to model email addresses. Here's an excerpt from the book:
</p>
<blockquote>
<p>
Email addresses are <a href="https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/">notoriously difficult to validate</a>, and even if you had a full implementation of the SMTP specification, what good would it do you?
</p>
<p>
Users can easily give you a bogus email address that fits the spec. The only way to really validate an email address is to send a message to it and see if that provokes a response (such as the user clicking on a validation link). That would be a long-running asynchronous process, so even if you'd want to do that, you can't do it as a blocking method call.
</p>
<p>
The bottom line is that it makes little sense to validate the email address, apart from checking that it isn't null. For that reason, I'm not going to validate it more than I've already done.
</p>
<footer><cite><a href="/ctfiyh">Code That Fits in Your Head</a>, p. 102</cite></footer>
</blockquote>
<p>
In this example, I decided that the only precondition I would need to demand was that the email address isn't null. This was motivated by the operations I needed to perform with the email address - or rather, in this case, the operations I didn't need to perform. The only thing I needed to do with the address was to save it in a database and send emails:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task EmailReservationCreated(<span style="color:blue;">int</span> restaurantId, Reservation reservation)
{
    <span style="color:blue;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(reservation));
    <span style="color:blue;">var</span> r = <span style="color:blue;">await</span> RestaurantDatabase.GetRestaurant(restaurantId).ConfigureAwait(<span style="color:blue;">false</span>);
    <span style="color:blue;">var</span> subject = <span style="color:#a31515;">$"Your reservation for </span>{r?.Name}<span style="color:#a31515;">."</span>;
    <span style="color:blue;">var</span> body = CreateBodyForCreated(reservation);
    <span style="color:blue;">var</span> email = reservation.Email.ToString();
    <span style="color:blue;">await</span> Send(subject, body, email).ConfigureAwait(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
This code example suggests why I made it a precondition that <code>Email</code> mustn't be null. Had null been allowed, I would have had to resort to <a href="/2013/07/08/defensive-coding">defensive coding, which is exactly what encapsulation makes redundant</a>.
</p>
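<p>
A type that protects that single precondition could look like the following. This is only a sketch under the assumption that the address is wrapped in a small class with a <code>ToString</code> override; it isn't necessarily how the book's sample code defines it.
</p>

```csharp
using System;

// Hypothetical sketch of an email wrapper; not necessarily identical
// to the type used in the book's sample code.
public sealed class Email
{
    private readonly string value;

    public Email(string value)
    {
        // The only precondition: the address must not be null.
        this.value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public override string ToString()
    {
        return value;
    }
}
```

<p>
Once an <code>Email</code> object exists, client code such as <code>EmailReservationCreated</code> can call <code>ToString</code> without checking for null again.
</p>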
<p>
Validation is a process that determines whether data is useful in a particular context. In this particular case, all it takes is to check the <code>Email</code> property on the <a href="https://en.wikipedia.org/wiki/Data_transfer_object">DTO</a>. The sample code that comes with <a href="/ctfiyh">Code That Fits in Your Head</a> shows the basics, while <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a> contains a more advanced solution.
</p>
<h3 id="55a584c6f15d4899beac8e40190068d5">
Preconditions are context-dependent <a href="#55a584c6f15d4899beac8e40190068d5">#</a>
</h3>
<p>
I would assume that a normal user registration process has little need to validate an ostensible email address. A system may want to verify the address, but that's a completely different problem. It usually involves sending an email to the address in question and having some asynchronous process register whether the user verifies that email. For an article related to this problem, see <a href="/2019/12/02/refactoring-registration-flow-to-functional-architecture">Refactoring registration flow to functional architecture</a>.
</p>
<p>
Perhaps you've been reading this with mounting frustration: <em>How about validating the address according to the SMTP spec?</em>
</p>
<p>
Indeed, that sounds like something one should do, but turns out to be rarely necessary. As already outlined, users can easily supply a bogus address like <code>foo@bar.com</code>. It's valid according to the spec, and so what? How does that information help you?
</p>
<p>
In most contexts in which I've found myself, validating according to the SMTP specification is a distraction. One might, however, imagine scenarios where it might be required. If, for example, you need to sort addresses according to user name or host name, or perform some filtering on those parts, etc. it might be warranted to actually <em>require</em> that the address is valid.
</p>
<p>
This would imply a <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">validation step that attempts to parse</a> the address. Once again, parsing here implies translating less-structured data (a string) to more-structured data. On .NET, I'd consider using the <a href="https://learn.microsoft.com/dotnet/api/system.net.mail.mailaddress">MailAddress</a> class which already comes with <a href="https://learn.microsoft.com/dotnet/api/system.net.mail.mailaddress.trycreate">built-in parser functions</a>.
</p>
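<p>
As a rough sketch of such a parsing step, the following helper wraps <code>MailAddress.TryCreate</code> and exposes the user and host parts. The <code>EmailParser</code> class and its <code>TryParse</code> method are names I've made up for this example, and the explicit check for blank input is a defensive assumption on my part.
</p>

```csharp
using System.Net.Mail;

// Hypothetical helper; only MailAddress and its members are part of .NET.
public static class EmailParser
{
    // Translates less-structured data (a string) to more-structured
    // data (user and host parts). Returns null when parsing fails.
    public static (string User, string Host)? TryParse(string candidate)
    {
        if (string.IsNullOrWhiteSpace(candidate))
            return null;

        if (MailAddress.TryCreate(candidate, out var address))
            return (address.User, address.Host);

        return null;
    }
}
```

<p>
Given <code>foo@bar.com</code>, the function returns the pair <code>("foo", "bar.com")</code>, which you could then use for sorting or filtering. It says nothing, of course, about whether anyone reads mail sent to that address.
</p>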
<p>
The point being that your needs determine your preconditions, which again determine what validation should do. The preconditions are context-dependent, and so is validation.
</p>
<h3 id="6e66e8eb91f742109a02def661115656">
Conclusion <a href="#6e66e8eb91f742109a02def661115656">#</a>
</h3>
<p>
Email addresses offer a welcome opportunity to discuss the difference between validation and verification in a way that is specific, but still, I hope, easy to extrapolate from.
</p>
<p>
Validation is a translation from one (less-structured) data format to another. Typically, the more-structured data format is an object, a record, or a hash map (depending on language). Thus, validation is determined by two forces: What the input data looks like, and what the desired object requires; that is, its preconditions.
</p>
<p>
Validation is always a translation with the potential for error. Some input, being less-structured, can't be represented by the more-structured format. In addition to parsing, a validation function must also be able to fail in a composable manner. That is, fortunately, <a href="/2020/12/14/validation-a-solved-problem">a solved problem</a>.
</p>
</div><hr>
Validation and business rules
https://blog.ploeh.dk/2023/06/26/validation-and-business-rules
2023-06-26T06:05:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A definition of validation as distinguished from business rules.</em>
</p>
<p>
This article suggests a definition of <em>validation</em> in software development. <em>A</em> definition, not <em>the</em> definition. It presents how I currently distinguish between validation and business rules. I find the distinction useful, although perhaps it's a case of reversed causality. The following definition of <em>validation</em> is useful because, if defined like that, <a href="/2020/12/14/validation-a-solved-problem">it's a solved problem</a>.
</p>
<p>
My definition is this:
</p>
<p>
<em>Validation is a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> that decides whether data is acceptable.</em>
</p>
<p>
I've used the word <em>acceptable</em> because it suggests a link to <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>. When validating, you may want to allow for some flexibility in input, even if, strictly speaking, it's not entirely on spec.
</p>
<p>
That's not, however, the key ingredient in my definition. The key is that validation should be a pure function.
</p>
<p>
While this may sound like an arbitrary requirement, there's a method to my madness.
</p>
<h3 id="8b91bccdf17f42fa9cfc93599e35bd6c">
Business rules <a href="#8b91bccdf17f42fa9cfc93599e35bd6c">#</a>
</h3>
<p>
Before I explain the benefits of the above definition, I think it'll be useful to outline typical problems that developers face. My thesis in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> is that understanding limits of human cognition is a major factor in making a code base sustainable. This again explains why <a href="/encapsulation-and-solid">encapsulation</a> is such an important idea. You want to <em>confine</em> knowledge in small containers that fit in your head. Information shouldn't leak out of these containers, because that would require you to keep track of too much stuff when you try to understand other code.
</p>
<p>
When discussing encapsulation, I emphasise <em>contract</em> over information hiding. A contract, in the spirit of <a href="/ref/oosc">Object-Oriented Software Construction</a>, is a set of preconditions, invariants, and postconditions. Preconditions are particularly relevant to the topic of validation, but I've often experienced that some developers struggle to identify where validation ends and business rules begin.
</p>
<p>
Consider an online restaurant reservation system as an example. We'd like to implement a feature that enables users to make reservations. In order to meet that end, we decide to introduce a <code>Reservation</code> class. What are the preconditions for creating a valid instance of such a class?
</p>
<p>
When I go through such an exercise, people quickly identify requirements such as these:
</p>
<ul>
<li>The reservation should have a date and time.</li>
<li>The reservation should contain the number of guests.</li>
<li>The reservation should contain the name or email (or other data) about the person making the reservation.</li>
</ul>
<p>
A common suggestion is that the restaurant should also be able to accommodate the reservation; that is, it shouldn't be fully booked, it should have an available table at the desired time of an appropriate size, etc.
</p>
<p>
That, however, isn't a precondition for creating a valid <code>Reservation</code> object. That's a business rule.
</p>
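<p>
To make the distinction concrete, here's a minimal sketch of such a class. The property names and the exact guard clauses are assumptions made for illustration (in particular, that the number of guests must be positive); the point is only that the constructor checks its own arguments and knows nothing about tables or other reservations.
</p>

```csharp
using System;

// Hypothetical sketch; member names and guards are assumptions.
public sealed class Reservation
{
    public Reservation(DateTime at, string email, int quantity)
    {
        // Preconditions are self-contained: they only concern the
        // constituent parts of the object being created.
        if (email is null)
            throw new ArgumentNullException(nameof(email));
        if (quantity < 1)
            throw new ArgumentOutOfRangeException(
                nameof(quantity),
                "The number of guests must be at least one.");

        At = at;
        Email = email;
        Quantity = quantity;
    }

    public DateTime At { get; }
    public string Email { get; }
    public int Quantity { get; }
}
```

<p>
Notice what's absent: there's no query for available tables and no check of opening hours. Whether the restaurant can accommodate the reservation is decided elsewhere.
</p>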
<h3 id="5c9983d374e84212bbd37e7cd2476287">
Preconditions are self-contained <a href="#5c9983d374e84212bbd37e7cd2476287">#</a>
</h3>
<p>
How do you distinguish between a precondition and a business rule? And what does that have to do with input validation?
</p>
<p>
Notice that in the above examples, the three preconditions I've listed are self-contained. They are statements about the object or value's constituent parts. On the other hand, the requirement that the restaurant should be able to accommodate the reservation deals with a wider context: The table layout of the restaurant, prior reservations, opening and closing times, and other business rules as well.
</p>
<p>
Validation is, as <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">Alexis King points out</a>, a parsing problem. You receive less-structured data (<a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, <a href="https://en.wikipedia.org/wiki/JSON">JSON</a>, <a href="https://en.wikipedia.org/wiki/XML">XML</a>, etc.) and attempt to project it to a more-structured format (C# objects, <a href="https://fsharp.org/">F#</a> records, <a href="https://clojure.org/">Clojure</a> maps, etc.). This succeeds when the input satisfies the preconditions, and fails otherwise.
</p>
<p>
Why can't we add more preconditions than required? Consider <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>. An operation (and that includes object constructors) should be liberal in what it accepts. While you have to draw the line somewhere (you can't really work with a reservation if the date is missing), an object shouldn't require <em>more</em> than it needs.
</p>
<p>
In general we observe that the fewer preconditions, the easier it is to create an object (or equivalent functional data structure). As a counter-example, this explains why <a href="https://en.wikipedia.org/wiki/Active_record_pattern">Active Record</a> is antithetical to unit testing. One precondition is that there's a database available, and while not impossible to automate in tests, it's quite the hassle. It's easier to work with <a href="https://en.wikipedia.org/wiki/Plain_old_Java_object">POJOs</a> in tests. And unit tests, being <a href="/2011/11/10/TDDimprovesreusability">the first clients of an API</a>, tell you how easy it is to use that API.
</p>
<h3 id="56b5624f416c4648aab2b27216c10bb1">
Contracts with third parties <a href="#56b5624f416c4648aab2b27216c10bb1">#</a>
</h3>
<p>
If validation is fundamentally parsing, it seems reasonable that operations should be pure functions. After all, a parser operates on unchanging (less-structured) data. A programming-language parser takes contents of text files as input. There's little need for more input than that, and the output is expected to be deterministic. Not surprisingly, <a href="https://www.haskell.org/">Haskell</a> is well-suited for writing parsers.
</p>
<p>
You don't, however, have to buy the argument that validation is essentially parsing, so consider another perspective.
</p>
<p>
Validation is a data transformation step you perform to deal with input. Data comes from a source external to your system. It can be a user filling in a form, another program making an HTTP request, or a batch job that receives files over <a href="https://en.wikipedia.org/wiki/File_Transfer_Protocol">FTP</a>.
</p>
<p>
Even if you don't have a formal agreement with any third party, <a href="https://www.hyrumslaw.com/">Hyrum's law</a> implies that a contract does exist. It behoves you to pay attention to that, and make it as explicit as possible.
</p>
<p>
Such a contract should be stable. Third parties should be able to rely on deterministic behaviour. If they supply data one day, and you accept it, you can't reject the same data the next day on grounds that it was malformed. At best, you may <a href="/2021/12/13/backwards-compatibility-as-a-profunctor">be contravariant in input as time passes</a>; in other words, you may accept things tomorrow that you didn't accept today, but you may not reject tomorrow what you accepted today.
</p>
<p>
Likewise, you can't have validation rules that erratically accept data one minute, reject the same data the next minute, only to accept it later. This implies that validation must, at least, be deterministic: The same input should always produce the same output.
</p>
<p>
That's half of the way to <a href="https://en.wikipedia.org/wiki/Referential_transparency">referential transparency</a>. Do you need side effects in your validation logic? Hardly, so you might as well implement it as pure functions.
</p>
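<p>
A deterministic validation function without side effects might be sketched like this. The <code>ReservationDto</code> and <code>ReservationValidator</code> names, as well as the particular rules, are assumptions for the sake of the example. To keep the sketch short, it reports failure with null rather than a composable error type, and it parses dates with the invariant culture so that the result doesn't depend on machine settings.
</p>

```csharp
using System;
using System.Globalization;

// Hypothetical DTO representing less-structured input.
public sealed class ReservationDto
{
    public string At { get; set; }
    public string Email { get; set; }
    public int Quantity { get; set; }
}

public static class ReservationValidator
{
    // The result depends only on the input. No database, no clock,
    // no side effects. Returns null when the data isn't acceptable.
    public static (DateTime At, string Email, int Quantity)? TryParse(
        ReservationDto dto)
    {
        if (dto is null)
            return null;
        if (!DateTime.TryParse(
                dto.At,
                CultureInfo.InvariantCulture,
                DateTimeStyles.None,
                out var at))
            return null;
        if (dto.Email is null)
            return null;
        if (dto.Quantity < 1)
            return null;

        return (at, dto.Email, dto.Quantity);
    }
}
```

<p>
Calling <code>TryParse</code> with the same DTO contents today, tomorrow, or next year produces the same answer, which is what a third party relying on the contract needs.
</p>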
<h3 id="45e45eee8b8947dc9eb2743de5adac13">
Putting the cart before the horse <a href="#45e45eee8b8947dc9eb2743de5adac13">#</a>
</h3>
<p>
You may still think that my definition smells of a solution in search of a problem. Yes, pure functions are convenient, but does it naturally follow that validation should be implemented as pure functions? Isn't this a case of poor <a href="https://en.wikipedia.org/wiki/Retroactive_continuity">retconning</a>?
</p>
<p>
<img src="/content/binary/distinguishing-between-validation-and-business-rules.png" alt="Two buckets with a 'lid' labeled 'applicative validation' conveniently fitting over the validation bucket.">
</p>
<p>
When faced with the question: <em>What is validation, and what are business rules?</em> it's almost as though I've conveniently sized the <em>Validation</em> sorting bucket so that it perfectly aligns with <a href="/2018/11/05/applicative-validation">applicative validation</a>. Then, the <em>Business rules</em> bucket fits whatever is left. (In the figure, the two buckets are of equal size, which hardly reflects reality. I estimate that the <em>Business rules</em> bucket is much larger, but had I tried to illustrate that, too, in the figure, it would have looked akilter.)
</p>
<p>
This is suspiciously convenient, but consider this: My experience is that this perspective on validation works well. To a great degree, this is because <a href="/2020/12/14/validation-a-solved-problem">I consider validation a solved problem</a>. It's productive to be able to take a chunk of a larger problem and put it aside: <em>We know how to deal with this. There are no risks there.</em>
</p>
<p>
Definitions do, I believe, rarely spring fully formed from some <a href="https://en.wikipedia.org/wiki/Theory_of_forms">Platonic ideal</a>. Rather, people observe what works and eventually extract a condensed description and call it a definition. That's what I've attempted to do here.
</p>
<h3 id="62140574219942c9b34e6b84fe53bfb2">
Business rules change <a href="#62140574219942c9b34e6b84fe53bfb2">#</a>
</h3>
<p>
Let's return to the perspective of validation as a technical contract between your system and a third party. While that contract should be as stable as possible, business rules change.
</p>
<p>
Consider the online restaurant reservation example. Imagine that you're the third-party programmer, and that you've developed a client that can make reservations on behalf of users. When a user wants to make a reservation, there's always a risk that it's not possible. Your client should be able to handle that scenario.
</p>
<p>
Now the restaurant becomes so popular that it decides to change a rule. Earlier, you could make reservations for one, three, or five people, even though the restaurant only has tables for two, four, or six people. Based on its new-found popularity, the restaurant decides that it only accepts reservations for entire tables. Unless it's on the same day and they still have a free table.
</p>
<p>
This changes the <em>behaviour</em> of the system, but not the contract. A reservation for three is still <em>valid</em>, but will be declined because of the new rule.
</p>
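<p>
As a hedged illustration, the old and the new rule could be expressed as two pure functions. The <code>ReservationPolicy</code> class, its method names, and the simplification that a rule only looks at the sizes of the free tables are all assumptions I've made for this sketch.
</p>

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of a business rule before and after the change.
public static class ReservationPolicy
{
    // Old rule: accept any party that fits at a free table.
    public static bool WillAcceptOld(
        int quantity,
        IReadOnlyCollection<int> freeTableSizes)
    {
        return freeTableSizes.Any(size => quantity <= size);
    }

    // New rule: only accept parties that fill an entire table, unless
    // the reservation is for the same day and a free table still fits.
    public static bool WillAcceptNew(
        int quantity,
        IReadOnlyCollection<int> freeTableSizes,
        bool isSameDay)
    {
        return freeTableSizes.Any(size =>
            isSameDay ? quantity <= size : quantity == size);
    }
}
```

<p>
With free tables for two, four, and six people, a reservation for three passes the old rule, but the new rule declines it unless it's for the same day. In both cases, the reservation itself remains valid; only the decision changes.
</p>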
<blockquote>
<p>
"Things that change at the same rate belong together. Things that change at different rates belong apart."
</p>
<footer><cite><a href="https://www.facebook.com/notes/kent-beck/naming-from-the-outside-in/464270190272517">Kent Beck</a></cite></footer>
</blockquote>
<p>
Business rules change at different rates than preconditions, so it makes sense to decouple those concerns.
</p>
<h3 id="ac220ae336c24be88aea366e64de39b4">
Conclusion <a href="#ac220ae336c24be88aea366e64de39b4">#</a>
</h3>
<p>
Since validation is a solved problem, it's useful to be able to identify what is validation, and what is something else. As long as an 'input rule' is self-contained (or parametrisable), deterministic, and has no side-effects, you can model it with applicative validation.
</p>
<p>
It's equally useful to be able to spot when applicative validation isn't a good fit. While I'm sure that someone has published a <code>ValidationT</code> monad transformer for Haskell, I'm not sure I would recommend going that route. In other words, if some business operation involves impure actions, it's not going to fit the mold of applicative validation.
</p>
<p>
This doesn't mean that you can't implement business rules with pure functions. You can, but in my experience, abstractions other than applicative validation are more useful in those cases.
</p>
</div><hr>
When is an implementation detail an implementation detail?
https://blog.ploeh.dk/2023/06/19/when-is-an-implementation-detail-an-implementation-detail
2023-06-19T06:10:00+00:00
Mark Seemann
<div id="post">
<p>
<em>On the tension between encapsulation and testability.</em>
</p>
<p>
This article is part of a series called <a href="/2023/02/13/epistemology-of-interaction-testing">Epistemology of interaction testing</a>. A <a href="/2023/03/13/confidence-from-facade-tests">previous article in the series</a> elicited this question:
</p>
<blockquote>
<p>
"following your suggestion, aren’t we testing implementation details?"
</p>
<footer><cite><a href="https://www.relativisticramblings.com/">Christer van der Meeren</a></cite></footer>
</blockquote>
<p>
This frequently-asked question reminds me of an old joke. I think that I first heard it in the eighties, a time when phones had <a href="https://en.wikipedia.org/wiki/Rotary_dial">rotary dials</a>, everyone smoked, you'd receive mail through your apartment door's <a href="https://en.wikipedia.org/wiki/Letter_box">letter slot</a>, and unemployment was high. It goes like this:
</p>
<p>
<em>A painter gets a helper from the unemployment office. A few days later the lady from the office calls the painter and apologizes deeply for the mistake.</em>
</p>
<p>
<em>"What mistake?"</em>
</p>
<p>
<em>"I'm so sorry, instead of a painter we sent you a gynaecologist. Please just let him go, we'll send you a..."</em>
</p>
<p>
<em>"Let him go? Are you nuts, he's my best worker! At the last job, they forgot to leave us the keys, and the guy painted the whole room through the letter slot!"</em>
</p>
<p>
I always think of this joke when the topic is testability. Should you test everything through a system's public API, or do you choose to expose some internal APIs in order to make the code more testable?
</p>
<h3 id="815100cf9c3b492f8abf392528fc5b1e">
Letter slots <a href="#815100cf9c3b492f8abf392528fc5b1e">#</a>
</h3>
<p>
Consider the simplest kind of program you could write: <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program">Hello world</a>. If you didn't consider automated testing, then an <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C# implementation might look like this:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Program</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> Main(<span style="color:blue;">string</span>[] args)
    {
        Console<span style="color:#b4b4b4;">.</span>WriteLine(<span style="color:#a31515;">"Hello, World!"</span>);
    }
}</pre>
</p>
<p>
(Yes, I know that with modern C# you can write such a program using a single <a href="https://learn.microsoft.com/dotnet/csharp/fundamentals/program-structure/top-level-statements">top-level statement</a>, but I'm writing for a broader audience, and only use C# as an example language.)
</p>
<p>
How do we test a program like that? Of course, no-one seriously suggests that we <em>really</em> need to test something that simple, but what if we make it a little more complex? What if we make it possible to supply a name as a command-line argument? What if we want to internationalise the program? What if we want to add a <em>help</em> feature? What if we want to add a feature so that we can send a <em>hello</em> to another recipient, on another machine? When does the behaviour become sufficiently complex to warrant automated testing, and how do we achieve that goal?
</p>
<p>
For now, I wish to focus on <em>how</em> to achieve the goal of testing software. For the sake of argument, then, assume that we want to test the above <em>hello world</em> program.
</p>
<p>
As given, we can run the program and verify that it prints <em>Hello, World!</em> to the console. This is easy to do as a manual test, but harder if you want to automate it.
</p>
<p>
You could write a test framework that automatically starts a new operating-system process (the program) and waits until it exits. This framework should be able to handle processes that exit with success and failure status codes, as well as processes that hang, or never start, or keep restarting... Such a framework also requires a way to capture the standard output stream in order to verify that the expected text is written to it.
</p>
<p>
I'm sure such frameworks exist for various operating systems and programming languages. There is, however, a simpler solution if you can live with the trade-off: You could open the API of your source code a bit:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Program</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> Main(<span style="color:blue;">string</span>[] args)
    {
        Console.WriteLine(<span style="color:#a31515;">"Hello, World!"</span>);
    }
}</pre>
</p>
<p>
While I haven't changed the structure or the layout of the source code, I've made both class and method <code>public</code>. This means that I can now write a normal C# unit test that calls <code>Program.Main</code>.
</p>
<p>
I still need a way to observe the behaviour of the program, but <a href="https://stackoverflow.com/a/2139303/126014">there are known ways of redirecting the Console output in .NET</a> (and I'd be surprised if that wasn't the case on other platforms and programming languages).
</p>
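<p>
One way to do that redirection, sketched here under the assumption that a small helper is acceptable, is to swap <code>Console.Out</code> for a <code>StringWriter</code> while the code under test runs. <code>ConsoleCapture</code> is a hypothetical name; <code>Console.SetOut</code> and <code>StringWriter</code> are standard .NET APIs.
</p>

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of the article's code base.
public static class ConsoleCapture
{
    // Runs the action while Console.Out is redirected to a StringWriter,
    // and returns everything the action wrote to standard output.
    public static string Run(Action action)
    {
        TextWriter original = Console.Out;
        using var writer = new StringWriter();
        try
        {
            Console.SetOut(writer);
            action();
            return writer.ToString();
        }
        finally
        {
            // Always restore the real console, even if the action throws.
            Console.SetOut(original);
        }
    }
}
```

<p>
A test could then call <code>ConsoleCapture.Run(() =&gt; Program.Main(Array.Empty&lt;string&gt;()))</code> and compare the result to <code>"Hello, World!"</code> followed by <code>Environment.NewLine</code>. Since <code>Console.Out</code> is shared global state, tests that use such a helper shouldn't run in parallel with each other.
</p>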
<p>
As we add more and more features to the command-line program, we may be able to keep testing by calling <code>Program.Main</code> and asserting against the redirected <a href="https://learn.microsoft.com/dotnet/api/system.console">Console</a>. As the complexity of the program grows, however, this starts to look like painting a room through the letter slot.
</p>
<h3 id="4681beea02be47b89c9a5d7c2618fad7">
Adding new APIs <a href="#4681beea02be47b89c9a5d7c2618fad7">#</a>
</h3>
<p>
Real programs are usually more than just a command-line utility. They may be smartphone apps that react to user input or network events, or web services that respond to HTTP requests, or complex asynchronous systems that react to, and send, messages over durable queues. Even good old batch jobs are likely to pull data from files in order to write to a database, or the other way around. Thus, the interface to the rest of the world is likely larger than just a single <code>Main</code> method.
</p>
<p>
Smartphone apps or message-based systems have event handlers. Web sites or services have classes, methods, or functions that handle incoming HTTP requests. These are essentially event handlers, too. This increases the size of the 'test surface': There is more than one method you can invoke in order to exercise the system.
</p>
<p>
Even so, a real program will soon grow to a size where testing entirely through the real-world-facing API becomes reminiscent of painting through a letter slot. <a href="https://www.infoq.com/presentations/integration-tests-scam/">J.B. Rainsberger explains that one major problem is the combinatorial explosion of required test cases</a>.
</p>
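<p>
To make the arithmetic behind that claim concrete, here's a small sketch with made-up numbers. If a request passes through three components with 4, 5, and 3 code paths, covering every combination through the outer API takes the <em>product</em> of those numbers, whereas testing each component in isolation takes only their <em>sum</em>. <code>TestCaseCount</code> is a hypothetical helper, and the path counts are assumptions chosen for illustration.
</p>

```csharp
using System.Linq;

// Hypothetical helper that only illustrates the arithmetic.
public static class TestCaseCount
{
    // Exercising all combinations through the outer API: paths multiply.
    public static int ThroughOuterApi(params int[] pathsPerComponent)
    {
        return pathsPerComponent.Aggregate(1, (acc, n) => acc * n);
    }

    // Exercising each component on its own: paths add up.
    public static int InIsolation(params int[] pathsPerComponent)
    {
        return pathsPerComponent.Sum();
    }
}
```

<p>
With the assumed numbers, <code>TestCaseCount.ThroughOuterApi(4, 5, 3)</code> is <em>60</em>, while <code>TestCaseCount.InIsolation(4, 5, 3)</code> is <em>12</em>. The gap widens quickly as components are added.
</p>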
<p>
Another problem is that the system may produce side effects that you care about. As a basic example, consider a system that, as part of its operation, sends emails. When testing this system, you want to verify that under certain circumstances, the system sends certain emails. How do you do that?
</p>
<p>
If the system has <em>absolutely no concessions to testability</em>, I can think of two options:
</p>
<ul>
<li>You contact the person to whom the system sends the email, and ask him or her to verify receipt of the email. You do that <em>every time</em> you test.</li>
<li>You deploy the System Under Test in an environment with an <a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol">SMTP</a> gateway that redirects all email to another address.</li>
</ul>
<p>
Clearly the first option is unrealistic. The second option is a little better, but you still have to open an email inbox and look for the expected message. Doing so programmatically is, again, technically possible, and I'm sure that there are <a href="https://en.wikipedia.org/wiki/Post_Office_Protocol">POP3</a> or <a href="https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol">IMAP</a> assertion libraries out there. Still, this seems complicated, error-prone, and slow.
</p>
<p>
What could we do instead? I would usually introduce a polymorphic interface such as <code>IPostOffice</code> as a way to substitute the real <code>SmtpPostOffice</code> with a <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a>.
</p>
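<p>
As a rough sketch of what such a seam might look like: the article only names <code>IPostOffice</code> and <code>SmtpPostOffice</code>, so the <code>Send</code> signature, the <code>SpyPostOffice</code> Test Double, and the <code>ReservationConfirmer</code> class below are all assumptions made for illustration.
</p>

```csharp
using System.Collections.Generic;

// Hypothetical shape; the member is an assumption, not the article's API.
public interface IPostOffice
{
    void Send(string recipient, string subject, string body);
}

// A Test Double that records emails instead of sending them.
public sealed class SpyPostOffice : IPostOffice
{
    private readonly List<(string Recipient, string Subject, string Body)> sent = new();

    public IReadOnlyList<(string Recipient, string Subject, string Body)> Sent => sent;

    public void Send(string recipient, string subject, string body)
    {
        sent.Add((recipient, subject, body));
    }
}

// Hypothetical System Under Test that depends only on the interface.
public sealed class ReservationConfirmer
{
    private readonly IPostOffice postOffice;

    public ReservationConfirmer(IPostOffice postOffice)
    {
        this.postOffice = postOffice;
    }

    public void Confirm(string email, int quantity)
    {
        postOffice.Send(email, "Reservation confirmed", $"Table for {quantity}.");
    }
}
```

<p>
A test can pass a <code>SpyPostOffice</code> to the constructor, call <code>Confirm</code>, and then assert against <code>Sent</code>, without an SMTP gateway or an inbox in sight. In production, the real <code>SmtpPostOffice</code> would take the spy's place.
</p>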
<p>
Notice what happens in these cases: We introduce (or make public) new APIs in order to facilitate automated testing.
</p>
<h3 id="0cc85d98c7ae437aa1156e72fc25dcf6">
Application-boundary API and internal APIs <a href="#0cc85d98c7ae437aa1156e72fc25dcf6">#</a>
</h3>
<p>
It's helpful to distinguish between the real-world-facing API and everything else. In this diagram, I've indicated the public-facing API as a thin green slice facing upwards (assuming that external stimulus - button clicks, HTTP requests, etc. - arrives from above).
</p>
<p>
<img src="/content/binary/public-and-internal-system-apis.png" alt="A box depicting a program, with a small green slice indicating public-facing APIs, and internal blue slices indicating internal APIs.">
</p>
<p>
The real-world-facing API is the code that <em>must</em> be present for the software to work. It could be a button-click handler or an ASP.NET <em>action method</em>:
</p>
<p>
<pre>[HttpPost(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<ActionResult> Post(<span style="color:blue;">int</span> restaurantId, ReservationDto dto)</pre>
</p>
<p>
Of course, if you're using another web framework or another programming language, the details differ, but the application <em>has</em> to have code that handles an HTTP <em>POST</em> request on matching addresses. Or a button click, or a message that arrives on a message bus. You get the point.
</p>
<p>
These APIs are fairly fixed. If you change them, you change the externally observable behaviour of the system. Such changes are likely breaking changes.
</p>
<p>
The shape of these APIs is dictated by the framework and programming language you're using. Like I did with the above <code>Main</code> method, you can make such an API <code>public</code> and use it for testing.
</p>
<p>
A software system of even middling complexity will usually also be decomposed into smaller components. In the figure, I've indicated such subdivisions as boxes with gray outlines. Each of these may present an API to other parts of the system. I've indicated these APIs with light blue.
</p>
<p>
The total size of internal APIs is likely to be larger than the public-facing API. On the other hand, you can (theoretically) change these internal interfaces without breaking the observable behaviour of the system. This is called <a href="/ref/refactoring">refactoring</a>.
</p>
<p>
These internal APIs will often have <code>public</code> access modifiers. That doesn't make them real-world-facing. Be careful not to confuse programming-language <a href="https://en.wikipedia.org/wiki/Access_modifiers">access modifiers</a> with architectural concerns. Objects or their members can have <code>public</code> access modifiers even if the object plays an exclusively internal role. <a href="/2011/05/31/AttheBoundaries,ApplicationsareNotObject-Oriented">At the boundaries, applications aren't object-oriented</a>. And <a href="/2022/05/02/at-the-boundaries-applications-arent-functional">neither are they functional</a>.
</p>
<p>
Likewise, as the original <code>Main</code> method example shows, public APIs may be implemented with a <code>private</code> access modifier.
</p>
<p>
Why do such internal APIs exist? Is it only to support automated testing?
</p>
<h3 id="e6bf8d9b8617435f914d976c05cc9731">
Decomposition <a href="#e6bf8d9b8617435f914d976c05cc9731">#</a>
</h3>
<p>
If we introduce new code, such as the above <code>IPostOffice</code> interface, in order to facilitate testing, we have to be careful that it doesn't lead to <a href="https://dhh.dk/2014/test-induced-design-damage.html">test-induced design damage</a>. The idea that one might introduce an API exclusively to support automated testing rubs some people the wrong way.
</p>
<p>
On the other hand, we do introduce (or make public) APIs for other reasons, too. One common reason is that we want to decompose an application's source code so that parallel development is possible. One person (or team) works on one part, and other people work on other parts. If those parts need to communicate, we need to agree on a contract.
</p>
<p>
Such a contract exists for purely internal reasons. End users don't care, and never know of it. You can change it without impacting users, but you may need to coordinate with other teams.
</p>
<p>
What remains, though, is that we do decompose systems into internal parts, and we've done this since before <a href="https://en.wikipedia.org/wiki/David_Parnas">Parnas</a> wrote <em>On the Criteria to Be Used in Decomposing Systems into Modules</em>.
</p>
<p>
Successful test-driven development introduces <a href="http://wiki.c2.com/?SoftwareSeam">seams</a> where they ought to be in any case.
</p>
<h3 id="0df7cb8e24734176b20f4d61e7e24264">
Testing implementation details <a href="#0df7cb8e24734176b20f4d61e7e24264">#</a>
</h3>
<p>
An internal seam is an implementation detail. Even so, when designed with care, it can serve multiple purposes. It enables teams to develop in parallel, and it enables automated testing.
</p>
<p>
Consider the example from <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">a previous article in this series</a>. I'll repeat one of the tests here:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
    _repository.Add(state, knownState);
    _stateValidator
        .Setup(<span style="font-weight:bold;color:#1f377f;">validator</span> => validator.Validate(code, knownState))
        .Returns(<span style="color:blue;">true</span>);
    _renderer
        .Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Success(knownState))
        .Returns(response);
    _target
        .Complete(state, code)
        .Should().Be(response);
}</pre>
</p>
<p>
This test exercises a <em>happy-path</em> case by manipulating <code>IStateValidator</code> and <code>IRenderer</code> Test Doubles. It's a common approach to testability, and what <a href="https://en.wikipedia.org/wiki/David_Heinemeier_Hansson">dhh</a> would label test-induced design damage. While I'm sympathetic to that position, that's not my point. My point is that I consider <code>IStateValidator</code> and <code>IRenderer</code> internal APIs. End users (who probably don't even know what C# is) don't care about these interfaces.
</p>
<p>
Tests like these test against implementation details.
</p>
<p>
This need not be a problem. If you've designed good, stable seams then these tests can serve you for a long time. Testing against implementation details becomes a problem if those details change. Since it's hard to predict how things change in the future, it behoves us to decouple tests from implementation details as much as possible.
</p>
<p>
The alternative, however, is letter-slot testing, which comes with its own set of problems. Thus, judicious introduction of seams is helpful, even if it couples tests to implementation details.
</p>
<p>
Actually, in the question I quoted above, Christer van der Meeren asked whether my proposed alternative isn't testing implementation details. And, yes, that style of testing <em>also</em> relies on implementation details for testing. It's just a different way to design seams. Instead of designing seams around polymorphic objects, we design them around pure functions and immutable data.
</p>
<p>
There are, I think, advantages to functional programming, but when it comes to relying on implementation details, it's only on par with object-oriented design. Not worse, not better, but the same.
</p>
<h3 id="d5e57741df0247d5a75879a75ca588dc">
Conclusion <a href="#d5e57741df0247d5a75879a75ca588dc">#</a>
</h3>
<p>
Every API in use carries a cost. You need to keep the API stable so that users can use it tomorrow like they did yesterday. This can make it difficult to evolve or improve an API, because you risk introducing a breaking change.
</p>
<p>
There are APIs that a system <em>must</em> have. Software exists to be used, and whether that entails a user clicking on a button or another computer system sending a message to your system, your code must handle such stimuli. This is your real-world-facing contract, and you need to be careful to keep it consistent. The smaller that surface area is, the simpler that task is.
</p>
<p>
The same line of reasoning applies to internal APIs. While end users aren't impacted by changes in internal seams, other code is. If you change an implementation detail, this could cost maintenance work somewhere else. (Modern IDEs can handle some changes like that automatically, such as method renames. In those cases, the cost of change is low.) Therefore, it pays to minimise the internal seams as much as possible. One way to do this is by <a href="/2022/11/21/decouple-to-delete">decoupling to delete code</a>.
</p>
<p>
Still, some internal APIs are warranted. They help you decompose a large system into smaller subparts. While there's a potential maintenance cost with every internal API, there's also the advantage of working with smaller, independent units of code. Often, the benefits are larger than the cost.
</p>
<p>
When done well, such internal seams are useful testing APIs as well. They're still implementation details, though.
</p>
</div><hr>
Collatz sequences by function composition
https://blog.ploeh.dk/2023/06/12/collatz-sequences-by-function-composition
2023-06-12T05:27:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Mostly in C#, with a few lines of Haskell code.</em>
</p>
<p>
A <a href="/2023/05/08/is-cyclomatic-complexity-really-related-to-branch-coverage">recent article</a> elicited more comments than usual, and I've been so unusually buried in work that only now do I have a little time to respond to some of them. In <a href="/2023/05/08/is-cyclomatic-complexity-really-related-to-branch-coverage#02568f995d91432da540858644b61e89">one comment</a> <a href="http://github.com/neongraal">Struan Judd</a> offers a refactored version of my <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">Collatz sequence</a> in order to shed light on the relationship between <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> and test case coverage.
</p>
<p>
Struan Judd's agenda is different from what I have in mind in this article, but the comment inspired me to refactor my own code. I wanted to see what it would look like with this constraint: It should be possible to test odd input numbers without exercising the code branches related to even numbers.
</p>
<p>
The problem with more naive implementations of Collatz sequence generators is that (apart from when the input is <em>1</em>) the sequence ends with a tail of even numbers halving down to <em>1</em>. I'll start with a simple example to show what I mean.
</p>
<h3 id="ccd16365f7ce4842870f90e43267c33f">
Standard recursion <a href="#ccd16365f7ce4842870f90e43267c33f">#</a>
</h3>
<p>
At first I thought that my confusion originated from the imperative structure of the original example. For more than a decade, I've preferred functional programming (FP), and even when I write object-oriented code, I tend to use concepts and patterns from FP. Thus I, naively, rewrote my Collatz generator as a recursive function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<<span style="color:blue;">int</span>> Sequence(<span style="color:blue;">int</span> n)
{
    <span style="color:blue;">if</span> (n < 1)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
            nameof(n),
            <span style="color:#a31515;">$"Only natural numbers allowed, but given </span>{n}<span style="color:#a31515;">."</span>);
    <span style="color:blue;">if</span> (n == 1)
        <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { n };
    <span style="color:blue;">else</span>
        <span style="color:blue;">if</span> (n % 2 == 0)
            <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { n }.Concat(Sequence(n / 2)).ToArray();
        <span style="color:blue;">else</span>
            <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { n }.Concat(Sequence(n * 3 + 1)).ToArray();
}</pre>
</p>
<p>
Recursion is usually not recommended in C#, because a sufficiently long sequence could blow the call stack. I wouldn't write production C# code like this, but you could do something like this in <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a> where the languages offer solutions to that problem. In other words, the above example is only for educational purposes.
</p>
<p>
It doesn't, however, solve the problem that confused me: If you want to test the branch that deals with odd numbers, you can't avoid also exercising the branch that deals with even numbers.
</p>
<h3 id="4587120e2d9e483aba8cf7297704eb28">
Calculating the next value <a href="#4587120e2d9e483aba8cf7297704eb28">#</a>
</h3>
<p>
In functional programming, you solve most problems by decomposing them into smaller problems and then composing the smaller <a href="/2018/03/05/some-design-patterns-as-universal-abstractions">Lego bricks</a> with standard combinators. It seemed like a natural refactoring step to first pull the calculation of the next value into an independent function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">int</span> Next(<span style="color:blue;">int</span> n)
{
    <span style="color:blue;">if</span> ((n % 2) == 0)
        <span style="color:blue;">return</span> n / 2;
    <span style="color:blue;">else</span>
        <span style="color:blue;">return</span> n * 3 + 1;
}</pre>
</p>
<p>
This function has a cyclomatic complexity of <em>2</em> and no loops or recursion. Test cases that exercise the even branch never touch the odd branch, and vice versa.
</p>
<p>
A parametrised test might look like this:
</p>
<p>
<pre>[Theory]
[InlineData( 2, 1)]
[InlineData( 3, 10)]
[InlineData( 4, 2)]
[InlineData( 5, 16)]
[InlineData( 6, 3)]
[InlineData( 7, 22)]
[InlineData( 8, 4)]
[InlineData( 9, 28)]
[InlineData(10, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> NextExamples(<span style="color:blue;">int</span> n, <span style="color:blue;">int</span> expected)
{
<span style="color:blue;">int</span> actual = Collatz.Next(n);
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The <code>NextExamples</code> test obviously defines more than the two test cases that are required to cover the <code>Next</code> function, but since <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">code coverage shouldn't be used as a target measure</a>, I felt that more than two test cases were warranted. This often happens, and should be considered normal.
</p>
<h3 id="e46d3353db8b4d50bcbe595dbbe3dbd0">
A Haskell proof of concept <a href="#e46d3353db8b4d50bcbe595dbbe3dbd0">#</a>
</h3>
<p>
While I had a general idea about the direction in which I wanted to go, I felt that I lacked some standard functional building blocks in C#: Most notably an infinite, lazy sequence generator. Before moving on with the C# code, I threw together a proof of concept in Haskell.
</p>
<p>
The <code>next</code> function is just a one-liner (if you ignore the optional type declaration):
</p>
<p>
<pre><span style="color:#2b91af;">next</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> a <span style="color:blue;">=></span> a <span style="color:blue;">-></span> a
next n = <span style="color:blue;">if</span> <span style="color:blue;">even</span> n <span style="color:blue;">then</span> n `div` 2 <span style="color:blue;">else</span> n * 3 + 1</pre>
</p>
<p>
A few examples in GHCi suggest that it works as intended:
</p>
<p>
<pre>ghci> next 2
1
ghci> next 3
10
ghci> next 4
2
ghci> next 5
16</pre>
</p>
<p>
Haskell comes with enough built-in functions that that was all I needed to implement a Collatz-sequence generator:
</p>
<p>
<pre><span style="color:#2b91af;">collatz</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> a <span style="color:blue;">=></span> a <span style="color:blue;">-></span> [a]
collatz n = (<span style="color:blue;">takeWhile</span> (1 <) $ <span style="color:blue;">iterate</span> next n) ++ [1]</pre>
</p>
<p>
Again, a few examples suggest that it works as intended:
</p>
<p>
<pre>ghci> collatz 1
[1]
ghci> collatz 2
[2,1]
ghci> collatz 3
[3,10,5,16,8,4,2,1]
ghci> collatz 4
[4,2,1]
ghci> collatz 5
[5,16,8,4,2,1]</pre>
</p>
<p>
I should point out, for good measure, that since this is a proof of concept I didn't add a Guard Clause against zero or negative numbers. I'll keep the Guard Clause in the C# code.
</p>
<h3 id="3c46180459334e86a278f9934fa4b032">
Generator <a href="#3c46180459334e86a278f9934fa4b032">#</a>
</h3>
<p>
While C# does come with a <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.takewhile">TakeWhile</a> function, there's no direct equivalent to Haskell's <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#v:iterate">iterate</a> function. It's not difficult to implement, though:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEnumerable<T> Iterate<<span style="color:#2b91af;">T</span>>(Func<T, T> f, T x)
{
    <span style="color:blue;">var</span> current = x;
    <span style="color:blue;">while</span> (<span style="color:blue;">true</span>)
    {
        <span style="color:blue;">yield</span> <span style="color:blue;">return</span> current;
        current = f(current);
    }
}</pre>
</p>
<p>
While this <code>Iterate</code> implementation has a cyclomatic complexity of only <em>2</em>, it exhibits the same kind of problem as the previous attempts at a Collatz-sequence generator: You can't test one branch without testing the other. Here, it even seems as though it's impossible to test the branch that skips the loop.
</p>
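<p>
Because the loop never exits on its own, callers have to bound the sequence themselves. As a small sketch, here's the <code>yield</code>-based <code>Iterate</code> repeated together with a hypothetical <code>IterateDemo</code> class (not part of the article's code base) that takes a finite prefix of the infinite sequence:
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Generator
{
    // Infinite, lazily evaluated sequence: x, f(x), f(f(x)), ...
    public static IEnumerable<T> Iterate<T>(Func<T, T> f, T x)
    {
        var current = x;
        while (true)
        {
            yield return current;
            current = f(current);
        }
    }
}

// Hypothetical usage example.
public static class IterateDemo
{
    // Take stops pulling values after five elements, so this terminates
    // even though the underlying sequence is infinite.
    public static IReadOnlyList<int> FirstPowersOfTwo()
    {
        return Generator.Iterate(i => i * 2, 1).Take(5).ToList();
    }
}
```

<p>
Here, <code>IterateDemo.FirstPowersOfTwo()</code> returns <em>1, 2, 4, 8, 16</em>. Calling <code>ToList</code> directly on the result of <code>Iterate</code>, without <code>Take</code> or <code>TakeWhile</code>, would never return.
</p>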
<p>
In Haskell the <code>iterate</code> function is simply a lazily-evaluated recursive function, but that's not going to solve the problem in the C# case. On the other hand, it helps to know that the <code>yield</code> keyword in C# is just syntactic sugar over a compiler-generated <a href="https://en.wikipedia.org/wiki/Iterator_pattern">Iterator</a>.
</p>
<p>
Just for the exercise, then, I decided to write an explicit Iterator instead.
</p>
<h3 id="a834d92c5d864109aff31cb270c06b83">
Iterator <a href="#a834d92c5d864109aff31cb270c06b83">#</a>
</h3>
<p>
For the sole purpose of demonstrating that it's possible to refactor the code so that branches are independent of each other, I rewrote the <code>Iterate</code> function to return an explicit <code>IEnumerable<T></code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEnumerable<T> Iterate<<span style="color:#2b91af;">T</span>>(Func<T, T> f, T x)
{
    <span style="color:blue;">return</span> <span style="color:blue;">new</span> Iterable<T>(f, x);
}</pre>
</p>
<p>
The <code><span style="color:#2b91af;">Iterable</span><<span style="color:#2b91af;">T</span>></code> class is a private helper class, and only exists to return an <code>IEnumerator<T></code>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Iterable</span><<span style="color:#2b91af;">T</span>> : IEnumerable<T>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T, T> f;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> T x;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">Iterable</span>(Func<T, T> f, T x)
    {
        <span style="color:blue;">this</span>.f = f;
        <span style="color:blue;">this</span>.x = x;
    }
    <span style="color:blue;">public</span> IEnumerator<T> GetEnumerator()
    {
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> Iterator<T>(f, x);
    }
    IEnumerator IEnumerable.GetEnumerator()
    {
        <span style="color:blue;">return</span> GetEnumerator();
    }
}</pre>
</p>
<p>
The <code><span style="color:#2b91af;">Iterator</span><<span style="color:#2b91af;">T</span>></code> class does the heavy lifting:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Iterator</span><<span style="color:#2b91af;">T</span>> : IEnumerator<T>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T, T> f;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> T original;
    <span style="color:blue;">private</span> <span style="color:blue;">bool</span> iterating;
    <span style="color:blue;">internal</span> <span style="color:#2b91af;">Iterator</span>(Func<T, T> f, T x)
    {
        <span style="color:blue;">this</span>.f = f;
        original = x;
        Current = x;
    }
    <span style="color:blue;">public</span> T Current { <span style="color:blue;">get</span>; <span style="color:blue;">private</span> <span style="color:blue;">set</span>; }
    [MaybeNull]
    <span style="color:blue;">object</span> IEnumerator.Current => Current;
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> Dispose()
    {
    }
    <span style="color:blue;">public</span> <span style="color:blue;">bool</span> MoveNext()
    {
        <span style="color:blue;">if</span> (iterating)
            Current = f(Current);
        <span style="color:blue;">else</span>
            iterating = <span style="color:blue;">true</span>;
        <span style="color:blue;">return</span> <span style="color:blue;">true</span>;
    }
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> Reset()
    {
        Current = original;
        iterating = <span style="color:blue;">false</span>;
    }
}</pre>
</p>
<p>
I can't think of a situation where I would write code like this in a real production code base. Again, I want to stress that this is only an exploration of what's possible. What this does show is that all members have low cyclomatic complexity, and none of them involve looping or recursion. Only one method, <code>MoveNext</code>, has a cyclomatic complexity greater than one, and its branches are independent.
</p>
<h3 id="f8030072341448bd80c8c0af055abc10">
Composition <a href="#f8030072341448bd80c8c0af055abc10">#</a>
</h3>
<p>
All Lego bricks are now in place, enabling me to compose the <code>Sequence</code> like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<<span style="color:blue;">int</span>> Sequence(<span style="color:blue;">int</span> n)
{
<span style="color:blue;">if</span> (n < 1)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(n),
<span style="color:#a31515;">$"Only natural numbers allowed, but given </span>{n}<span style="color:#a31515;">."</span>);
<span style="color:blue;">return</span> Generator.Iterate(Next, n).TakeWhile(i => 1 < i).Append(1).ToList();
}</pre>
</p>
<p>
This function has a cyclomatic complexity of <em>2</em>, and each branch can be exercised independently of the other.
</p>
<p>
Which is what I wanted to accomplish.
</p>
<h3 id="e32a8549e23446ba8c9cffd5739d62d6">
Conclusion <a href="#e32a8549e23446ba8c9cffd5739d62d6">#</a>
</h3>
<p>
I'm still re-orienting myself when it comes to understanding the relationship between cyclomatic complexity and test coverage. As part of that work, I wanted to refactor the Collatz code I originally showed. This article shows one way to decompose and reassemble the function in such a way that all branches are independent of each other, so that each can be covered by test cases without exercising the other branch.
</p>
<p>
I don't know if this is useful to anyone else, but I found the hours well-spent.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="b43aefd2fa5f4e7a916a31587fa4886e">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#b43aefd2fa5f4e7a916a31587fa4886e">#</a></div>
<div class="comment-content">
<p>
I really like this article. So much so that I tried to implement this approach for a recursive function at my work. However, I realized that there are some required conditions.
</p>
<p>
First, the recursive function must be tail recursive. Second, the recursive function must be closed (i.e. the output is a subset/subtype of the input). Neither of those was true for my function at work. An example of a function that doesn't satisfy either of these conditions is the function that computes the depth of a tree.
</p>
<p>
A less serious issue is that your code, as currently implemented, requires that there only be one base case value. The issue is that you have duplicated code: the unique base case value appears both in the call to TakeWhile and in the subsequent call to Append. Instead of repeating yourself, I recommend defining an extension method on Enumerable called TakeUntil that works like TakeWhile but also returns the first value on which the predicate returned false. <a href="https://stackoverflow.com/questions/2242318/how-could-i-take-1-more-item-from-linqs-takewhile/6817553#6817553">Here</a> is an implementation of that extension method.
</p>
</div>
<div class="comment-date">2023-06-22 13:45 UTC</div>
</div>
<div class="comment" id="420c5aef12504e048d5f8c6d2691f0fa">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#420c5aef12504e048d5f8c6d2691f0fa">#</a></div>
<div class="comment-content">
<p>
Tyson, thank you for writing. I suppose that you can't share the function that you mention, so I'll have to discuss it in general terms.
</p>
<p>
As far as I can tell you can always(?) <a href="/2015/12/22/tail-recurse">refactor non-tail-recursive functions to tail-recursive implementations</a>. In practice, however, there's rarely need for that, since you can usually separate the problem into a general-purpose library function on the one hand, and your special function on the other. Examples of general-purpose functions are the various maps and folds. If none of the standard functions do the trick, the type's associated <a href="/2019/04/29/catamorphisms">catamorphism</a> ought to.
</p>
<p>
One example of that is computing the depth of a tree, which we've <a href="/2019/08/05/rose-tree-catamorphism">already discussed</a>.
</p>
<p>
I don't insist that any of this is universally true, so if you have another counter-example, I'd be keen to see it.
</p>
<p>
You are, of course, right about using a <code>TakeUntil</code> extension instead. I was, however, trying to use as many built-in components as possible, so as to not unduly confuse casual readers.
</p>
</div>
<div class="comment-date">2023-06-27 12:35 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
The Git repository that vanished
https://blog.ploeh.dk/2023/06/05/the-git-repository-that-vanished
2023-06-05T06:38:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A pair of simple operations resurrected it.</em>
</p>
<p>
The other day I had an 'interesting' experience. I was about to create a small pull request, so I checked out a new branch in Git and switched to my editor in order to start coding when the battery on my laptop died.
</p>
<p>
Clearly, when this happens, the computer immediately stops, without any graceful shutdown.
</p>
<p>
I plugged in the laptop and booted it. When I navigated to the source code folder I was working on, the files were there, but it was no longer a Git repository!
</p>
<h3 id="25e1ded964e041ba82394682a7ce046b">
Git is fixable <a href="#25e1ded964e041ba82394682a7ce046b">#</a>
</h3>
<p>
Git is more complex, and more powerful, than most developers care to deal with. Over the years, I've observed hundreds of people interact with Git in various ways, and most tend to give up at the first sign of trouble.
</p>
<p>
The point of this article isn't to point fingers at anyone, but rather to serve as a gentle reminder that Git tends to be eminently fixable.
</p>
<p>
Often, when people run into problems with Git, their only recourse is to delete the repository and clone it again. I've seen people do that enough times to realise that it might be helpful to point out: <em>You may not have to do that.</em>
</p>
<h3 id="c7afb00a22cd49f0a86b3c2e9f560d91">
Corruption <a href="#c7afb00a22cd49f0a86b3c2e9f560d91">#</a>
</h3>
<p>
Since I <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">use Git tactically</a> I have many repositories on my machine that have no remotes. In those cases, deleting the entire directory and cloning it from the remote isn't an option. I do take backups, though.
</p>
<p>
Still, in this story, the repository I was working with <em>did</em> have a remote. Even so, I was reluctant to delete everything and start over, since I had multiple branches and stashes I'd used for various experiments. Many of those I'd never pushed to the remote, so starting over would mean that I'd lose all of that. It was, perhaps, not a catastrophe, but I would certainly prefer to restore my local repository, if possible.
</p>
<p>
The symptoms were these: When you work with Git in Git Bash, the prompt will indicate which branch you're on. That information was absent, so I was already worried. A quick query confirmed my fears:
</p>
<p>
<pre>$ git status
fatal: not a git repository (or any of the parent directories): .git</pre>
</p>
<p>
All the source code was there, but it looked as though the Git repository was gone. The code still compiled, but there was no source history.
</p>
<p>
Since all code files were there, I had hope. It helps knowing that Git, too, is file-based, and all files are in a hidden directory called <code>.git</code>. If all the source code was still there, perhaps the <code>.git</code> files were there, too. Why wouldn't they be?
</p>
<p>
<pre>$ ls .git
COMMIT_EDITMSG description gitk.cache hooks/ info/ modules/ objects/ packed-refs
config FETCH_HEAD HEAD index logs/ ms-persist.xml ORIG_HEAD refs/</pre>
</p>
<p>
Jolly good! The <code>.git</code> files were still there.
</p>
<p>
I now had a hypothesis: The unexpected shutdown of my machine had left some 'dangling pointers' in <code>.git</code>. A modern operating system may delay writes to disk, so perhaps my <code>git checkout</code> command had never made it all the way to disk - or, at least, not all of it.
</p>
<p>
If the repository was 'merely' corrupted in the sense that a few of the reference pointers had gone missing, perhaps it was fixable.
</p>
<h3 id="a875078b3cbf466cbc8bf3396f9f8ce2">
Empty-headed <a href="#a875078b3cbf466cbc8bf3396f9f8ce2">#</a>
</h3>
<p>
A few web searches indicated that the problem might be with the <code>HEAD</code> file, so I investigated its contents:
</p>
<p>
<pre>$ cat .git/HEAD
</pre>
</p>
<p>
That was all. No output. The <code>HEAD</code> file was empty.
</p>
<p>
That file is not supposed to be empty. It's supposed to contain a commit ID or a reference that tells the Git CLI what the current <em>head</em> is - that is, which commit is currently checked out.
</p>
<p>
While I had checked out a new branch when my computer shut down, I hadn't written any code yet. Thus, the easiest remedy would be to restore the head to <code>master</code>. So I opened the <code>HEAD</code> file in Vim and added this to it:
</p>
<p>
<pre>ref: refs/heads/master</pre>
</p>
<p>
And just like that, the entire Git repository returned!
</p>
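<p>
If you want to convince yourself that an empty <code>HEAD</code> file is enough to produce this symptom, you can reproduce it in a throwaway repository. The following is only a sketch, assuming a POSIX shell with Git on the path. The temporary directory, the <code>demo</code> identity and the <code>master</code> branch name are stand-ins chosen for illustration:
</p>

```shell
#!/bin/sh
# Sketch: reproduce an empty HEAD in a throwaway repository, then repair it.
# The temporary directory, the demo identity and the branch name 'master'
# are assumptions made for this illustration.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
# Point HEAD at 'master' regardless of the locally configured default branch.
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Initial commit"

# Simulate the damage: truncate HEAD to zero bytes.
: > "$repo/.git/HEAD"
if git -C "$repo" status >/dev/null 2>&1; then
    before=ok
else
    before=broken
fi

# The repair: write a symbolic reference back into HEAD.
printf 'ref: refs/heads/master\n' > "$repo/.git/HEAD"
if git -C "$repo" status >/dev/null 2>&1; then
    after=ok
else
    after=broken
fi

echo "before repair: $before, after repair: $after"
```

<p>
In a real repair you'd point <code>HEAD</code> at whichever branch you were on before the crash.
</p>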
<h3 id="ce1dfafa82dd4ea0b4c6d5603f0111cf">
Bad object <a href="#ce1dfafa82dd4ea0b4c6d5603f0111cf">#</a>
</h3>
<p>
The branches, the history, everything looked as though it was restored. A little more investigation, however, revealed one more problem:
</p>
<p>
<pre>$ git log --oneline --all
fatal: bad object refs/heads/some-branch</pre>
</p>
<p>
While a normal <code>git log</code> command worked fine, as soon as I added the <code>--all</code> switch, I got that <code>bad object</code> error message, with the name of the branch I had just created before the computer shut down. (The name of that branch wasn't <code>some-branch</code> - that's just a surrogate I'm using for this article.)
</p>
<p>
Perhaps this was the same kind of problem, so I explored the <code>.git</code> directory further and soon discovered a <code>some-branch</code> file in <code>.git/refs/heads/</code>. What did the contents look like?
</p>
<p>
<pre>$ cat .git/refs/heads/some-branch
</pre>
</p>
<p>
Another empty file!
</p>
<p>
Since I had never committed any work to that branch, the easiest fix was to simply delete the file:
</p>
<p>
<pre>$ rm .git/refs/heads/some-branch</pre>
</p>
<p>
That solved that problem as well. No more <code>fatal: bad object</code> error when using the <code>--all</code> switch with <code>git log</code>.
</p>
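<p>
Rather than stumbling over damaged references one at a time, you could ask <code>find</code> for every zero-byte file under <code>.git/refs</code>. This is a minimal sketch, assuming a POSIX shell. The directory layout is faked with <code>mkdir</code> so that it doesn't touch a real repository, and the commit ID is a made-up placeholder:
</p>

```shell
#!/bin/sh
# Sketch: list zero-byte files under .git/refs.
# The layout below is a hand-made imitation of a repository, and the
# 40-character commit ID is a placeholder, not a real commit.
set -eu

repo=$(mktemp -d)
mkdir -p "$repo/.git/refs/heads"

# A healthy branch file contains a commit ID...
printf '%s\n' 0123456789abcdef0123456789abcdef01234567 \
    > "$repo/.git/refs/heads/master"
# ...while the damaged one is empty.
: > "$repo/.git/refs/heads/some-branch"

# -size 0 only matches files that contain no data at all.
empty_refs=$(cd "$repo" && find .git/refs -type f -size 0)
echo "$empty_refs"
```

<p>
Whether to delete such a file or write a commit ID into it depends on whether the branch ever contained work you care about.
</p>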
<p>
No more problems have shown up since then.
</p>
<h3 id="018956b830c74f3797fff92449045b37">
Conclusion <a href="#018956b830c74f3797fff92449045b37">#</a>
</h3>
<p>
My experience with Git is that it's so powerful that you can often run into trouble. On the other hand, it's also so powerful that you can use it to extricate yourself from trouble. Learning how to do that will teach you how to use Git to your advantage.
</p>
<p>
The problem that I ran into here wasn't fixable with the Git CLI itself, but turned out to still be easily remedied. A Git guru like <a href="https://megakemp.com/">Enrico Campidoglio</a> could most likely have solved my problems without even searching the web. The details of how to solve the problems were new to me, but it took me a few web searches and perhaps five to ten minutes to fix them.
</p>
<p>
The point of this article, then, isn't in the details. It's that it pays to do a little investigation when you run into problems with Git. I already knew that, but I thought that this little story was a good occasion to share that knowledge.
</p>
</div><hr>
Favour flat code file folders
https://blog.ploeh.dk/2023/05/29/favour-flat-code-file-folders
2023-05-29T19:20:00+00:00
Mark Seemann
<div id="post">
<p>
<em>How code files are organised is hardly related to sustainability of code bases.</em>
</p>
<p>
My recent article <a href="/2023/05/15/folders-versus-namespaces">Folders versus namespaces</a> prompted some reactions. A few kind people <a href="https://twitter.com/Savlambda/status/1658453377489960960">shared how they organise code bases</a>, both on Twitter and in the comments. Most reactions, however, carry the (subliminal?) subtext that organising code in file folders is how things are done.
</p>
<p>
I'd like to challenge that notion.
</p>
<p>
As is usually my habit, I mostly do this to make you think. I don't insist that I'm universally right in all contexts, and that everyone else is wrong. I only write to suggest that alternatives exist.
</p>
<p>
The <a href="/2023/05/15/folders-versus-namespaces">previous article</a> wasn't a recommendation; it was only an exploration of an idea. As I describe in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I recommend flat folder structures. Put most code files in the same directory.
</p>
<h3 id="58c997618e794a26bc366859b216e226">
Finding files <a href="#58c997618e794a26bc366859b216e226">#</a>
</h3>
<p>
People usually dislike that advice. <em>How can I find anything?!</em>
</p>
<p>
Let's start with a counter-question: How can you find anything if you have a deep file hierarchy? If you've organised code files in subfolders of subfolders of folders, you typically start with a collapsed view of the tree.
</p>
<p>
<img src="/content/binary/mostly-collapsed-solution-explorer-tree.png" alt="Mostly-collapsed Solution Explorer tree.">
</p>
<p>
Those of my readers who know a little about search algorithms will point out that a <a href="https://en.wikipedia.org/wiki/Search_tree">search tree</a> is an efficient data structure for locating content. The assumption, however, is that you already know (or can easily construct) the <em>path</em> you should follow.
</p>
<p>
In a view like the above, <em>most</em> files are hidden in one of the collapsed folders. If you want to find, say, the <code>Iso8601.cs</code> file, where do you look for it? Which path through the tree do you take?
</p>
<p>
<em>Unfair!</em>, you protest. You don't know what the <code>Iso8601.cs</code> file does. Let me enlighten you: That file contains functions that render dates and times in <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a> formats. These are used to transmit dates and times between systems in a platform-neutral way.
</p>
<p>
So where do you look for it?
</p>
<p>
It's probably not in the <code>Controllers</code> or <code>DataAccess</code> directories. Could it be in the <code>Dtos</code> folder? <code>Rest</code>? <code>Models</code>?
</p>
<p>
Unless your first guess is correct, you'll have to open more than one folder before you find what you're looking for. If each of these folders has subfolders of its own, that only exacerbates the problem.
</p>
<p>
If you're curious, some programmer (me) decided to put the <code>Iso8601.cs</code> file in the <code>Dtos</code> directory, and perhaps you already guessed that. That's not the point, though. The point is this: 'Organising' code files in folders is only efficient if you can unerringly predict the correct path through the tree. You'll have to get it right the first time, every time. If you don't, it's not the most efficient way.
</p>
<p>
Most modern code editors come with features that help you locate files. In <a href="https://visualstudio.microsoft.com/">Visual Studio</a>, for example, you just hit <kbd>Ctrl</kbd>+<kbd>,</kbd> and type a bit of the file name: <em>iso</em>:
</p>
<p>
<img src="/content/binary/visual-studio-go-to-all.png" alt="Visual Studio Go To All dialog.">
</p>
<p>
Then hit <kbd>Enter</kbd> to open the file. In <a href="https://code.visualstudio.com/">Visual Studio Code</a>, the corresponding keyboard shortcut is <kbd>Ctrl</kbd>+<kbd>p</kbd>, and I'd be highly surprised if other editors didn't have a similar feature.
</p>
<p>
To conclude, so far: Organising files in a folder hierarchy is <em>at best</em> on par with your editor's built-in search feature, but is likely to be less productive.
</p>
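<p>
The same idea works from a command line, too. As a sketch (assuming a <code>find</code> that supports <code>-iname</code>, as the GNU and BSD versions do, and with every file name except <code>Iso8601.cs</code> made up for the occasion), you can locate a file by a fragment of its name, no matter which folder it sits in:
</p>

```shell
#!/bin/sh
# Sketch: find a file by a fragment of its name, ignoring the folder structure.
# HomeController.cs and Reservation.cs are invented file names.
set -eu

src=$(mktemp -d)
mkdir -p "$src/Controllers" "$src/Dtos" "$src/Models"
: > "$src/Controllers/HomeController.cs"
: > "$src/Dtos/Iso8601.cs"
: > "$src/Models/Reservation.cs"

# -iname matches the file name case-insensitively; -type f skips directories.
found=$(cd "$src" && find . -type f -iname '*iso*')
echo "$found"
```

<p>
The search never asks which folder the file belongs in, which is the point.
</p>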
<h3 id="f3dffaf6e73e42a6a10e1f403b8bf37b">
Navigating a code base <a href="#f3dffaf6e73e42a6a10e1f403b8bf37b">#</a>
</h3>
<p>
What if you don't quite know the name of the file you're looking for? In such cases, the file system is even less helpful.
</p>
<p>
I've seen people work like this:
</p>
<ol>
<li>Look at some code. Identify another code item they'd like to view. (Examples may include: Looking at a unit test and wanting to see the <a href="https://en.wikipedia.org/wiki/System_under_test">SUT</a>, or looking at a class and wanting to see the base class.)</li>
<li>Move focus to the editor's folder view (in Visual Studio called the <em>Solution Explorer</em>).</li>
<li>Scroll to find the file in question.</li>
<li>Double-click said file.</li>
</ol>
<p>
Regardless of how the files are organised, you could, instead, <em>go to definition</em> (<kbd>F12</kbd> with my Visual Studio keyboard layout) in a single action. Granted, how well this works varies with editor and language. Still, even when editor support is less optimal (e.g. a code base with a mix of <a href="https://fsharp.org/">F#</a> and C#, or a <a href="https://www.haskell.org/">Haskell</a> code base), I can often find things faster with a search (<kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>f</kbd>) than via the file system.
</p>
<p>
A modern editor has efficient tools that can help you find what you're looking for. Looking through the file system is often the least efficient way to find the code you're looking for.
</p>
<h3 id="b050833e38ab4ee7a1ed3429979d8405">
Large code bases <a href="#b050833e38ab4ee7a1ed3429979d8405">#</a>
</h3>
<p>
Do I recommend that you dump thousands of code files in a single directory, then?
</p>
<p>
Hardly, but a question like that presupposes that code bases have thousands of code files. Or more, even. And I've seen such code bases.
</p>
<p>
Likewise, it's a common complaint that Visual Studio is slow when opening solutions with hundreds of projects. And the day Microsoft fixes that problem, people are going to complain that it's slow when opening a solution with thousands of projects.
</p>
<p>
Again, there's an underlying assumption: That a 'real' code base <em>must</em> be so big.
</p>
<p>
Consider alternatives: Could you decompose the code base into multiple smaller code bases? Could you extract subsystems of the code base and package them as reusable packages? Yes, you can do all those things.
</p>
<p>
Usually, I'd pull code bases apart long before they hit a thousand files. Extract modules, libraries, utilities, etc. and put them in separate code bases. Use existing package managers to distribute these smaller pieces of code. Keep the code bases small, and you don't need to organise the files.
</p>
<h3 id="64b5d272a18d41778a021540cd710fd1">
Maintenance <a href="#64b5d272a18d41778a021540cd710fd1">#</a>
</h3>
<p>
<em>But, if all files are mixed together in a single folder, how do we keep the code maintainable?</em>
</p>
<p>
Once more, implicit (but false) assumptions underlie such questions. The assumption is that 'neatly' organising files in hierarchies somehow makes the code easier to maintain. Really, though, it's more akin to a teenager who 'cleans' his room by sweeping everything off the floor only to throw it into his cupboard. It does enable hoovering the floor, but it doesn't make it easier to find anything. The benefit is mostly superficial.
</p>
<p>
Still, consider a tree.
</p>
<p>
<img src="/content/binary/file-tree.png" alt="A tree of folders with files.">
</p>
<p>
This may not be the way you're used to seeing files and folders rendered, but this diagram emphasises the tree structure and makes what happens next starker.
</p>
<p>
The way that most languages work, putting code files in folders makes little difference to the compiler. If the classes in my <code>Controllers</code> folder need some classes from the <code>Dtos</code> folder, you just use them. You may need to import the corresponding namespace, but modern editors make that a breeze.
</p>
<p>
<img src="/content/binary/two-files-coupled-across-tree-branches.png" alt="A tree of folders with files. Two files connect across the tree's branches.">
</p>
<p>
In the above tree, the two files that now communicate are coloured orange. Notice that they span two main branches of the tree.
</p>
<p>
Thus, even though the files are organised in a tree, it has no impact on the maintainability of the code base. Code can reference other code in other parts of the tree. You can <a href="http://evelinag.com/blog/2014/06-09-comparing-dependency-networks/">easily create cycles in a language like C#</a>, and organising files in trees makes no difference.
</p>
<p>
Most languages, however, enforce that library dependencies form a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a> (i.e. <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">if the data access library references the domain model, the domain model can't reference the data access library</a>). The C# compiler, like the compilers of most other languages, enforces what <a href="/ref/appp">Robert C. Martin calls the Acyclic Dependencies Principle</a>. Preventing cycles prevents <a href="https://en.wikipedia.org/wiki/Spaghetti_code">spaghetti code</a>, which is <a href="/2022/11/21/decouple-to-delete">key to a maintainable code base</a>.
</p>
<p>
(Ironically, <a href="/2015/04/15/c-will-eventually-get-all-f-features-right">one of the more controversial features of F# is actually one of its greatest strengths: It doesn't allow cycles</a>.)
</p>
<h3 id="ffe0131017254522acd40d2445929f24">
Tidiness <a href="#ffe0131017254522acd40d2445929f24">#</a>
</h3>
<p>
Even so, I do understand the lure of organising code files in an elaborate hierarchy. It looks so <em>neat</em>.
</p>
<p>
Previously, I've <a href="/2021/05/17/against-consistency">touched on the related topic of consistency</a>, and while I'm a bit of a neat freak myself, I have to realise that tidiness seems to be largely unrelated to the sustainability of a code base.
</p>
<p>
As another example in this category, I've seen more than one code base with consistently beautiful documentation. Every method was adorned with formal <a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/xmldoc/">XML documentation</a> with every input parameter as well as output described.
</p>
<p>
Every new phase in a method was delineated with another neat comment, nicely adorned with a 'comment frame' and aligned with other comments.
</p>
<p>
It was glorious.
</p>
<p>
Alas, that documentation sat on top of 750-line methods with a cyclomatic complexity above 50. The methods were so long that <a href="/2019/12/23/the-case-of-the-mysterious-curly-bracket">developers had to introduce artificial variable scopes to avoid naming collisions</a>.
</p>
<p>
The reason I was invited to look at that code in the first place was that the organisation had trouble with maintainability, and they asked me to help.
</p>
<p>
It was neat, yet unmaintainable.
</p>
<p>
This discussion about tidiness may seem like a digression, but I think it's important to make the implicit explicit. If I'm not much mistaken, preference for order is a major reason that so many developers want to organise code files into hierarchies.
</p>
<h3 id="47227afb0a674330bb1b3556f751799d">
Organising principles <a href="#47227afb0a674330bb1b3556f751799d">#</a>
</h3>
<p>
What other motivations for file hierarchies could there be? How about the directory structure as an organising principle?
</p>
<p>
The two most common organising principles are <a href="/2023/05/15/folders-versus-namespaces">those that I experimented with in the previous article</a>:
</p>
<ol>
<li>By technical role (Controller, View Model, DTO, etc.)</li>
<li>By feature</li>
</ol>
<p>
A technical leader might hope that, by presenting a directory structure to team members, it imparts an organising principle on the code to be.
</p>
<p>
It may even do so, but is that actually a benefit?
</p>
<p>
It might subtly discourage developers from introducing code that doesn't fit into the predefined structure. If you organise code by technical role, developers might put most code in Controllers, producing mostly procedural <a href="https://martinfowler.com/eaaCatalog/transactionScript.html">Transaction Scripts</a>. If you organise by feature, this might encourage duplication because developers don't have a natural place to put general-purpose code.
</p>
<p>
<em>You can put truly shared code in the root folder,</em> the counter-argument might be. This is true, but:
</p>
<ol>
<li>This seems to be implicitly discouraged by the folder structure. After all, the hierarchy is there for a reason, right? Thus, any file you place in the root seems to suggest a failure of organisation.</li>
<li>On the other hand, if you flout that not-so-subtle hint and put many code files in the root, what advantage does the hierarchy furnish?</li>
</ol>
<p>
In <em>Information Distribution Aspects of Design Methodology</em> <a href="https://en.wikipedia.org/wiki/David_Parnas">David Parnas</a> writes about documentation standards:
</p>
<blockquote>
<p>
"standards tend to force system structure into a standard mold. A standard [..] makes some assumptions about the system. [...] If those assumptions are violated, the [...] organization fits poorly and the vocabulary must be stretched or misused."
</p>
<footer><cite><a href="https://en.wikipedia.org/wiki/David_Parnas">David Parnas</a>, <em>Information Distribution Aspects of Design Methodology</em></cite></footer>
</blockquote>
<p>
(The above quote is on the surface about documentation standards, and I've deliberately butchered it a bit (clearly marked) to make it easier to spot the more general mechanism.)
</p>
<p>
In the same paper, Parnas describes the danger of making hard-to-change decisions too early. Applied to directory structure, the lesson is that you should postpone designing a file hierarchy until you know more about the problem. Start with a flat directory structure and add folders later, if at all.
</p>
<h3 id="426c5128ef804c6abfe6005d267cb624">
Beyond files? <a href="#426c5128ef804c6abfe6005d267cb624">#</a>
</h3>
<p>
My claim is that you don't need <em>much</em> in the way of directory hierarchy. It doesn't follow from this, however, that we may never leverage such options. Even though I left most of the example code for <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> in a single folder, I did add a specialised folder as an <a href="/ref/ddd">anti-corruption layer</a>. Folders do have their uses.
</p>
<blockquote>
<p>
"Why not take it to the extreme and place most code in a single file? If we navigate by "namespace view" and search, do we need all those files?"
</p>
<footer><cite><a href="https://twitter.com/ronnieholm/status/1662219232652963840">Ronnie Holm</a></cite></footer>
</blockquote>
<p>
Following a thought to its extreme end can shed light on a topic. Why not, indeed, put all code in a single file?
</p>
<p>
Curious thought, but possibly not new. I've never programmed in <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>, but as I understand it, the language came with tooling that was both <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a> and execution environment. Programmers would write source code in the editor, but although the code was persisted to disk, it may not have been as text files.
</p>
<p>
Even if I completely misunderstand how Smalltalk worked, it's not inconceivable that you could have a development environment based directly on a database. Not that I think that this sounds like a good idea, but it sounds technically possible.
</p>
<p>
Whether we do it one way or another seems mostly to be a question of tooling. What problems would you have if you wrote an entire C# (Java, Python, F#, or similar) code base as a single file? It becomes more difficult to look at two or more parts of the code base at the same time. Still, Visual Studio can actually give you split windows of the same file, but I don't know how it scales if you need multiple views over the same huge file.
</p>
<h3 id="fd3145a641ad4de18dbab9616e2ed4b7">
Conclusion <a href="#fd3145a641ad4de18dbab9616e2ed4b7">#</a>
</h3>
<p>
I recommend flat directory structures for code files. Put most code files in the root of a library or app. Of course, if your system is composed from multiple libraries (dependencies), each library has its own directory.
</p>
<p>
Subfolders aren't <em>prohibited</em>, only generally discouraged. Legitimate reasons to create subfolders may emerge as the code base evolves.
</p>
<p>
My misgivings about code file directory hierarchies mostly stem from the impact they have on developers' minds. This may manifest as <a href="https://en.wikipedia.org/wiki/Magical_thinking">magical thinking</a> or <a href="https://en.wikipedia.org/wiki/Cargo_cult">cargo-cult programming</a>: Erect elaborate directory structures to keep out the evil spirits of spaghetti code.
</p>
<p>
It doesn't work that way.
</p>
</div><hr>
Visual Studio Code snippet to make URLs relative
https://blog.ploeh.dk/2023/05/23/visual-studio-code-snippet-to-make-urls-relative
2023-05-23T19:23:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Yes, it involves JSON and regular expressions.</em>
</p>
<p>
Ever since I <a href="/2013/03/03/moving-the-blog-to-jekyll">migrated the blog off dasBlog</a> I've been <a href="https://rakhim.org/honestly-undefined/19/">writing the articles in raw HTML</a>. The reason is mostly a historical artefact: Originally, I used <a href="https://en.wikipedia.org/wiki/Windows_Live_Writer">Windows Live Writer</a>, but <a href="https://jekyllrb.com/">Jekyll</a> had no support for that, and since I'd been doing web development for more than a decade already, raw HTML seemed like a reliable and durable alternative. I increasingly find that relying on skill and knowledge is a far more durable strategy than relying on technology.
</p>
<p>
For a decade I used <a href="https://www.sublimetext.com/">Sublime Text</a> to write articles, but over the years, I found it degrading in quality. I only used Sublime Text to author blog posts, so when I recently repaved my machine, I decided to see if I could do without it.
</p>
<p>
Since I was already using <a href="https://code.visualstudio.com/">Visual Studio Code</a> for much of my programming, I decided to give it a go for articles as well. It always takes time when you decide to move off a tool you've been using for a decade, but after some initial frustrations, I quickly found a new modus operandi.
</p>
<p>
One benefit of rocking the boat is that it prompts you to reassess the way you do things. Naturally, this happened here as well.
</p>
<h3 id="28218d2cd10945e0886bd528cf2d792f">
My quest for relative URLs <a href="#28218d2cd10945e0886bd528cf2d792f">#</a>
</h3>
<p>
I'd been using a few Sublime Text snippets to automate a few things, like the markup for the section heading you see above this paragraph. Figuring out how to replicate that snippet in Visual Studio Code wasn't too hard, but as I was already perusing <a href="https://code.visualstudio.com/docs/editor/userdefinedsnippets">the snippet documentation</a>, I started investigating other options.
</p>
<p>
One little annoyance I'd lived with for years was adding links to other articles on the blog.
</p>
<p>
While I write an article, I run the site on my local machine. When linking to other articles, I sometimes use the existing page address off the public site, and sometimes I just copy the page address from <code>localhost</code>. In both cases, I want the URL to be relative so that I can navigate the site even if I'm offline. I've written enough articles on planes or while travelling without internet that this is an important use case for me.
</p>
<p>
For example, if I want to link to the article <a href="/2023/01/02/adding-nuget-packages-when-offline">Adding NuGet packages when offline</a>, I want the URL to be <code>/2023/01/02/adding-nuget-packages-when-offline</code>, but that's not the address I get when I copy from the browser's address bar. Here, I get the full URL, with either <code>http://localhost:4000/</code> or <code>https://blog.ploeh.dk/</code> as the origin.
</p>
<p>
For years, I've been manually stripping the origin away, as well as the trailing <code>/</code>. Looking through the Visual Studio Code snippet documentation, however, I eyed an opportunity to automate that workflow.
</p>
<h3 id="f13d4bb7acf84183b9ac18088206f0ca">
Snippet <a href="#f13d4bb7acf84183b9ac18088206f0ca">#</a>
</h3>
<p>
I wanted a piece of editor automation that could modify a URL after I'd pasted it into the article. After a few iterations, I've settled on a <em>surround-with</em> snippet that works pretty well. It looks like this:
</p>
<p>
<pre><span style="color:#2e75b6;">"Make URL relative"</span>: {
    <span style="color:#2e75b6;">"prefix"</span>: <span style="color:#a31515;">"urlrel"</span>,
    <span style="color:#2e75b6;">"body"</span>: [ <span style="color:#a31515;">"${TM_SELECTED_TEXT/^(?:http(?:s?):\\/\\/(?:[^\\/]+))(.+)\\//$1/}"</span> ],
    <span style="color:#2e75b6;">"description"</span>: <span style="color:#a31515;">"Make URL relative."</span>
}</pre>
</p>
<p>
Don't you just love regular expressions? Write once, scrutinise forever.
</p>
<p>
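I don't want to go over all the details, because I've already forgotten most of them, but essentially this expression matches the URL origin, starting with either <code>http</code> or <code>https</code> and running until the first slash <code>/</code> after the host name. It captures the rest of the address, except for the trailing slash, and replaces the selected text with the captured part.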
</p>
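<p>
As a rough illustration outside the editor, the same pattern can be exercised in C#. This is only a sketch: the <code>UrlHelper</code> class and its <code>MakeRelative</code> method are names invented for the example, and .NET's regular expression engine isn't the one Visual Studio Code uses, although a pattern this simple should behave the same way in both.
</p>

```csharp
using System.Text.RegularExpressions;

public static class UrlHelper
{
    // Same pattern as in the snippet, minus the extra JSON escaping.
    // The non-capturing group matches scheme and host (the origin),
    // the capturing group greedily matches the path, and the final
    // slash sits outside the capturing group, so it is dropped.
    private static readonly Regex origin =
        new Regex(@"^(?:http(?:s?)://(?:[^/]+))(.+)/");

    // Hypothetical helper: replaces the match with the captured path.
    public static string MakeRelative(string url)
    {
        return origin.Replace(url, "$1");
    }
}
```

<p>
Under these assumptions, both <code>http://localhost:4000/2023/01/02/adding-nuget-packages-when-offline/</code> and the corresponding <code>https://blog.ploeh.dk/</code> address come out as <code>/2023/01/02/adding-nuget-packages-when-offline</code>. Note that the pattern assumes that the pasted address ends with a slash.
</p>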
<p>
The thing that makes it useful, though, is the <code>TM_SELECTED_TEXT</code> variable that tells Visual Studio Code that this snippet works on <em>selected</em> text.
</p>
<p>
When I paste a URL into an <code>a</code> tag, at first nothing happens because no text is selected. I can then use <kbd>Shift</kbd> + <kbd>Alt</kbd> + <kbd>→</kbd> to expand the selection, at which point the Visual Studio Code lightbulb (<em>Code Action</em>) appears:
</p>
<p>
<img src="/content/binary/make-url-relative-screen-shot.png" alt="Screen shot of the make-URL-relative code snippet in action.">
</p>
<p>
Running the snippet removes the URL's origin, as well as the trailing slash, and I can move on to write the link text.
</p>
<h3 id="4e72443f63fe42e382e854fdc9a8d07a">
Conclusion <a href="#4e72443f63fe42e382e854fdc9a8d07a">#</a>
</h3>
<p>
After I started using Visual Studio Code to write blog posts, I've created a few custom snippets to support my authoring workflow. Most of them are fairly mundane, but the <em>make-URLs-relative</em> snippet took me a few iterations to get right.
</p>
<p>
I'm not expecting many of my readers to have this particular need, but I hope that this outline showcases the capabilities of Visual Studio Code snippets, and perhaps inspires you to look into creating custom snippets for your own purposes.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="8d84d3c52d134dcda81f7b63faccb58b">
<div class="comment-author"><a href="https://chamook.lol">Adam Guest</a> <a href="#8d84d3c52d134dcda81f7b63faccb58b">#</a></div>
<div class="comment-content">
<p>
Seems like a useful function to have, so I naturally wondered if I could <del>make it worse</del>
<ins>implement a similar function in Emacs</ins>.
</p>
<p>
Emacs Lisp has support for regular expressions, only typically with a bunch of extra backslashes
included, so what I needed to figure out was how to work with the currently selected text.
The currently selected text is referred to as the "region" and by specifying <code>"r"</code> as a parameter
for the <code>interactive</code> call we can pass the start and end positions for the region directly to the function.
</p>
<p>
I came up with this rather basic function:
</p>
<p>
<pre>
(defun make-url-relative (start end)
  "Convert the selected absolute URL to a relative one.
This is very simple and relies on the URL starting with http/https. It removes
every character up to the first slash in the path."
  (interactive "r")
  ;; "https?" matches both schemes; "[^/]+" stops at the first slash after the host.
  (replace-regexp-in-region "https?://[^/]+" "" start end))
</pre>
</p>
<p>
With this function included somewhere in your config, it can be called by selecting a URL and using <kbd>M-x</kbd>
<code>make-url-relative</code> (or assigned to a key binding as required).
</p>
<p>
I'm not sure if there's an already existing package for this functionality, but I hadn't really thought to look for it before
so thanks for the idea 😊
</p>
</div>
<div class="comment-date">2023-05-24 11:20 UTC</div>
</div>
</div>
Folders versus namespaces
https://blog.ploeh.dk/2023/05/15/folders-versus-namespaces
2023-05-15T06:01:00+00:00
Mark Seemann
<div id="post">
<p>
<em>What if you allow folder and namespace structure to diverge?</em>
</p>
<p>
I'm currently writing C# code with some first-year computer-science students. Since most things are new to them, they sometimes do things in ways that are 'not the way we usually do things'. As an example, teachers have instructed them to use namespaces, but apparently no-one has told them that the file folder structure has to mirror the namespace structure.
</p>
<p>
The compiler doesn't care, but as long as I've been programming in C#, it's been idiomatic to do it that way. There's even <a href="https://learn.microsoft.com/dotnet/fundamentals/code-analysis/style-rules/ide0130">a static code analysis rule</a> about it.
</p>
<p>
The first couple of times they'd introduce a namespace without a corresponding directory, I'd point out that they are supposed to keep those things in sync. One day, however, it struck me: What happens if you flout that convention?
</p>
<h3 id="dac7601856e94a4e88315cb2e80f74e5">
A common way to organise code files <a href="#dac7601856e94a4e88315cb2e80f74e5">#</a>
</h3>
<p>
Code scaffolding tools and wizards will often nudge you to organise your code according to technical concerns: Controllers, models, views, etc. I'm sure you've encountered more than one code base organised like this:
</p>
<p>
<img src="/content/binary/code-organised-by-tech-responsibility.png" alt="Code organised into folders like Controllers, Models, DataAccess, etc.">
</p>
<p>
You'll put all your Controller classes in the <em>Controllers</em> directory, and make sure that the namespace matches. Thus, in such a code base, the full name of the <code>ReservationsController</code> might be <code>Ploeh.Samples.Restaurants.RestApi.Controllers.ReservationsController</code>.
</p>
<p>
A common criticism is that this is the <em>wrong</em> way to organise the code.
</p>
<h3 id="ea3a6df18fa641638f18e0b554c1ee7c">
The problem with trees <a href="#ea3a6df18fa641638f18e0b554c1ee7c">#</a>
</h3>
<p>
The complaint that this is the wrong way to organise code implies that a correct way exists. I write about this in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>:
</p>
<blockquote>
<p>
Should you create a subdirectory for Controllers, another for Models, one for Filters, and so on? Or should you create a subdirectory for each feature?
</p>
<p>
Few people like my answer: <em>Just put all files in one directory.</em> Be wary of creating subdirectories just for the sake of 'organising' the code.
</p>
<p>
File systems are <em>hierarchies</em>; they are trees: a specialised kind of acyclic graph in which any two vertices are connected by exactly one path. Put another way, each vertex can have at most one parent. Even more bluntly: If you put a file in a hypothetical <code>Controllers</code> directory, you can't <em>also</em> put it in a <code>Calendar</code> directory.
</p>
</blockquote>
<p>
But what if you could?
</p>
<h3 id="bcdc2397846c45a9bcb3cea0e71cfa7a">
Namespaces disconnected from directory hierarchy <a href="#bcdc2397846c45a9bcb3cea0e71cfa7a">#</a>
</h3>
<p>
The code that accompanies <em>Code That Fits in Your Head</em> is organised as advertised: 65 files in a single directory. (Tests go in separate directories, though, as they belong to separate libraries.)
</p>
<p>
If you decide to ignore the convention that namespace structure should mirror folder structure, however, you now have a second axis of variability.
</p>
<p>
As an experiment, I decided to try that idea with the book's code base. The above screen shot shows the stereotypical organisation according to technical responsibility, after I moved things around. To be clear: This isn't how the book's example code is organised, but an experiment I only now carried out.
</p>
<p>
If you open the <code>ReservationsController.cs</code> file, however, I've now declared that it belongs to a namespace called <code>Ploeh.Samples.Restaurants.RestApi.Reservations</code>. Using Visual Studio's <em>Class View</em>, things look different from the <em>Solution Explorer:</em>
</p>
<p>
<img src="/content/binary/code-organised-by-namespace.png" alt="Code organised into namespaces according to feature: Calendar, Reservations, etc.">
</p>
<p>
Here I've organised the namespaces according to feature, rather than technical role. The screen shot shows the <em>Reservations</em> feature opened, while other features remain closed.
</p>
<h3 id="19763ef496cc424f8fd419bc4ad699d6">
Initial reactions <a href="#19763ef496cc424f8fd419bc4ad699d6">#</a>
</h3>
<p>
This article isn't a recommendation. It's nothing but an initial exploration of an idea.
</p>
<p>
Do I like it? So far, I think I still prefer flat directory structures. Even though this idea gives two axes of variability, you still have to make judgment calls. It's easy enough with Controllers, but where do you put cross-cutting concerns? Where do you put domain logic that seems to encompass everything else?
</p>
<p>
As an example, the code base that accompanies <em>Code That Fits in Your Head</em> is a multi-tenant system. Each restaurant is a separate tenant, but I've modelled restaurants as part of the domain model, and I've put that 'feature' in its own namespace. Perhaps that's a mistake; at least, I now have the code wart that I have to import the <code>Ploeh.Samples.Restaurants.RestApi.Restaurants</code> namespace to implement the <code>ReservationsController</code>, because its constructor looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ReservationsController</span>(
    IClock <span style="font-weight:bold;color:#1f377f;">clock</span>,
    IRestaurantDatabase <span style="font-weight:bold;color:#1f377f;">restaurantDatabase</span>,
    IReservationsRepository <span style="font-weight:bold;color:#1f377f;">repository</span>)
{
    Clock = clock;
    RestaurantDatabase = restaurantDatabase;
    Repository = repository;
}</pre>
</p>
<p>
The <code>IRestaurantDatabase</code> interface is defined in the <code>Restaurants</code> namespace, but the Controller needs it in order to look up the restaurant (i.e. tenant) in question.
</p>
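<p>
To make the wart concrete, here's a stripped-down sketch of how such a file might look. It isn't the book's actual code: the file path in the comment is an assumption, the Controller is reduced to a single dependency, and the interface is an empty stand-in so that the example is self-contained.
</p>

```csharp
// Assumed location: Controllers/ReservationsController.cs
// The folder reflects the technical role (Controllers)...
using Ploeh.Samples.Restaurants.RestApi.Restaurants; // needed for IRestaurantDatabase

// ...while the namespace reflects the feature (Reservations).
namespace Ploeh.Samples.Restaurants.RestApi.Reservations
{
    // Simplified stand-in for the real Controller
    public class ReservationsController
    {
        public ReservationsController(IRestaurantDatabase restaurantDatabase)
        {
            RestaurantDatabase = restaurantDatabase;
        }

        public IRestaurantDatabase RestaurantDatabase { get; }
    }
}

// Empty stand-in for a type that would live in a file of its own
namespace Ploeh.Samples.Restaurants.RestApi.Restaurants
{
    public interface IRestaurantDatabase { }
}
```

<p>
Without the <code>using</code> directive, the constructor would have to spell out the fully qualified interface name, which is how the dependency between the two 'features' becomes visible.
</p>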
<p>
You could argue that this isn't a problem with namespaces, but rather a code smell indicating that I should have organised the code in a different way.
</p>
<p>
That may be so, but that then implies a deeper problem: Assigning files to hierarchies may not, after all, help much. It looks as though things are organised, but if the assignment of things to buckets is done without a predictable system, then what benefit does it provide? Does it make things easier to find, or is the sense of order mostly illusory?
</p>
<p>
I tend to still believe that this is the case. This isn't a nihilistic or defeatist position, but rather a realisation that order must arise from other origins.
</p>
<h3 id="305d423972e0422fa5ecdd04326cd132">
Conclusion <a href="#305d423972e0422fa5ecdd04326cd132">#</a>
</h3>
<p>
I was recently repeatedly encountering student code with a disregard for the convention that namespace structure should follow directory structure (or the other way around). Taking a cue from <a href="https://en.wikipedia.org/wiki/Kent_Beck">Kent Beck</a> I decided to investigate what happens if you forget about the rules and instead pursue what that new freedom might bring.
</p>
<p>
In this article, I briefly show an example where I reorganised a code base so that the file structure is according to implementation detail, but the namespace hierarchy is according to feature. Clearly, I could also have done it the other way around.
</p>
<p>
What if, instead of two, you have three organising principles? I don't know. I can't think of a third kind of hierarchy in a language like C#.
</p>
<p>
After a few hours reorganising the code, I'm not scared away from this idea. It might be worth revisiting in a larger code base. On the other hand, I'm still not convinced that forcing a hierarchy over a sophisticated software design is particularly beneficial.
</p>
<p>
<ins datetime="2023-05-30T12:22Z"><strong>P.S. 2023-05-30.</strong> This article is only a report on an experiment. For my general recommendation regarding code file organisation, see <a href="/2023/05/29/favour-flat-code-file-folders">Favour flat code file folders</a>.</ins>
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="6c209fa61ad34ef3aa8290b06a964aaf">
<div class="comment-author"><a href="http://github.com/m4rsh">Markus Schmits</a> <a href="#6c209fa61ad34ef3aa8290b06a964aaf">#</a></div>
<div class="comment-content">
<p>
Hi Mark,
<br> While reading your book "Code That Fits in Your Head", your latest blog entry caught my attention, as I am struggling with similar issues in software development.
<br> I find it hard to put all classes into one project directory, as it feels overwhelming when the number of classes increases.
<br> In the following, I would like to specify possible organising principles in my own words.
</p>
<p> <b> Postulations </b>
<br> - Folders should help the programmer (and reader) to keep the code organised
<br> - Namespaces should reflect the hierarchical organisation of the code base
<br> - Cross-cutting concerns should be addressed by modularity.
</p>
<p> <b> Definitions </b>
<br> 1. Folders
<br> - the allocation of classes in a project with similar technical concerns into folders should help the programmer in the first place, by visualising this similarity
<br> - the benefit lies just in the organisation, i.e. storage of code, not in the expression of hierarchy
</p>
<p>
2. Namespaces
<br> - expression of hierarchy can be achieved by namespaces, which indicate the relationship between allocated classes
<br> - classes can be organised in folders with the same designation
<br> - the namespace designation could vary by concern, although the classes are placed in the same folders, as the technical concern of the class shouldn't affect the hierarchical organisation
</p>
<p>
3. Cross-cutting concerns
<br> - classes which aren't related to a single task could be indicated by a special namespace
<br> - they could be placed in a different folder, to signal different affiliations
<br> - or even placed in a different assembly
</p>
<p>
<b> Summing up </b>
<br> A hierarchy should come by design. The organisation of code in folders should help the programmer or reader to grasp the file structure, not necessarily the program hierarchy.
<br> Folders should be a means, not an expression of design. Folders and their designations could change (or disappear) over time in development. Thus, explicit connection of namespace to folder designation seems not desirable, but it's not forbidden.
</p>
<p>
All views above are my own. Please let me know what you think.
</p>
<p>
Best regards,
<br>Markus
</p>
</div>
<div class="comment-date">2023-05-18 19:13 UTC</div>
</div>
<div class="comment" id="3178e0d2d3494f7db7188ed455b78103">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#3178e0d2d3494f7db7188ed455b78103">#</a></div>
<div class="comment-content">
<p>
Markus, thank you for writing. You can, of course, organise code according to various principles, and what works in one case may not be the best fit in another case. The main point of this article was to suggest, as an idea, that folder hierarchy and namespace hierarchy don't <em>have</em> to match.
</p>
<p>
Based on reader reactions, however, I realised that I may have failed to clearly communicate my fundamental position, so I wrote <a href="/2023/05/29/favour-flat-code-file-folders">another article about that</a>. I do, indeed, favour flat folder hierarchies.
</p>
<p>
That is not to say that you can't have any directories in your code base, but rather that I'm sceptical that any such hierarchy addresses real problems.
</p>
<p>
For instance, you write that
</p>
<blockquote>
<p>
"Folders should help the programmer (and reader) to keep the code organised"
</p>
</blockquote>
<p>
If I focus on the word <em>should</em>, then I agree: Folders <em>should</em> help the programmer keep the code organised. In my view, then, it follows that if a tree structure does <em>not</em> assist in doing that, then that structure is of no use and should not be implemented (or abandoned if already in place).
</p>
<p>
I do get the impression from many people that they consider a directory tree vital to be able to navigate and understand a code base. What I've tried to outline in <a href="/2023/05/29/favour-flat-code-file-folders">my more recent article</a> is that I don't accept that as an indisputable axiom.
</p>
<p>
What I <em>do</em> find helpful as an organising principle is focusing on dependencies as a directed acyclic graph. Cyclic dependencies between objects are a main source of complexity. Keep dependency graphs directed and <a href="/2022/11/21/decouple-to-delete">make code easy to delete</a>.
</p>
<p>
Organising code files in a tree structure doesn't help achieve that goal. This is the reason I consider code folder hierarchies a red herring: Perhaps not explicitly detrimental to sustainability, but usually nothing but a distraction.
</p>
<p>
How, then, do you organise a large code base? I hope that I answer that question, too, in my more recent article <a href="/2023/05/29/favour-flat-code-file-folders">Favour flat code file folders</a>.
</p>
</div>
<div class="comment-date">2023-06-13 6:11 UTC</div>
</div>
</div>
Is cyclomatic complexity really related to branch coverage?
https://blog.ploeh.dk/2023/05/08/is-cyclomatic-complexity-really-related-to-branch-coverage
2023-05-08T05:38:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A genuine case of doubt and bewilderment.</em>
</p>
<p>
Regular readers of this blog may be used to its confident and opinionated tone. I write that way, not because I'm always convinced that I'm right, but because prose with too many caveats and qualifications tends to bury the message in verbose and circumlocutory ambiguity.
</p>
<p>
This time, however, I write to solicit feedback, and because I'm surprised to the edge of bemusement by a recent experience.
</p>
<h3 id="6ebe2eae2250441c918433ba40ce8e86">
Collatz sequence <a href="#6ebe2eae2250441c918433ba40ce8e86">#</a>
</h3>
<p>
Consider the following code:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Collatz</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<<span style="color:blue;">int</span>> <span style="color:#74531f;">Sequence</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">n</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (n < 1)
            <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
                nameof(n),
                <span style="color:#a31515;">$"Only natural numbers allowed, but given </span>{n}<span style="color:#a31515;">."</span>);
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sequence</span> = <span style="color:blue;">new</span> List<<span style="color:blue;">int</span>>();
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">current</span> = n;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (current != 1)
        {
            sequence.Add(current);
            <span style="font-weight:bold;color:#8f08c4;">if</span> (current % 2 == 0)
                current = current / 2;
            <span style="font-weight:bold;color:#8f08c4;">else</span>
                current = current * 3 + 1;
        }
        sequence.Add(current);
        <span style="font-weight:bold;color:#8f08c4;">return</span> sequence;
    }
}</pre>
</p>
<p>
As the names imply, the <code>Sequence</code> function calculates the <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">Collatz sequence</a> for a given natural number.
</p>
<p>
Please don't tune out if that sounds mathematical and difficult, because it really isn't. While the Collatz conjecture still evades mathematical proof, the sequence is easy to calculate and understand. Given a number, produce a sequence starting with that number and stop when you arrive at 1. Every new number in the sequence is based on the previous number. If the input is even, divide it by two. If it's odd, multiply it by three and add one. Repeat until you arrive at one.
</p>
<p>
The conjecture is that any natural number will produce a finite sequence. That's the unproven part, but that doesn't concern us. In this article, I'm only interested in the above code, which computes such sequences.
</p>
<p>
Here are a few examples:
</p>
<p>
<pre>> Collatz.Sequence(1)
List<<span style="color:blue;">int</span>>(1) { 1 }
> Collatz.Sequence(2)
List<<span style="color:blue;">int</span>>(2) { 2, 1 }
> Collatz.Sequence(3)
List<<span style="color:blue;">int</span>>(8) { 3, 10, 5, 16, 8, 4, 2, 1 }
> Collatz.Sequence(4)
List<<span style="color:blue;">int</span>>(3) { 4, 2, 1 }</pre>
</p>
<p>
While there seems to be a general tendency for the sequence to grow as the input gets larger, that's clearly not a rule. The examples show that the sequence for <code>3</code> is longer than the sequence for <code>4</code>.
</p>
<p>
All this, however, just sets the stage. The problem doesn't really have anything to do with Collatz sequences. I only ran into it while working with a Collatz sequence implementation that looked a lot like the above.
</p>
<h3 id="08c0cb2794184e9da8b9f72e6c9ce985">
Cyclomatic complexity <a href="#08c0cb2794184e9da8b9f72e6c9ce985">#</a>
</h3>
<p>
What is the <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of the above <code>Sequence</code> function? If you need a reminder of how to count cyclomatic complexity, this is a good opportunity to take a moment to refresh your memory, count the number, and compare it with my answer.
</p>
<p>
Apart from the opportunity for exercise, it was a rhetorical question. The answer is <em>4</em>.
</p>
<p>
This means that we'd need <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">at least four unit tests to cover all branches</a>. Right? Right?
</p>
<p>
Okay, let's try.
</p>
<h3 id="d10688e0a13241c7b7124a5ce8f063ef">
Branch coverage <a href="#d10688e0a13241c7b7124a5ce8f063ef">#</a>
</h3>
<p>
Before we start, let's make the ritual <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">denouncement of code coverage as a target metric</a>. The point isn't to reach 100% code coverage as such, but to <a href="/2018/11/12/what-to-test-and-not-to-test">gain confidence that you've added tests that cover whatever is important to you</a>. Also, the best way to do that is usually with TDD, which isn't the situation I'm discussing here.
</p>
<p>
The first branch that we might want to cover is the Guard Clause. This is easily addressed with an <a href="https://xunit.net/">xUnit.net</a> test:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ThrowOnInvalidInput</span>()
{
    Assert.Throws<ArgumentOutOfRangeException>(() => Collatz.Sequence(0));
}</pre>
</p>
<p>
This test calls the <code>Sequence</code> function with <code>0</code>, which (in this context, at least) isn't a <a href="https://en.wikipedia.org/wiki/Natural_number">natural number</a>.
</p>
<p>
If you measure test coverage (or, in this case, just think it through), there are no surprises yet. One branch is covered, the rest aren't. That's 25%.
</p>
<p>
(If you use the <a href="https://learn.microsoft.com/dotnet/core/testing/unit-testing-code-coverage">free code coverage option for .NET</a>, it will surprisingly tell you that you're only at 16% branch coverage. It deems the cyclomatic complexity of the <code>Sequence</code> function to be 6, not 4, and 1/6 is 16.67%. Why it thinks it's 6 is not entirely clear to me, but Visual Studio agrees with me that the cyclomatic complexity is 4. In this particular case, it doesn't matter anyway. The conclusion that follows remains the same.)
</p>
<p>
Let's add another test case, and perhaps one that gives the algorithm a good exercise.
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Example</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = Collatz.Sequence(5);
    Assert.Equal(<span style="color:blue;">new</span>[] { 5, 16, 8, 4, 2, 1 }, actual);
}</pre>
</p>
<p>
As expected, the test passes. What's the branch coverage now?
</p>
<p>
Try to think it through instead of relying exclusively on a tool. The algorithm is simple enough that you can emulate its execution in your head, or perhaps with the assistance of a notepad. How many branches does it execute when the input is <code>5</code>?
</p>
<p>
Branch coverage is now 100%. (Even the <em>dotnet</em> coverage tool agrees, despite its weird cyclomatic complexity value.) All branches are exercised.
</p>
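<p>
If you'd rather not trace it by hand, a small instrumented copy of the loop can do the bookkeeping. The following is only a sketch: <code>CollatzTrace</code>, <code>BranchesHit</code>, and the string labels are invented for the example, and the Guard Clause is recorded rather than thrown so that several inputs can be processed in one call.
</p>

```csharp
using System.Collections.Generic;

public static class CollatzTrace
{
    // Hypothetical helper: mirrors the control flow of Collatz.Sequence,
    // but records which outcome of each decision point gets executed
    // instead of building the sequence.
    public static ISet<string> BranchesHit(params int[] inputs)
    {
        var hit = new HashSet<string>();
        foreach (var n in inputs)
        {
            if (n < 1)
            {
                hit.Add("guard: throw");
                continue; // The real function throws here
            }
            hit.Add("guard: pass");

            var current = n;
            while (current != 1)
            {
                hit.Add("while: enter");
                if (current % 2 == 0)
                {
                    hit.Add("if: even");
                    current = current / 2;
                }
                else
                {
                    hit.Add("if: odd");
                    current = current * 3 + 1;
                }
            }
            hit.Add("while: exit");
        }
        return hit;
    }
}
```

<p>
With this sketch, <code>CollatzTrace.BranchesHit(0, 5)</code> returns all six labels, whereas <code>CollatzTrace.BranchesHit(0)</code> returns only the one for the Guard Clause.
</p>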
<p>
Two tests produce 100% branch coverage of a function with a cyclomatic complexity of 4.
</p>
<h3 id="7a4d9c5fbb8e4e94a52c61990565e38f">
Surprise <a href="#7a4d9c5fbb8e4e94a52c61990565e38f">#</a>
</h3>
<p>
That's what befuddles me. I thought that cyclomatic complexity and branch coverage were related. I thought that the number of branches was a good indicator of the number of tests you'd need to cover all branches. I even wrote <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">an article to that effect</a>, and no-one contradicted me.
</p>
<p>
That, in itself, is no proof of anything, but the notion that the article presents seems to be widely accepted. I never considered it controversial, and the only reason I didn't cite anyone is that this seems to be 'common knowledge'. I wasn't aware of a particular source I could cite.
</p>
<p>
Now, however, it seems that it's wrong. Is it wrong, or am I missing something?
</p>
<p>
To be clear, I completely understand why the above two tests are sufficient to fully cover the function. I also believe that I fully understand why the cyclomatic complexity is 4.
</p>
<p>
I am also painfully aware that the above two tests in no way fully specify the Collatz sequence. That's not the point.
</p>
<p>
The point is that it's possible to cover this function with only two tests, despite the cyclomatic complexity being 4. That surprises me.
</p>
<p>
Is this a known thing?
</p>
<p>
I'm sure it is. I've long since given up discovering anything new in programming.
</p>
<h3 id="829b5f9b3e9449d1a023d2bacff5b58c">
Conclusion <a href="#829b5f9b3e9449d1a023d2bacff5b58c">#</a>
</h3>
<p>
I recently encountered a function that performed a Collatz calculation similar to the one I've shown here. It exhibited the same trait, and since it had no Guard Clause, I could fully cover it with a single test case. That function even had a cyclomatic complexity of 6, so you can perhaps imagine my befuddlement.
</p>
<p>
Is it wrong, then, that cyclomatic complexity suggests a minimum number of test cases in order to cover all branches?
</p>
<p>
It seems so, but that's new to me. I don't mind being wrong on occasion. It's usually an opportunity to learn something new. If you have any insights, please <a href="https://github.com/ploeh/ploeh.github.com#comments">leave a comment</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="02568f995d91432da540858644b61e89">
<div class="comment-author"><a href="http://github.com/neongraal">Struan Judd</a> <a href="#02568f995d91432da540858644b61e89">#</a></div>
<div class="comment-content">
<p>
My first thought is that the code looks like an unrolled recursive function, so perhaps if it's
refactored into a driver function and a "continuation passing style" it might make the cyclomatic
complexity match the covering tests.
</p>
<p>
So given the following:
<pre>public delegate void ResultFunc(IEnumerable<int> result);
public delegate void ContFunc(int n, ResultFunc result, ContFunc cont);

public static void Cont(int n, ResultFunc result, ContFunc cont) {
    if (n == 1) {
        result(new[] { n });
        return;
    }

    void Result(IEnumerable<int> list) => result(list.Prepend(n));

    if (n % 2 == 0)
        cont(n / 2, Result, cont);
    else
        cont(n * 3 + 1, Result, cont);
}

public static IReadOnlyCollection<int> Continuation(int n) {
    if (n < 1)
        throw new ArgumentOutOfRangeException(
            nameof(n),
            $"Only natural numbers allowed, but given {n}.");

    var output = new List<int>();

    void Output(IEnumerable<int> list) => output = list.ToList();

    Cont(n, Output, Cont);

    return output;
}</pre>
</p>
<p>
I calculate the cyclomatic complexity of <code>Continuation</code> to be <em>2</em> and <code>Cont</code> to be <em>3</em>.
</p>
<p>
And it would seem you need 5 tests to properly cover the code, 3 for <code>Cont</code> and 2 for <code>Continuation</code>.
</p>
<p>
But however you write the "n >= 1" case for <code>Continuation</code> you will have to cover some of <code>Cont</code>.
</p>
</div>
<div class="comment-date">2023-05-08 10:11 UTC</div>
</div>
<div class="comment" id="896f7e7c979144438a6e7f1a66dd72ea">
<div class="comment-author">Jeroen Heijmans <a href="#896f7e7c979144438a6e7f1a66dd72ea">#</a></div>
<div class="comment-content">
<p>
There is a relation between cyclomatic complexity and branches to cover, but it's not one of equality. Cyclomatic
complexity is an upper bound for the number of test cases needed to cover all branches. There's a nice example illustrating this in the
<a href="https://en.wikipedia.org/w/index.php?title=Cyclomatic_complexity#Implications_for_software_testing">Wikipedia
article on cyclomatic complexity</a>, which also explains the relation with path coverage (for which
cyclomatic complexity is a lower bound).
</p>
</div>
<div class="comment-date">2023-05-08 15:03 UTC</div>
</div>
<div class="comment" id="b683f78855f8440389b973e24c88c253">
<div class="comment-author"><a href="https://github.com/bretthall">Brett Hall</a> <a href="#b683f78855f8440389b973e24c88c253">#</a></div>
<div class="comment-content">
<p>
I find cyclomatic complexity to be overly pedantic at times, and you will need four tests if you get really pedantic.
First, test the guard clause as you already did. Then, test with 1 in order to test the <code>while</code> loop body
not being run. Then, test with 2 in order to test that the <code>while</code> is executed, but we only hit the <code>if</code>
part of the <code>if/else</code>. Finally, test with 3 in order to hit the <code>else</code> inside of the <code>while</code>.
That's four tests where each test is only testing one of the branches (some tests hit more than one branch, but the
"extra branch" is already covered by another test). Again, this is being really pedantic and I wouldn't test this
function as laid out above (I'd probably put in the test with 1, since it's an edge case, but otherwise test as you did).
</p>
<p>
I don't think there's a rigorous relationship between cyclomatic complexity and number of tests. In simple cases, treating
things as though the relationship exists can be helpful. But once you start having interrelated branches in a function,
things get murky, and you may have to go to pedantic lengths in order to maintain the relationship. The same
thing goes for code coverage, which can be 100% even though you haven't actually tested all paths through your code if
there are multiple branches in the function that depend on each other.
</p>
</div>
<div class="comment-date">2023-05-08 15:30 UTC</div>
</div>
<div class="comment" id="61939b516c0e4c2caab7c6e8a3302595">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#61939b516c0e4c2caab7c6e8a3302595">#</a></div>
<div class="comment-content">
<p>
Thank you, all, for writing. I'm extraordinarily busy at the moment, so it'll take me longer than usual to respond. Rest assured, however, that I haven't forgotten.
</p>
</div>
<div class="comment-date">2023-05-11 12:42 UTC</div>
</div>
<div class="comment" id="e91eeb8bd09f446ab863f51ae30afad9">
<div class="comment-author"><a href="https://www.nikolamilekic.com">Nikola Milekic</a> <a href="#e91eeb8bd09f446ab863f51ae30afad9">#</a></div>
<div class="comment-content">
<p>
If we agree to the definition of cyclomatic complexity as the number of independent paths through a section of code, then the number of tests needed to cover that section <strong>must be</strong> the same per definition, <strong>if those tests are also independent</strong>. Independence is crucial here, and is also the main source of confusion. Both the <code>while</code> and <code>if</code> forks depend on the same variable (<code>current</code>), and so they are not independent.
</p>
<p>
The second test you wrote is similarly not independent, as it ends up tracing multiple paths through <code>if</code>: odd for 5, and even for 16, 8, etc, and so ends up covering all paths. Had you picked 2 instead of 5 for the test, that would have been more independent, as it would not have traced the <code>else</code> path, requiring one additional test.
</p>
<p>
The standard way of computing cyclomatic complexity assumes independence, which simply is not possible in this case.
</p>
</div>
<div class="comment-date">2023-06-02 00:38 UTC</div>
</div>
<div class="comment" id="1fafb3fa289a415f9102dc8d6defc464">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#1fafb3fa289a415f9102dc8d6defc464">#</a></div>
<div class="comment-content">
<p>
Struan, thank you for writing, and please accept my apologies for the time it took me to respond. I agree with your calculations of cyclomatic complexity of your refactored code.
</p>
<p>
I agree with what you write, but you can't write a sentence like "however you write the "n >=1" case for [...] you will have to cover some of [..]" and expect me to just ignore it. To be clear, I agree with you in the particular case of the methods you provided, but you inspired me to refactor my code with that rule as a specific constraint. You can see the results in my new article <a href="/2023/06/12/collatz-sequences-by-function-composition">Collatz sequences by function composition</a>.
</p>
<p>
Thank you for the inspiration.
</p>
</div>
<div class="comment-date">2023-06-12 5:46 UTC</div>
</div>
<div class="comment" id="2878d9f87f90405aa64ed1d1400d8d2b">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2878d9f87f90405aa64ed1d1400d8d2b">#</a></div>
<div class="comment-content">
<p>
Jeroen, thank you for writing, and please accept my apologies for the time it took me to respond. I should have read that Wikipedia article more closely, instead of just linking to it.
</p>
<p>
What still puzzles me is that I've been aware of, and actively used, cyclomatic complexity for more than a decade, and this distinction has never come up, and no-one has called me out on it.
</p>
<p>
As <a href="https://en.wikipedia.org/wiki/Ward_Cunningham#%22Cunningham's_Law%22">Cunningham's law</a> says, <em>the best way to get the right answer on the Internet is not to ask a question; it's to post the wrong answer.</em> Even so, I posted <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">Put cyclomatic complexity to good use</a> in 2019, and no-one contradicted it.
</p>
<p>
I don't mention this as an argument that I'm right. Obviously, I was wrong, but no-one told me. Have I had something in my teeth all these years, too?
</p>
</div>
<div class="comment-date">2023-06-12 6:35 UTC</div>
</div>
<div class="comment" id="da62194dabc947d0b3ecd7c4258b0e86">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#da62194dabc947d0b3ecd7c4258b0e86">#</a></div>
<div class="comment-content">
<p>
Brett, thank you for writing, and please accept my apologies for the time it took me to respond. I suppose that I failed to make my overall motivation clear. When doing proper test-driven development (TDD), one doesn't need cyclomatic complexity in order to think about coverage. When following the <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor checklist</a>, you only add enough code to pass all tests. With that process, cyclomatic complexity is rarely useful, and I tend to ignore it.
</p>
<p>
I do, however, often coach programmers in unit testing and TDD, and people new to the technique often struggle with basics. They add too much code, instead of the simplest thing that could possibly work, or they can't think of a good next test case to write.
</p>
<p>
When teaching TDD I sometimes suggest cyclomatic complexity as a metric to help decision-making. <em>Did we add more code to the System Under Test than warranted by tests? Is it okay to forgo writing a test of a one-liner with cyclomatic complexity of one?</em>
</p>
<p>
The metric is also useful in hybrid scenarios where you already have production code, and now you want to add <a href="https://en.wikipedia.org/wiki/Characterization_test">characterisation tests</a>: Which test cases should you <em>at least</em> write?
</p>
<p>
Another way to answer such questions is to run a code-coverage tool, but that often takes time. I find it useful to teach people about cyclomatic complexity, because it's a lightweight heuristic always at hand.
</p>
</div>
<div class="comment-date">2023-06-12 7:24 UTC</div>
</div>
<div class="comment" id="01b5af5ccab04843911cde37104c4a7c">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#01b5af5ccab04843911cde37104c4a7c">#</a></div>
<div class="comment-content">
<p>
Nikola, thank you for writing. The emphasis on independence is useful; I used compatible thinking in my new article <a href="/2023/06/12/collatz-sequences-by-function-composition">Collatz sequences by function composition</a>. By now, including the other comments to this article, it seems that we've been able to cover the problem better, and I, at least, feel that I've learned something.
</p>
<p>
I don't think, however, that the standard way of computing cyclomatic complexity assumes independence. You can easily compute the cyclomatic complexity of the above <code>Sequence</code> function, even though its branches aren't independent. Tooling such as Visual Studio seems to agree with me.
</p>
</div>
<div class="comment-date">2023-06-13 5:32 UTC</div>
</div>
</div>
Refactoring pure function composition without breaking existing tests
https://blog.ploeh.dk/2023/05/01/refactoring-pure-function-composition-without-breaking-existing-tests
2023-05-01T06:44:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An example modifying a Haskell Gossiping Bus Drivers implementation.</em>
</p>
<p>
This is an article in a series of articles about the <a href="/2023/02/13/epistemology-of-interaction-testing">epistemology of interaction testing</a>. In short, this collection of articles discusses how to test the composition of <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>. While a pure function is <a href="/2015/05/07/functional-design-is-intrinsically-testable">intrinsically testable</a>, how do you test the composition of pure functions? As the introductory article outlines, I consider it mostly a matter of establishing confidence. With <a href="/2018/11/12/what-to-test-and-not-to-test">enough test coverage</a> you can be confident that the composition produces the desired outputs.
</p>
<p>
Keep in mind that if you compose pure functions into a larger pure function, the composition is still pure. This implies that you can still test it by supplying input and verifying that the output is correct.
</p>
<p>
Tests that exercise the composition do so by verifying observable behaviour. This makes them more robust to refactoring. You'll see an example of that later in this article.
</p>
<h3 id="32b583422c354d0f8468406f0486a762">
Gossiping bus drivers <a href="#32b583422c354d0f8468406f0486a762">#</a>
</h3>
<p>
I recently did the <a href="https://kata-log.rocks/gossiping-bus-drivers-kata">Gossiping Bus Drivers</a> kata in <a href="https://www.haskell.org/">Haskell</a>. At first, I added the tests suggested in the kata description.
</p>
<p>
<pre>{-# OPTIONS_GHC -Wno-type-defaults #-}
<span style="color:blue;">module</span> Main <span style="color:blue;">where</span>
<span style="color:blue;">import</span> GossipingBusDrivers
<span style="color:blue;">import</span> Test.HUnit
<span style="color:blue;">import</span> Test.Framework.Providers.HUnit (<span style="color:#2b91af;">hUnitTestToTests</span>)
<span style="color:blue;">import</span> Test.Framework (<span style="color:#2b91af;">defaultMain</span>)
<span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = defaultMain $ hUnitTestToTests $ TestList [
<span style="color:#a31515;">"Kata examples"</span> ~: <span style="color:blue;">do</span>
(routes, expected) <-
[
([[3, 1, 2, 3],
[3, 2, 3, 1],
[4, 2, 3, 4, 5]],
Just 5),
([[2, 1, 2],
[5, 2, 8]],
Nothing)
]
<span style="color:blue;">let</span> actual = drive routes
<span style="color:blue;">return</span> $ expected ~=? actual
]</pre>
</p>
<p>
As I prefer them, these tests are <a href="/2018/04/30/parametrised-unit-tests-in-haskell">parametrised HUnit tests</a>.
</p>
<p>
The problem with those suggested test cases is that they don't provide enough confidence that an implementation is correct. In fact, I wrote this implementation to pass them:
</p>
<p>
<pre>drive routes = <span style="color:blue;">if</span> <span style="color:blue;">length</span> routes == 3 <span style="color:blue;">then</span> Just 5 <span style="color:blue;">else</span> Nothing</pre>
</p>
<p>
This is clearly incorrect. It just looks at the number of routes and returns a fixed value for each count. It doesn't look at the contents of the routes.
</p>
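<p>
To make that concrete, here is a minimal sketch that runs the fake implementation against a handful of examples. The names <code>driveCheat</code>, <code>examples</code>, and <code>failures</code> are only for illustration; the expected values of the last three examples are taken from the extended test list that appears later in this article.
</p>

```haskell
-- The deliberately wrong implementation from above, renamed for this sketch.
driveCheat :: [[Int]] -> Maybe Int
driveCheat routes = if length routes == 3 then Just 5 else Nothing

-- Inputs paired with the results a correct implementation should produce.
examples :: [([[Int]], Maybe Int)]
examples =
  [ ([[3, 1, 2, 3], [3, 2, 3, 1], [4, 2, 3, 4, 5]], Just 5)
  , ([[2, 1, 2], [5, 2, 8]], Nothing)
  , ([[1, 2, 3, 4, 5], [5, 6, 7, 8], [3, 9, 6]], Just 13)
  , ([[1]], Just 0)
  , ([[2], [2]], Just 1)
  ]

-- The examples for which the fake implementation gives the wrong answer.
failures :: [([[Int]], Maybe Int)]
failures =
  filter (\(routes, expected) -> driveCheat routes /= expected) examples

main :: IO ()
main = mapM_ print failures
```

<p>
The two original kata examples pass, while the three additional ones end up in <code>failures</code>.
</p>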
<p>
Even if you don't <a href="/2019/10/07/devils-advocate">try to deliberately cheat</a>, I'm not convinced that these two tests are enough. You could <em>try</em> to write the correct implementation, but how do you know that you've correctly dealt with various edge cases?
</p>
<h3 id="0e40e3994772481f855b266452685852">
Helper function <a href="#0e40e3994772481f855b266452685852">#</a>
</h3>
<p>
The kata description isn't hard to understand, so while the suggested test cases seem insufficient, I knew what was required. Perhaps I could write a proper implementation without additional tests. After all, I was convinced that it'd be possible to do it with a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of <em>1</em>, and since <a href="/2013/04/02/why-trust-tests">a test function also has a cyclomatic complexity of <em>1</em></a>, there's always that tension in test-driven development: Why write test code to exercise code with a cyclomatic complexity of <em>1</em>?
</p>
<p>
To be clear: There are often good reasons to write tests even in this case, and this seems like one of them. <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">Cyclomatic complexity indicates a minimum number of test cases</a>, not necessarily a sufficient number.
</p>
<p>
Even though Haskell's type system is expressive, I soon found myself second-guessing the behaviour of various expressions that I'd experimented with. Sometimes I find GHCi (the Haskell <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>) sufficiently edifying, but in this case I thought that I might want to keep some test cases around for a helper function that I was developing:
</p>
<p>
<pre><span style="color:blue;">import</span> Data.List
<span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.Map.Strict <span style="color:blue;">as</span> Map
<span style="color:blue;">import</span> Data.Map.Strict (<span style="color:#2b91af;">(!)</span>)
<span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.Set <span style="color:blue;">as</span> Set
<span style="color:blue;">import</span> Data.Set (<span style="color:blue;">Set</span>)
<span style="color:#2b91af;">evaluateStop</span> <span style="color:blue;">::</span> (<span style="color:blue;">Functor</span> f, <span style="color:blue;">Foldable</span> f, <span style="color:blue;">Ord</span> k, <span style="color:blue;">Ord</span> a)
=> f (k, Set a) -> f (k, Set a)
evaluateStop stopsAndDrivers =
<span style="color:blue;">let</span> gossip (stop, driver) = Map.insertWith Set.union stop driver
gossipAtStops = <span style="color:blue;">foldl</span>' (<span style="color:blue;">flip</span> gossip) Map.empty stopsAndDrivers
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> (\(stop, _) -> (stop, gossipAtStops ! stop)) stopsAndDrivers</pre>
</p>
<p>
I was fairly confident that this function worked as I intended, but I wanted to be sure. I needed some examples, so I added these tests:
</p>
<p>
<pre><span style="color:#a31515;">"evaluateStop examples"</span> ~: <span style="color:blue;">do</span>
(stopsAndDrivers, expected) <- [
([(1, fromList [1]), (2, fromList [2]), (1, fromList [1])],
[(1, fromList [1]), (2, fromList [2]), (1, fromList [1])]),
([(1, fromList [1]), (2, fromList [2]), (1, fromList [2])],
[(1, fromList [1, 2]), (2, fromList [2]), (1, fromList [1, 2])]),
([(1, fromList [1, 2, 3]), (1, fromList [2, 3, 4])],
[(1, fromList [1, 2, 3, 4]), (1, fromList [1, 2, 3, 4])])
]
<span style="color:blue;">let</span> actual = evaluateStop stopsAndDrivers
<span style="color:blue;">return</span> $ fromList expected ~=? fromList actual</pre>
</p>
<p>
They do, indeed, pass.
</p>
<p>
The idea behind that <code>evaluateStop</code> function is to evaluate the state at each 'minute' of the simulation. The first line of each test case is the state before the drivers meet, and the second line is the <em>expected</em> state after all drivers have gossiped.
</p>
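<p>
As a worked example, the following sketch runs the second test case through the function outside of the test framework. The <code>evaluateStop</code> definition is copied from above; the names <code>before</code> and <code>after</code> are only illustrative.
</p>

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Map.Strict ((!))
import qualified Data.Set as Set
import Data.Set (Set)

-- Copied from the article.
evaluateStop :: (Functor f, Foldable f, Ord k, Ord a)
             => f (k, Set a) -> f (k, Set a)
evaluateStop stopsAndDrivers =
  let gossip (stop, driver) = Map.insertWith Set.union stop driver
      gossipAtStops = foldl' (flip gossip) Map.empty stopsAndDrivers
  in  fmap (\(stop, _) -> (stop, gossipAtStops ! stop)) stopsAndDrivers

-- Two drivers meet at stop 1, while a third is alone at stop 2.
before :: [(Int, Set Int)]
before =
  [(1, Set.fromList [1]), (2, Set.fromList [2]), (1, Set.fromList [2])]

-- The state after everyone at the same stop has exchanged gossip.
after :: [(Int, Set Int)]
after = evaluateStop before

main :: IO ()
main = mapM_ print after
```

<p>
Both drivers at stop <code>1</code> end up with the union of what they knew, whereas the driver at stop <code>2</code> learns nothing new.
</p>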
<p>
My plan was to use some sort of left fold to keep evaluating states until all information has disseminated to all drivers.
</p>
<h3 id="06888628b5da4b4bb76fc59812c019da">
Property <a href="#06888628b5da4b4bb76fc59812c019da">#</a>
</h3>
<p>
Since I have already extolled the virtues of property-based testing in this article series, I wondered whether I could add some properties instead of relying on examples. Well, I did manage to add one <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a> property:
</p>
<p>
<pre>testProperty <span style="color:#a31515;">"drive image"</span> $ \ (routes :: [NonEmptyList Int]) ->
<span style="color:blue;">let</span> actual = drive $ <span style="color:blue;">fmap</span> getNonEmpty routes
<span style="color:blue;">in</span> isJust actual ==>
<span style="color:blue;">all</span> (\i -> 0 <= i && i <= 480) actual</pre>
</p>
<p>
There's not much to talk about here. The property only states that the result of the <code>drive</code> function must be between <code>0</code> and <code>480</code>, if it exists.
</p>
<p>
Such a property could vacuously pass if <code>drive</code> always returns <code>Nothing</code>, so I used the <code>==></code> QuickCheck combinator to make sure that the property is actually exercising only the <code>Just</code> cases.
</p>
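<p>
The vacuity comes from how <code>all</code> treats a <code>Maybe</code> value: <code>Nothing</code> behaves like an empty container, so the predicate holds trivially. A small sketch using only the base library, where <code>inImage</code> is a hypothetical name for the predicate used in the property:
</p>

```haskell
-- The range check from the property, lifted over Maybe via Foldable's 'all'.
inImage :: Maybe Int -> Bool
inImage = all (\i -> 0 <= i && i <= 480)

main :: IO ()
main = do
  print (inImage Nothing)     -- True: there is no value to check
  print (inImage (Just 5))    -- True
  print (inImage (Just 481))  -- False
```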
<p>
Since the <code>drive</code> function only returns a number, apart from verifying its <a href="https://en.wikipedia.org/wiki/Image_(mathematics)">image</a> I couldn't think of any other general property to add.
</p>
<p>
You can always come up with more specific properties that explicitly set up more constrained test scenarios, but is it worth it?
</p>
<p>
It's always worthwhile to stop and think. If you're writing a 'normal' example-based test, consider whether a property would be better. Likewise, if you're about to write a property, consider whether an example would be better.
</p>
<p>
'Better' can mean more than one thing. Preventing regressions is one thing, but making the code maintainable is another. If you're writing a property that is too complicated, it might be better to write a simpler example-based test.
</p>
<p>
I could definitely think of some complicated properties, but I found that more examples might make the test code easier to understand.
</p>
<h3 id="64eac896c8ea4127bacccb8ca01cf2fb">
More examples <a href="#64eac896c8ea4127bacccb8ca01cf2fb">#</a>
</h3>
<p>
After all that angst and soul-searching, I added a few more examples to the first parametrised test:
</p>
<p>
<pre><span style="color:#a31515;">"Kata examples"</span> ~: <span style="color:blue;">do</span>
(routes, expected) <-
[
([[3, 1, 2, 3],
[3, 2, 3, 1],
[4, 2, 3, 4, 5]],
Just 5),
([[2, 1, 2],
[5, 2, 8]],
Nothing),
([[1, 2, 3, 4, 5],
[5, 6, 7, 8],
[3, 9, 6]],
Just 13),
([[1, 2, 3],
[2, 1, 3],
[2, 4, 5, 3]],
Just 5),
([[1, 2],
[2, 1]],
Nothing),
([[1]],
Just 0),
([[2],
[2]],
Just 1)
]
<span style="color:blue;">let</span> actual = drive routes
<span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
The first two test cases are the same as before, and the last two are some edge cases I added myself. The middle three I adopted from <a href="https://dodona.ugent.be/en/activities/1792896126/">another page about the kata</a>. Since those examples turned out to be off by one, I did those examples on paper to verify that I understood what the expected value was. Then I adjusted them to my one-indexed results.
</p>
<h3 id="52417ab56528457891928ea551017f75">
Drive <a href="#52417ab56528457891928ea551017f75">#</a>
</h3>
<p>
The <code>drive</code> function now correctly implements the kata, I hope. At least it passes all the tests.
</p>
<p>
<pre><span style="color:#2b91af;">drive</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> b, <span style="color:blue;">Enum</span> b, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> b
drive routes =
<span style="color:green;">-- Each driver starts with a single gossip. Any kind of value will do, as
</span> <span style="color:green;">-- long as each is unique. Here I use the one-based index of each route,
</span> <span style="color:green;">-- since it fulfills the requirements.
</span> <span style="color:blue;">let</span> drivers = <span style="color:blue;">fmap</span> Set.singleton [1 .. <span style="color:blue;">length</span> routes]
goal = Set.unions drivers
stops = transpose $ <span style="color:blue;">fmap</span> (<span style="color:blue;">take</span> 480 . <span style="color:blue;">cycle</span>) routes
propagation =
<span style="color:blue;">scanl</span> (\ds ss -> <span style="color:blue;">snd</span> <$> evaluateStop (<span style="color:blue;">zip</span> ss ds)) drivers stops
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> $ find (<span style="color:blue;">all</span> (== goal) . <span style="color:blue;">snd</span>) $ <span style="color:blue;">zip</span> [0 ..] propagation</pre>
</p>
<p>
Haskell code can be information-dense, and if you don't have an integrated development environment (IDE) around, this may be hard to read.
</p>
<p>
<code>drivers</code> is a list of sets. Each set represents the gossip that a driver knows. At the beginning, each only knows one piece of gossip. The expression initialises each driver with a <code>singleton</code> set. Each piece of gossip is represented by a number, simply going from <code>1</code> to the number of routes. Incidentally, this is also the number of drivers, so you can consider the number <code>1</code> as a placeholder for the gossip that driver <em>1</em> knows, and so on.
</p>
<p>
The <code>goal</code> is the union of all the gossip. Once every driver's knowledge is equal to the <code>goal</code> the simulation can stop.
</p>
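<p>
For the three routes of the first kata example, those two values look as follows. This is only a sketch that lifts the two <code>let</code> bindings to the top level, with the types fixed to <code>Int</code> for simplicity.
</p>

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

routes :: [[Int]]
routes = [[3, 1, 2, 3], [3, 2, 3, 1], [4, 2, 3, 4, 5]]

-- One singleton set per driver, numbered from 1.
drivers :: [Set Int]
drivers = fmap Set.singleton [1 .. length routes]

-- Everything there is to know.
goal :: Set Int
goal = Set.unions drivers

main :: IO ()
main = do
  print drivers              -- [fromList [1],fromList [2],fromList [3]]
  print goal                 -- fromList [1,2,3]
  print (all (== goal) drivers)  -- False: nobody knows everything yet
```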
<p>
Since <code>evaluateStop</code> simulates one stop, the <code>drive</code> function needs a list of stops to fold. That's the <code>stops</code> value. In the very first example, you have three routes: <code>[3, 1, 2, 3]</code>, <code>[3, 2, 3, 1]</code>, and <code>[4, 2, 3, 4, 5]</code>. The first time the drivers stop (after one minute), the stops are <code>3</code>, <code>3</code>, and <code>4</code>. That is, the first element in <code>stops</code> would be the list <code>[3, 3, 4]</code>. The next one would be <code>[1, 2, 2]</code>, then <code>[2, 3, 3]</code>, and so on.
</p>
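<p>
You can check that description by evaluating the <code>stops</code> expression on its own. The sketch below again fixes the element type to <code>Int</code> and lifts the binding to the top level.
</p>

```haskell
import Data.List (transpose)

routes :: [[Int]]
routes = [[3, 1, 2, 3], [3, 2, 3, 1], [4, 2, 3, 4, 5]]

-- Repeat each route for 480 minutes, then regroup by minute instead of by driver.
stops :: [[Int]]
stops = transpose $ fmap (take 480 . cycle) routes

main :: IO ()
main = do
  print (take 3 stops)  -- [[3,3,4],[1,2,2],[2,3,3]]
  print (length stops)  -- 480
```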
<p>
My plan all along was to use some sort of left fold to repeatedly run <code>evaluateStop</code> over each minute. Since I need to produce a list of states, <code>scanl</code> was an appropriate choice. The lambda expression that I have to pass to it, though, is more complicated than I appreciate. We'll return to that in a moment.
</p>
<p>
The <code>drive</code> function can now index the <code>propagation</code> list by zipping it with the infinite list <code>[0 ..]</code>, <code>find</code> the first element where <code>all</code> sets are equal to the <code>goal</code> set, and then return that index. That produces the correct results.
</p>
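<p>
The combination of <code>scanl</code>, <code>zip [0 ..]</code>, and <code>find</code> can be studied in isolation. In the following sketch, <code>firstIndexWhere</code> is a hypothetical helper that isn't part of the kata solution, and running sums stand in for the gossip states.
</p>

```haskell
import Data.List (find)

-- Zero-based index of the first intermediate state that satisfies 'done'.
-- scanl includes the initial state, so index 0 means 'before any input'.
firstIndexWhere :: (s -> Bool) -> (s -> a -> s) -> s -> [a] -> Maybe Int
firstIndexWhere done step initial inputs =
  fmap fst $ find (done . snd) $ zip [0 ..] (scanl step initial inputs)

main :: IO ()
main = do
  print (scanl (+) 0 [1, 2, 3, 4 :: Int])                    -- [0,1,3,6,10]
  print (firstIndexWhere (>= 10) (+) 0 [1, 2, 3, 4 :: Int])  -- Just 4
  print (firstIndexWhere (>= 10) (+) 10 [1, 2, 3 :: Int])    -- Just 0
  print (firstIndexWhere (>= 10) (+) 0 [1, 2, 3 :: Int])     -- Nothing
```

<p>
The <code>Just 0</code> case mirrors the single-driver example: the initial state already satisfies the goal, so no stop is needed. The <code>Nothing</code> case corresponds to running out of stops before the goal is reached.
</p>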
<h3 id="2b84136e78154ebdb81654efabf3d987">
The need for a better helper function <a href="#2b84136e78154ebdb81654efabf3d987">#</a>
</h3>
<p>
As I already warned, I wasn't happy with the lambda expression passed to <code>scanl</code>. It looks complicated and arcane. Is there a better way to express the same behaviour? Usually, when confronted with a nasty lambda expression like that, in Haskell my first instinct is to see if <a href="https://pointfree.io/">pointfree.io</a> has a better option. Alas, <code>(((<span style="color:blue;">snd</span> <$>) . evaluateStop) .) . <span style="color:blue;">flip</span> <span style="color:blue;">zip</span></code> hardly seems an improvement. That <code>flip zip</code> expression to the right, however, suggests that it might help flipping the arguments to <code>evaluateStop</code>.
</p>
<p>
When I developed the <code>evaluateStop</code> helper function, I found it intuitive to define it over a list of tuples, where the first element in the tuple is the stop, and the second element is the set of gossip that the driver at that stop knows.
</p>
<p>
The tuples don't <em>have</em> to be in that order, though. Perhaps if I flip the tuples that would make the lambda expression more readable. It was worth a try.
</p>
<h3 id="ca808b5a5fa3473c9046704e4bcc7357">
Confidence <a href="#ca808b5a5fa3473c9046704e4bcc7357">#</a>
</h3>
<p>
Since this article is part of a small series about the epistemology of testing composed functions, let's take a moment to reflect on the confidence we may have in the <code>drive</code> function.
</p>
<p>
Keep in mind the goal of the kata: Calculate the number of minutes it takes for all gossip to spread to all drivers. There are a few tests that verify that: seven examples and a fairly vacuous QuickCheck property. Is that enough to be confident that the function is correct?
</p>
<p>
If it isn't, I think the best option you have is to add more examples. For the sake of argument, however, let's assume that the tests are good enough.
</p>
<p>
When summarising the tests that cover the <code>drive</code> function, I didn't count the three examples that exercise <code>evaluateStop</code>. Do these three test cases improve your confidence in the <code>drive</code> function? A bit, perhaps, but keep in mind that <em>the kata description doesn't mandate that function.</em> It's just a helper function I created in order to decompose the problem.
</p>
<p>
Granted, having tests that cover a helper function does, to a degree, increase my confidence in the code. I have confidence in the function itself, but that is largely irrelevant, because the problem I'm trying to solve is <em>not</em> implementing this particular function. On the other hand, my confidence in <code>evaluateStop</code> means that I have increased confidence in the code that calls it.
</p>
<p>
Compared to interaction-based testing, I'm not <em>testing</em> that <code>drive</code> calls <code>evaluateStop</code>, but I can still verify that this happens. I can just look at the code.
</p>
<p>
The composition is already there in the code. What do I gain from replicating that composition with <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> and <a href="http://xunitpatterns.com/Test%20Spy.html">Spies</a>?
</p>
<p>
It's not a breaking change if I decide to implement <code>drive</code> in a different way.
</p>
<p>
What gives me confidence when composing pure functions isn't that I've subjected the composition to an interaction-based test. Rather, it's that the function is composed from trustworthy components.
</p>
<h3 id="a414a1e6b9e947d2b4632fcea7b916cf">
Strangler <a href="#a414a1e6b9e947d2b4632fcea7b916cf">#</a>
</h3>
<p>
My main grievance with Stubs and Spies is that <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">they break encapsulation</a>. This may sound abstract, but is a real problem. This is the underlying reason that so many tests break when you refactor code.
</p>
<p>
This example code base, as other functional code that I write, avoids interaction-based testing. This makes it easier to refactor the code, as I will now demonstrate.
</p>
<p>
My goal is to change the <code>evaluateStop</code> helper function by flipping the tuples. If I just edit it, however, I'm going to (temporarily) break the <code>drive</code> function.
</p>
<p>
Katas typically result in small code bases where you can get away with a lot of bad practices that wouldn't work in a larger code base. To be honest, the refactoring I have in mind can be completed in a few minutes with a brute-force approach. Imagine, however, that we can't break compatibility of the <code>evaluateStop</code> function for the time being. Perhaps, had we had a larger code base, there would have been other code that depended on this function. At the very least, the tests do.
</p>
<p>
Instead of brute-force changing the function, I'm going to make use of the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler</a> pattern, as I've also described in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
Leave the existing function alone, and add a new one. You can typically copy and paste the existing code and then make the necessary changes. In that way, you break neither client code nor tests, because there are none.
</p>
<p>
<pre><span style="color:#2b91af;">evaluateStop'</span> <span style="color:blue;">::</span> (<span style="color:blue;">Functor</span> f, <span style="color:blue;">Foldable</span> f, <span style="color:blue;">Ord</span> k, <span style="color:blue;">Ord</span> a)
=> f (Set a, k) -> f (Set a, k)
evaluateStop' driversAndStops =
<span style="color:blue;">let</span> gossip (driver, stop) = Map.insertWith Set.union stop driver
gossipAtStops = <span style="color:blue;">foldl</span>' (<span style="color:blue;">flip</span> gossip) Map.empty driversAndStops
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> (\(_, stop) -> (gossipAtStops ! stop, stop)) driversAndStops</pre>
</p>
<p>
In a language like C# you can often get away with overloading a method name, but Haskell doesn't support that kind of overloading. Since I consider this side-by-side situation to be temporary, I've appended a prime after the function name. This is a fairly normal convention in Haskell, I gather.
</p>
<p>
The only change this function represents is that I've swapped the tuple order.
</p>
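<p>
One way to gain confidence in that claim, while both functions exist side by side, is to check that they agree once you account for the swapped tuples. The sketch below copies both definitions from this article; <code>agreesWithOriginal</code> and <code>inputs</code> are hypothetical names, and the inputs are the three <code>evaluateStop</code> test cases from earlier.
</p>

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as Map
import Data.Map.Strict ((!))
import qualified Data.Set as Set
import Data.Set (Set)
import Data.Tuple (swap)

-- Both definitions are copied from the article.
evaluateStop :: (Functor f, Foldable f, Ord k, Ord a)
             => f (k, Set a) -> f (k, Set a)
evaluateStop stopsAndDrivers =
  let gossip (stop, driver) = Map.insertWith Set.union stop driver
      gossipAtStops = foldl' (flip gossip) Map.empty stopsAndDrivers
  in  fmap (\(stop, _) -> (stop, gossipAtStops ! stop)) stopsAndDrivers

evaluateStop' :: (Functor f, Foldable f, Ord k, Ord a)
              => f (Set a, k) -> f (Set a, k)
evaluateStop' driversAndStops =
  let gossip (driver, stop) = Map.insertWith Set.union stop driver
      gossipAtStops = foldl' (flip gossip) Map.empty driversAndStops
  in  fmap (\(_, stop) -> (gossipAtStops ! stop, stop)) driversAndStops

-- Swapping on the way in and on the way out should recover the original.
agreesWithOriginal :: [(Int, Set Int)] -> Bool
agreesWithOriginal stopsAndDrivers =
  evaluateStop stopsAndDrivers
    == fmap swap (evaluateStop' (fmap swap stopsAndDrivers))

inputs :: [[(Int, Set Int)]]
inputs =
  [ [(1, Set.fromList [1]), (2, Set.fromList [2]), (1, Set.fromList [1])]
  , [(1, Set.fromList [1]), (2, Set.fromList [2]), (1, Set.fromList [2])]
  , [(1, Set.fromList [1, 2, 3]), (1, Set.fromList [2, 3, 4])]
  ]

main :: IO ()
main = print (all agreesWithOriginal inputs)
```

<p>
A check like this only covers the listed inputs, but the same relationship could also be stated as a property.
</p>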
<p>
Once you've added the new function, you may want to copy, paste and edit the tests. Or perhaps you want to do the tests first. During this process, make <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a> so that you can easily suspend your 'refactoring' activity if something more important comes up.
</p>
<p>
Once everything is in place, you can change the <code>drive</code> function:
</p>
<p>
<pre><span style="color:#2b91af;">drive</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> b, <span style="color:blue;">Enum</span> b, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> b
drive routes =
<span style="color:green;">-- Each driver starts with a single gossip. Any kind of value will do, as
</span> <span style="color:green;">-- long as each is unique. Here I use the one-based index of each route,
</span> <span style="color:green;">-- since it fulfills the requirements.
</span> <span style="color:blue;">let</span> drivers = <span style="color:blue;">fmap</span> Set.singleton [1 .. <span style="color:blue;">length</span> routes]
goal = Set.unions drivers
stops = transpose $ <span style="color:blue;">fmap</span> (<span style="color:blue;">take</span> 480 . <span style="color:blue;">cycle</span>) routes
propagation =
<span style="color:blue;">scanl</span> (\ds ss -> <span style="color:blue;">fst</span> <$> evaluateStop' (<span style="color:blue;">zip</span> ds ss)) drivers stops
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> $ find (<span style="color:blue;">all</span> (== goal) . <span style="color:blue;">snd</span>) $ <span style="color:blue;">zip</span> [0 ..] propagation</pre>
</p>
<p>
Notice that the type of <code>drive</code> hasn't changed, and neither has the behaviour. This means that although I've changed the composition (the <em>interaction</em>) no tests broke.
</p>
<p>
Finally, once I moved all code over, I deleted the old function and renamed the new one to take its place.
</p>
<h3 id="6be4ec66c14c4adbbb8bcb60307c7ba3">
Was it all worth it? <a href="#6be4ec66c14c4adbbb8bcb60307c7ba3">#</a>
</h3>
<p>
At first glance, it doesn't look as though much was gained. What happens if I eta-reduce the new lambda expression?
</p>
<p>
<pre><span style="color:#2b91af;">drive</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> b, <span style="color:blue;">Enum</span> b, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> b
drive routes =
  <span style="color:green;">-- Each driver starts with a single gossip. Any kind of value will do, as</span>
  <span style="color:green;">-- long as each is unique. Here I use the one-based index of each route,</span>
  <span style="color:green;">-- since it fulfills the requirements.</span>
  <span style="color:blue;">let</span> drivers = <span style="color:blue;">fmap</span> Set.singleton [1 .. <span style="color:blue;">length</span> routes]
      goal = Set.unions drivers
      stops = transpose $ <span style="color:blue;">fmap</span> (<span style="color:blue;">take</span> 480 . <span style="color:blue;">cycle</span>) routes
      propagation = <span style="color:blue;">scanl</span> (((<span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> . evaluateStop) .) . <span style="color:blue;">zip</span>) drivers stops
  <span style="color:blue;">in</span> <span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> $ find (<span style="color:blue;">all</span> (== goal) . <span style="color:blue;">snd</span>) $ <span style="color:blue;">zip</span> [0 ..] propagation</pre>
</p>
<p>
Not much better. I can now fit the <code>propagation</code> expression on a single line of code and still stay within an <a href="/2019/11/04/the-80-24-rule">80x24 box</a>, but that's about it. Is <code>((<span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> . evaluateStop) .) . <span style="color:blue;">zip</span></code> more readable than what we had before?
</p>
<p>
Hardly, I admit. I might consider reverting, and since I've been <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">using Git tactically</a>, I have that option.
</p>
<p>
If I hadn't tried, though, I wouldn't have known.
</p>
<h3 id="5bf92cabf46e4ec9a6a41b51dd29e876">
Conclusion <a href="#5bf92cabf46e4ec9a6a41b51dd29e876">#</a>
</h3>
<p>
When composing one pure function with another, how can you test that the outer function correctly calls the inner function?
</p>
<p>
In the same way that you test any other pure function. The only way you can observe whether a pure function works as intended is to compare its actual output to the output you expect its input to produce. How it arrives at that output is irrelevant. It could be looking up all results in a big table. As long as the result is correct, the function is correct.
</p>
<p>
In this article, you saw an example of how to test a composed function, as well as how to refactor it without breaking tests.
</p>
<p>
<strong>Next:</strong> <a href="/2023/06/19/when-is-an-implementation-detail-an-implementation-detail">When is an implementation detail an implementation detail?</a>
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Are pull requests bad because they originate from open-source development?
https://blog.ploeh.dk/2023/04/24/are-pull-requests-bad-because-they-originate-from-open-source-development
2023-04-24T06:08:00+00:00
Mark Seemann
<div id="post">
<p>
<em>I don't think so, and at least find the argument flawed.</em>
</p>
<p>
Increasingly I come across a quote that goes like this:
</p>
<blockquote>
<p>
Pull requests were invented for open source projects where you want to gatekeep changes from people you don't know and don't trust to change the code safely.
</p>
</blockquote>
<p>
If you're wondering where that 'quote' comes from, then read on. I'm not trying to stand up a straw man, but I had to do a bit of digging in order to find the source of what almost seems like a <a href="https://en.wikipedia.org/wiki/Meme">meme</a>.
</p>
<h3 id="c347774c419941a9987c74c95b6f91cd">
Quote investigation <a href="#c347774c419941a9987c74c95b6f91cd">#</a>
</h3>
<p>
The quote is usually attributed to <a href="https://www.davefarley.net/">Dave Farley</a>, who is a software luminary that <a href="https://www.goodreads.com/review/show/4812673890">I respect tremendously</a>. Even with the attribution, the source is typically missing, but after asking around, <a href="https://twitter.com/MitjaBezensek/status/1626165418296590336">Mitja Bezenšek pointed me in the right direction</a>.
</p>
<p>
The source is most likely a video, from which I've transcribed a longer passage:
</p>
<blockquote>
<p>
"Pull requests were invented to gatekeep access to open-source projects. In open source, it's very common that not everyone is given free access to changing the code, so contributors will issue a pull request so that a trusted person can then approve the change.
</p>
<p>
"I think this is really bad way to organise a development team.
</p>
<p>
"If you can't trust your team mates to make changes carefully, then your version control system is not going to fix that for you."
</p>
<footer><cite><a href="https://youtu.be/UQrlEXU6RM8">Dave Farley</a></cite></footer>
</blockquote>
<p>
I've made an effort to transcribe as faithfully as possible, but if you really want to be sure what Dave Farley said, watch the video. The quote comes twelve minutes in.
</p>
<h3 id="60fb5776c68b464d9ae77ad601e8c99b">
My biases <a href="#60fb5776c68b464d9ae77ad601e8c99b">#</a>
</h3>
<p>
I agree that the argument sounds compelling, but I find it flawed. Before I proceed to put forward my arguments I want to make my own biases clear. Arguing against someone like Dave Farley is not something I take lightly. As far as I can tell, he's worked on systems more impressive than any I can showcase. I also think he has more industry experience than I have.
</p>
<p>
That doesn't necessarily make him right, but on the other hand, why should you side with me, with my less impressive résumé?
</p>
<p>
My objective is not to attack Dave Farley, or any other person for that matter. My agenda is the argument itself. I do, however, find it intellectually honest to cite sources, with the associated risk that my argument may look like a personal attack. To steelman my opponent, then, I'll try to put my own biases on display. To the degree I'm aware of them.
</p>
<p>
I prefer pull requests over pair and ensemble programming. I've tried all three, and I do admit that real-time collaboration has obvious advantages, but I find pairing or ensemble programming exhausting.
</p>
<p>
Since <a href="https://www.goodreads.com/review/show/440837121">I read <em>Quiet</em></a> a decade ago, I've been alert to the introspective side of my personality. Although I agree with <a href="http://www.exampler.com/about/">Brian Marick</a> that one should <a href="https://podcast.oddly-influenced.dev/episodes/not-a-ted-talk-relevant-results-from-psychology">be wary of understanding personality traits as destiny</a>, I mostly prefer solo activities.
</p>
<p>
Increasingly, since I became self-employed, I've arranged my life to maximise the time I can work from home. The exercise regimen I've chosen for myself is independent of other people: I run, and lift weights at home. You may have noticed that I like writing. I like reading as well. And, hardly surprising, I prefer writing code in splendid isolation.
</p>
<p>
Even so, I find it perfectly possible to have meaningful relationships with other people. After all, I've been married to the same woman for decades, my (mostly) grown kids haven't fled from home, and I have friends that I've known for decades.
</p>
<p>
In a toot that I can no longer find, Brian Marick asked (and I paraphrase from memory): <em>If you've tried a technique and didn't like it, what would it take to make you like it?</em>
</p>
<p>
As a self-professed introvert, social interaction <em>does</em> tire me, but I still enjoy hanging out with friends or family. What makes those interactions different? Well, often, there's good food and wine involved. Perhaps ensemble programming would work better for me with a bottle of Champagne.
</p>
<p>
Other forces influence my preferences as well. I like the <a href="/2023/02/20/a-thought-on-workplace-flexibility-and-asynchrony">flexibility provided by asynchrony</a>, and similarly dislike having to be somewhere at a specific time.
</p>
<p>
Having to be somewhere also involves transporting myself there, which I also don't appreciate.
</p>
<p>
In short, I prefer pull requests over pairing and ensemble programming. All of that, however, is just my subjective opinion, and <a href="/2020/10/12/subjectivity">that's not an argument</a>.
</p>
<h3 id="c83cc60f53e049edbdae29dca4402563">
Counter-examples <a href="#c83cc60f53e049edbdae29dca4402563">#</a>
</h3>
<p>
The above tirade about my biases is <em>not</em> a refutation of Dave Farley's argument. Rather, I wanted to put my own blind spots on display. If you suspect me of <a href="https://en.wikipedia.org/wiki/Motivated_reasoning">motivated reasoning</a>, that just might be the case.
</p>
<p>
All that said, I want to challenge the argument.
</p>
<p>
First, it includes an appeal to <em>trust</em>, which is <a href="/2023/03/20/on-trust-in-software-development">a line of reasoning with which I don't agree</a>. You can't trust your colleagues, just like you can't trust yourself. A code review serves more purposes than keeping malicious actors out of the code base. It also helps catch mistakes, security issues, or misunderstandings. It can also improve shared understanding of common goals and standards. Yes, this is <em>also</em> possible with other means, such as pair or ensemble programming, but from that, it doesn't follow that code reviews <em>can't</em> do that. They can. I've lived that dream.
</p>
<p>
If you take away the appeal to trust, though, there isn't much left of the argument. What remains is essentially: <em>Pull requests were invented to solve a particular problem in open-source development. Internal software development is not open source. Pull requests are bad for internal software development.</em>
</p>
<p>
That an invention was done in one context, however, doesn't preclude it from being useful in another. Git was invented to address an open-source problem. Should we stop using Git for internal software development?
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Solar_cell">Solar panels were originally developed for satellites and space probes</a>. Does that mean that we shouldn't use them on Earth?
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Global_Positioning_System">GPS was invented for use by the US military</a>. Does that make civilian use wrong?
</p>
<h3 id="a30f67d73f0e488aac23dccb723370f8">
Are pull requests bad? <a href="#a30f67d73f0e488aac23dccb723370f8">#</a>
</h3>
<p>
I find the original <em>argument</em> logically flawed, but if I insist on logic, I'm also obliged to admit that my <a href="/ref/predicate-logic">possible-world counter-examples</a> don't prove that pull requests are good.
</p>
<p>
Dave Farley's claim may still turn out to be true. Not because of the argument he gives, but perhaps for other reasons.
</p>
<p>
I think I understand where the dislike of pull requests comes from. As they are often practised, pull requests can sit for days with no-one looking at them. This creates unnecessary delays. If this is the only way you know of working with pull requests, no wonder you don't like them.
</p>
<p>
<a href="/2021/06/21/agile-pull-requests">I advocate a more agile workflow for pull requests</a>. I consider that congruent with <a href="/2023/01/23/agilean">my view on agile development</a>.
</p>
<h3 id="e0d3bb23dda34ae98241ff2bad442794">
Conclusion <a href="#e0d3bb23dda34ae98241ff2bad442794">#</a>
</h3>
<p>
Pull requests are often misused, but they don't have to be. On the other hand, that's just my experience and subjective preference.
</p>
<p>
Dave Farley has argued that pull requests are a bad way to organise a development team. I've argued that the argument is logically flawed.
</p>
<p>
The question remains unsettled. I've attempted to refute one particular argument, and even if you accept my counter-examples, pull requests may still be bad. Or good.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="bdd051fb26464bdbbc056ddea07712d5">
<div class="comment-author"><a href="https://cwb.dk/">Casper Weiss Bang</a> <a href="#bdd051fb26464bdbbc056ddea07712d5">#</a></div>
<div class="comment-content">
<p>
Another important angle, for me, is that pull requests are not merely code review. It can also be a way of enforcing a variety of automated checks, i.e. running tests or linting etc. This enforces quality too - so I'd argue to use pull requests even if you don't do peer review (I do on my hobby projects at least, for the exact reasons you mentioned in <a href="https://blog.ploeh.dk/2023/03/20/on-trust-in-software-development/">On trust in software development</a> - I don't trust myself to be perfect.)
</p>
</div>
<div class="comment-date">2023-04-26 10:26 UTC</div>
</div>
<div class="comment" id="9b022dd663d34feba170006de5b66af4">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#9b022dd663d34feba170006de5b66af4">#</a></div>
<div class="comment-content">
<p>
Casper, thank you for writing. Indeed, other readers have made similar observations on other channels (Twitter, Mastodon). That, too, can be a benefit.
</p>
<p>
In order to once more steel-man 'the other side', they'd probably say that you can run automated checks in your Continuous Delivery pipeline, and halt it if automated checks fail.
</p>
<p>
When done this way, it's useful to be able to also run the same tests on your dev box. I consider that a good practice anyway.
</p>
</div>
<div class="comment-date">2023-04-28 14:49 UTC</div>
</div>
</div>
<hr>
A restaurant example of refactoring from example-based to property-based testing
https://blog.ploeh.dk/2023/04/17/a-restaurant-example-of-refactoring-from-example-based-to-property-based-testing
2023-04-17T06:37:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A C# example with xUnit.net and FsCheck.</em>
</p>
<p>
This is the second comprehensive example that accompanies the article <a href="/2023/02/13/epistemology-of-interaction-testing">Epistemology of interaction testing</a>. In that article, I argue that in a code base that leans toward functional programming (FP), property-based testing is a better fit than interaction-based testing. In this example, I will show how to refactor realistic <a href="/2019/02/18/from-interaction-based-to-state-based-testing">state-based tests</a> into (state-based) property-based tests.
</p>
<p>
The <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">previous article</a> showed a <a href="https://en.wikipedia.org/wiki/Minimal_reproducible_example">minimal and self-contained example</a> that had the advantage of being simple, but the disadvantage of being perhaps too abstract and unrelatable. In this article, then, I will attempt to show a more realistic and concrete example. It actually doesn't start with interaction-based testing, since it's already written in the style of <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a>. On the other hand, it shows how to refactor from concrete example-based tests to property-based tests.
</p>
<p>
I'll use the online restaurant reservation code base that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<h3 id="e7aaa6310292411ab830de17f5906777">
Smoke test <a href="#e7aaa6310292411ab830de17f5906777">#</a>
</h3>
<p>
I'll start with a simple test which was, if I remember correctly, the second test I wrote for this code base. It was a smoke test that I wrote to drive a <a href="https://wiki.c2.com/?WalkingSkeleton">walking skeleton</a>. It verifies that if you post a valid reservation request to the system, you receive an HTTP response in the <code>200</code> range.
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservation</span>()
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> ReservationDto
    {
        At = DateTime.Today.AddDays(778).At(19, 0)
            .ToIso8601DateTimeString(),
        Email = <span style="color:#a31515;">"katinka@example.com"</span>,
        Name = <span style="color:#a31515;">"Katinka Ingabogovinanana"</span>,
        Quantity = 2
    };
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">response</span> = <span style="color:blue;">await</span> api.PostReservation(expected);
    response.EnsureSuccessStatusCode();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:blue;">await</span> response.ParseJsonContent<ReservationDto>();
    Assert.Equal(expected, actual, <span style="color:blue;">new</span> ReservationDtoComparer());
}</pre>
</p>
<p>
Over the lifetime of the code base, I embellished and edited the test to reflect the evolution of the system as well as my understanding of it. Thus, when I wrote it, it may not have looked exactly like this. Even so, I kept it around even though other, more detailed tests eventually superseded it.
</p>
<p>
One characteristic of this test is that it's quite concrete. When I originally wrote it, I hard-coded the date and time as well. Later, however, <a href="/2021/01/11/waiting-to-happen">I discovered that I had to make the time relative to the system clock</a>. Thus, as you can see, the <code>At</code> property isn't a literal value, but all other properties (<code>Email</code>, <code>Name</code>, and <code>Quantity</code>) are.
</p>
<p>
This test is far from abstract or data-driven. Is it possible to turn such a test into a property-based test? Yes, I'll show you how.
</p>
<p>
A word of warning before we proceed: Tests with concrete, literal, easy-to-understand examples are valuable as programmer documentation. A person new to the code base can peruse such tests and learn about the system. Thus, this test is <em>already quite valuable as it is</em>. In a real, living code base, I'd prefer leaving it as it is, instead of turning it into a property-based test.
</p>
<p>
Since it's a simple and concrete test, on the other hand, it's easy to understand, and thus also a good place to start. Thus, I'm going to refactor it into a property-based test; not because I think that you should (I don't), but because I think it'll be easy for you, the reader, to follow along. In other words, it's a good introduction to the process of turning a concrete test into a property-based test.
</p>
<h3 id="9dabcfae9e284a0ab9748cf817f4b2f9">
Adding parameters <a href="#9dabcfae9e284a0ab9748cf817f4b2f9">#</a>
</h3>
<p>
This code base already uses <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> so it makes sense to stick to that framework for property-based testing. While it's written in <a href="https://fsharp.org/">F#</a> you can use it from C# as well. The easiest way to use it is as a parametrised test. This is possible with the <a href="https://www.nuget.org/packages/FsCheck.Xunit">FsCheck.Xunit</a> glue library. In fact, as I refactor the <code>PostValidReservation</code> test, it'll look much like the <a href="https://github.com/AutoFixture/AutoFixture">AutoFixture</a>-driven tests from <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">the previous article</a>.
</p>
<p>
When turning concrete examples into properties, it helps to consider whether literal values are representative of an equivalence class. In other words, is that particular value important, or is there a wider set of values that would be just as good? For example, why is the test making a reservation 778 days in the future? Why not 777 or 779? Is the value <em>778</em> important? Not really. What's important is that the reservation is in the future. How far in the future actually isn't important. Thus, we can replace the literal value <code>778</code> with a parameter:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservation</span>(PositiveInt <span style="font-weight:bold;color:#1f377f;">days</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> ReservationDto
    {
        At = DateTime.Today.AddDays((<span style="color:blue;">int</span>)days).At(19, 0)
            .ToIso8601DateTimeString(),
        <span style="color:green;">// The rest of the test...</span></pre>
</p>
<p>
Notice that I've replaced the literal value <code>778</code> with the method parameter <code>days</code>. The <code>PositiveInt</code> type is a type from FsCheck. It's a wrapper around <code>int</code> that guarantees that the value is positive. This is important because we don't want to make a reservation in the past. The <code>PositiveInt</code> type is a good choice because it's a type that's already available with FsCheck, and the framework knows how to generate valid values. Since it's a wrapper, though, the test needs to unwrap the value before using it. This is done with the <code>(int)days</code> cast.
</p>
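<p>
To make the idea of such a wrapper concrete, here's a minimal sketch of how a type like that could be written by hand. This is not FsCheck's implementation; the name <code>PositiveNumber</code> is hypothetical, and I've only included enough to show the invariant and the explicit conversion that makes a cast like <code>(int)days</code> possible:
</p>
<p>
<pre>using System;

// Hypothetical stand-in for a wrapper such as FsCheck's PositiveInt.
// This is an illustration only, not the library's code.
public sealed class PositiveNumber
{
    private readonly int value;

    public PositiveNumber(int value)
    {
        // Protect the invariant: only values greater than zero are allowed.
        if (value <= 0)
            throw new ArgumentOutOfRangeException(
                nameof(value),
                "Value must be greater than zero.");
        this.value = value;
    }

    // An explicit conversion operator lets callers unwrap with an (int) cast.
    public static explicit operator int(PositiveNumber n) => n.value;
}</pre>
</p>
<p>
With such a type in scope, <code>(int)new PositiveNumber(778)</code> evaluates to <code>778</code>, while <code>new PositiveNumber(0)</code> throws an <code>ArgumentOutOfRangeException</code>.
</p>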
<p>
Notice, also, that I've replaced the <code>[Fact]</code> attribute with the <code>[Property]</code> attribute that comes with FsCheck.Xunit. This is what enables FsCheck to automatically generate test cases and feed them to the test method. You can't always do this, as you'll see later, but when you can, it's a nice and succinct way to express a property-based test.
</p>
<p>
Already, the <code>PostValidReservation</code> test method is 100 test cases (the FsCheck default), rather than one.
</p>
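<p>
Conceptually, and glossing over much of what FsCheck really does (such as varying the size of generated values and shrinking counter-examples), running a property amounts to something like the following loop. All names here are hypothetical; consider it a sketch under those assumptions, not the library's code:
</p>
<p>
<pre>using System;

// Hypothetical sketch of a property runner; not how FsCheck is implemented.
public static class PropertySketch
{
    // Calls 'property' with 'runs' pseudo-random positive integers and
    // returns the number of test cases that ran. An exception thrown by
    // 'property' (for example by a failed assertion) aborts the run.
    public static int Check(Action<int> property, int runs = 100, int seed = 0)
    {
        var rnd = new Random(seed);
        var executed = 0;
        for (var i = 0; i < runs; i++)
        {
            // Random.Next(1, int.MaxValue) never returns a value below 1.
            property(rnd.Next(1, int.MaxValue));
            executed++;
        }
        return executed;
    }
}</pre>
</p>
<p>
A call such as <code>PropertySketch.Check(days => Assert.True(0 < days))</code> would then exercise the assertion once per generated value, which is the sense in which a single test method becomes many test cases.
</p>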
<p>
What about <code>Email</code> and <code>Name</code>? Is it important for the test that these values are exactly <em>katinka@example.com</em> and <em>Katinka Ingabogovinanana</em> or might other values do? The answer is that it's not important. What's important is that the values are valid, and essentially any non-null string is. Thus, we can replace the literal values with parameters:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservation</span>(
    PositiveInt <span style="font-weight:bold;color:#1f377f;">days</span>,
    StringNoNulls <span style="font-weight:bold;color:#1f377f;">email</span>,
    StringNoNulls <span style="font-weight:bold;color:#1f377f;">name</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> ReservationDto
    {
        At = DateTime.Today.AddDays((<span style="color:blue;">int</span>)days).At(19, 0)
            .ToIso8601DateTimeString(),
        Email = email.Item,
        Name = name.Item,
        Quantity = 2
    };
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">response</span> = <span style="color:blue;">await</span> api.PostReservation(expected);
    response.EnsureSuccessStatusCode();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:blue;">await</span> response.ParseJsonContent<ReservationDto>();
    Assert.Equal(expected, actual, <span style="color:blue;">new</span> ReservationDtoComparer());
}</pre>
</p>
<p>
The <code>StringNoNulls</code> type is another FsCheck wrapper, this time around <code>string</code>. It ensures that FsCheck will generate no null strings. This time, however, a cast isn't possible, so instead I had to pull the wrapped string out of the value with the <code>Item</code> property.
</p>
<p>
That's enough conversion to illustrate the process.
</p>
<p>
What about the literal values <em>19</em>, <em>0</em>, or <em>2?</em> Shouldn't we parametrise those as well? While we could, that takes a bit more effort. The problem is that with these values, any old positive integer isn't going to work. For example, the number <em>19</em> is the hour component of the reservation time; that is, the reservation is for 19:00. Clearly, we can't just let FsCheck generate any positive integer, because most integers aren't going to work. For example, <em>5</em> doesn't work because it's in the early morning, and the restaurant isn't open at that time.
</p>
<p>
Like other property-based testing frameworks FsCheck has an API that enables you to constrain value generation, but it doesn't work with the type-based approach I've used so far. Unlike <code>PositiveInt</code> there's no <code>TimeBetween16And21</code> wrapper type.
</p>
<p>
You'll see what you can do to control how FsCheck generates values, but I'll use another test for that.
</p>
<h3 id="80844424b40a48f4931c78d91d865323">
Parametrised unit test <a href="#80844424b40a48f4931c78d91d865323">#</a>
</h3>
<p>
The <code>PostValidReservation</code> test is a high-level smoke test that gives you an idea about how the system works. It doesn't, however, reveal much about the possible variations in input. To drive such behaviour, I wrote and evolved the following state-based test:
</p>
<p>
<pre>[Theory]
[InlineData(1049, 19, 00, <span style="color:#a31515;">"juliad@example.net"</span>, <span style="color:#a31515;">"Julia Domna"</span>, 5)]
[InlineData(1130, 18, 15, <span style="color:#a31515;">"x@example.com"</span>, <span style="color:#a31515;">"Xenia Ng"</span>, 9)]
[InlineData( 956, 16, 55, <span style="color:#a31515;">"kite@example.edu"</span>, <span style="color:blue;">null</span>, 2)]
[InlineData( 433, 17, 30, <span style="color:#a31515;">"shli@example.org"</span>, <span style="color:#a31515;">"Shanghai Li"</span>, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>(
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">days</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">hours</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">minutes</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">at</span> = DateTime.Now.Date + <span style="color:blue;">new</span> TimeSpan(days, hours, minutes, 0);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
        <span style="color:blue;">new</span> SystemClock(),
        <span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
        db);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> Reservation(
        <span style="color:blue;">new</span> Guid(<span style="color:#a31515;">"B50DF5B1-F484-4D99-88F9-1915087AF568"</span>),
        at,
        <span style="color:blue;">new</span> Email(email),
        <span style="color:blue;">new</span> Name(name ?? <span style="color:#a31515;">""</span>),
        quantity);
    <span style="color:blue;">await</span> sut.Post(expected.ToDto());
    Assert.Contains(expected, db.Grandfather);
}</pre>
</p>
<p>
This test gives more details, without exercising all possible code paths of the system. It's still a <a href="/2012/06/27/FacadeTest">Facade Test</a> that covers 'just enough' of the integration with underlying components to provide confidence that things work as they should. All the business logic is implemented by a class called <code>MaitreD</code>, which is covered by its own set of targeted unit tests.
</p>
<p>
While parametrised, this is still only four test cases, so perhaps you don't have sufficient confidence that everything works as it should. Perhaps, as I've outlined in <a href="/2023/02/13/epistemology-of-interaction-testing">the introductory article</a>, it would help if we converted it to an FsCheck property.
</p>
<h3 id="707d58026e914b708e6394b5d1d2abad">
Parametrised property <a href="#707d58026e914b708e6394b5d1d2abad">#</a>
</h3>
<p>
I find it safest to refactor this parametrised test to a property in a series of small steps. This implies that I need to keep the <code>[InlineData]</code> attributes around for a while longer, removing one or two literal values at a time, turning them into randomly generated values.
</p>
<p>
From the previous test we know that the <code>Email</code> and <code>Name</code> values are almost unconstrained. This means that having FsCheck generate them is trivial. It's good that this part of the change is easy, because combining an <code>[InlineData]</code>-driven <code>[Theory]</code> with an FsCheck property is enough of a mouthful for one refactoring step:
</p>
<p>
<pre>[Theory]
[InlineData(1049, 19, 00, 5)]
[InlineData(1130, 18, 15, 9)]
[InlineData( 956, 16, 55, 2)]
[InlineData( 433, 17, 30, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>(
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">days</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">hours</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">minutes</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    Prop.ForAll(
        (<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
         <span style="color:blue;">select</span> r).ToArbitrary(),
        <span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">r</span> =>
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">at</span> = DateTime.Now.Date + <span style="color:blue;">new</span> TimeSpan(days, hours, minutes, 0);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
                <span style="color:blue;">new</span> SystemClock(),
                <span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
                db);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = r
                .WithQuantity(quantity)
                .WithDate(at);
            <span style="color:blue;">await</span> sut.Post(expected.ToDto());
            Assert.Contains(expected, db.Grandfather);
        }).QuickCheckThrowOnFailure();
}</pre>
</p>
<p>
I've now managed to get rid of the <code>email</code> and <code>name</code> parameters, so I've also removed those values from the <code>[InlineData]</code> attributes. Instead, I've asked FsCheck to generate a valid reservation <code>r</code>, which comes with both valid <code>Email</code> and <code>Name</code>.
</p>
<p>
It turned out that this code base already had some custom generators in a static class called <code>Gens</code>, so I reused those:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<Email> Email =>
<span style="color:blue;">from</span> s <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<NonWhiteSpaceString>()
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Email(s.Item);
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<Name> Name =>
<span style="color:blue;">from</span> s <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<StringNoNulls>()
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Name(s.Item);
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<Reservation> Reservation =>
<span style="color:blue;">from</span> id <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<Guid>()
<span style="color:blue;">from</span> d <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<DateTime>()
<span style="color:blue;">from</span> e <span style="color:blue;">in</span> Email
<span style="color:blue;">from</span> n <span style="color:blue;">in</span> Name
<span style="color:blue;">from</span> q <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Reservation(id, d, e, n, q.Item);</pre>
</p>
<p>
As was also the case with <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a>, you typically use <a href="/2022/03/28/monads">syntactic sugar for monads</a> (which in C# is query syntax) to compose complex <a href="/2023/02/27/test-data-generator-monad">test data generators</a> from simpler generators. This enables me to generate an entire <code>Reservation</code> object with a single expression.
</p>
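<p>
To make the 'syntactic sugar' point concrete, here's a small sketch (not from the code base) of the same generator written twice: once with query syntax, and once with the method calls that the C# compiler roughly translates it into. It assumes FsCheck 3, where <code>Gen.Choose</code> and the <code>SelectMany</code> extension method live in the <code>FsCheck.Fluent</code> namespace. The <code>QuerySyntaxSketch</code> class and its members are made up for illustration.
</p>
<p>
<pre>using FsCheck;
using FsCheck.Fluent;

public static class QuerySyntaxSketch
{
    // Query syntax, in the style of the Gens class.
    public static Gen<(int, int)> DiceWithQuery =>
        from x in Gen.Choose(1, 6)
        from y in Gen.Choose(1, 6)
        select (x, y);

    // Roughly the translation: the second 'from' becomes SelectMany.
    public static Gen<(int, int)> DiceWithMethods =>
        Gen.Choose(1, 6).SelectMany(
            x => Gen.Choose(1, 6),
            (x, y) => (x, y));
}</pre>
</p>
<p>
Both properties describe the same generator of pairs. With more than two or three <code>from</code> clauses, the query syntax tends to be easier to read, which is why the <code>Gens</code> class uses it.
</p>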
<h3 id="05d4d9e8c07b4162bd1b65347200456f">
Time of day <a href="#05d4d9e8c07b4162bd1b65347200456f">#</a>
</h3>
<p>
Some of the values (such as the reservation's name and email address) that are involved in the <code>PostValidReservationWhenDatabaseIsEmpty</code> test don't really matter. Other values are constrained in some way. Even for the reservation <code>r</code>, the above version of the test has to override the arbitrarily generated values with a specific <code>quantity</code> and a specific <code>at</code> value. This is because you can't just reserve any quantity at any time of day. The restaurant has opening hours and actual tables. Most likely, it doesn't have a table for 100 people at 3 in the morning.
</p>
<p>
This particular test actually exercises a particular restaurant called <code>Grandfather.Restaurant</code> (because it was the original restaurant that was <a href="https://en.wikipedia.org/wiki/Grandfather_clause">grandfathered in</a> when the system was expanded to a multi-tenant system). It opens at 16 and has the last seating at 21. This means that the <code>at</code> value has to be between 16 and 21. What's the best way to generate a <code>DateTime</code> value that satisfies this constraint?
</p>
<p>
You could, naively, ask FsCheck to generate an integer between these two values. You'll see how to do that when we get to the <code>quantity</code>. While that would work for the <code>at</code> value, it would only generate the whole hours <em>16:00</em>, <em>17:00</em>, <em>18:00</em>, etcetera. It would be nice if the test could also exercise times such as <em>18:30</em>, <em>20:45</em>, and so on. On the other hand, perhaps we don't want weird reservation times such as <em>17:09:23.282</em>. How do we tell FsCheck to generate a <code>DateTime</code> value like that?
</p>
<p>
It's definitely possible to do from scratch, but I chose to do something else. The following shows how test code and production code can co-exist in a symbiotic relationship. The main business logic component that deals with reservations in the system is a class called <code>MaitreD</code>. One of its methods is used to generate a list of time slots for every day. A user interface can use that list to populate a drop-down list of available times. The method is called <code>Segment</code> and can also be used as a data source for an FsCheck test data generator:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<TimeSpan> <span style="color:#74531f;">ReservationTime</span>(
Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>,
DateTime <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">slots</span> = restaurant.MaitreD
.Segment(date, Enumerable.Empty<Reservation>())
.Select(<span style="font-weight:bold;color:#1f377f;">ts</span> => ts.At.TimeOfDay);
<span style="font-weight:bold;color:#8f08c4;">return</span> Gen.Elements(slots);
}</pre>
</p>
<p>
The <code>Gen.Elements</code> function is an FsCheck combinator that randomly picks a value from a collection. This one, then, picks the time of day of one of the <code>DateTime</code> values generated by <code>MaitreD.Segment</code>.
</p>
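<p>
To illustrate why picking from a list of slots helps, here's a sketch with a hypothetical stand-in for <code>MaitreD.Segment</code>. The 15-minute interval and the <code>TimeSlotSketch</code> class are assumptions made for the example; the real method's seating interval isn't shown in this article. Only <code>Gen.Elements</code> is used the same way as in the code above.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using FsCheck;
using FsCheck.Fluent;

public static class TimeSlotSketch
{
    // Hypothetical stand-in for MaitreD.Segment: every slot from
    // opening time to last seating, both inclusive.
    public static IEnumerable<TimeSpan> Slots(
        TimeSpan opensAt,
        TimeSpan lastSeating,
        TimeSpan interval)
    {
        for (var t = opensAt; t <= lastSeating; t += interval)
            yield return t;
    }

    // Picks one of the slots at random, so 18:30 and 20:45 can occur,
    // but 17:09:23.282 can't.
    public static Gen<TimeSpan> ReservationTime =>
        Gen.Elements(Slots(
            TimeSpan.FromHours(16),
            TimeSpan.FromHours(21),
            TimeSpan.FromMinutes(15)));
}</pre>
</p>
<p>
With these assumed values, the generator draws from 21 slots: <em>16:00</em>, <em>16:15</em>, and so on, up to and including <em>21:00</em>.
</p>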
<p>
The <code>PostValidReservationWhenDatabaseIsEmpty</code> test can now use the <code>ReservationTime</code> generator to produce a time of day:
</p>
<p>
<pre>[Theory]
[InlineData(5)]
[InlineData(9)]
[InlineData(2)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
Prop.ForAll(
(<span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(Grandfather.Restaurant, today)
<span style="color:blue;">let</span> offset = TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (r, offset)).ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">t</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">at</span> = today + t.offset;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = t.r
.WithQuantity(quantity)
.WithDate(at);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);
}).QuickCheckThrowOnFailure();
}</pre>
</p>
<p>
Granted, the test code is getting more and more busy, but there's room for improvement. Before I simplify it, though, I think that it's more prudent to deal with the remaining literal values.
</p>
<p>
Notice that the <code>InlineData</code> attributes now only supply a single value each: The <code>quantity</code>.
</p>
<h3 id="e551e156b8344bc0bd5379084bd8a7ed">
Quantity <a href="#e551e156b8344bc0bd5379084bd8a7ed">#</a>
</h3>
<p>
Like the <code>at</code> value, the <code>quantity</code> is constrained. It must be a positive integer, but it can't be larger than the largest table in the restaurant. That number, however, isn't that hard to find:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">maxCapacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);</pre>
</p>
<p>
The FsCheck API includes a function that generates a random number within a given range. It's called <code>Gen.Choose</code>, and now that we know the range, we can use it to generate the <code>quantity</code> value. Here, I'm only showing the test-data-generator part of the test, since the rest doesn't change that much. You'll see the full test again after a few more refactorings.
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">maxCapacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
Prop.ForAll(
(<span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> offset = TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, maxCapacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (r.WithQuantity(quantity), offset)).ToArbitrary(),</pre>
</p>
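<p>
As an aside, if you want to convince yourself that <code>Gen.Choose</code> stays within the range (both bounds are inclusive), a throwaway property can check it. This is only a sketch: the table capacities and the <code>QuantitySketch</code> class are invented for the example, and stand in for <code>restaurant.MaitreD.Tables</code>.
</p>
<p>
<pre>using System.Linq;
using FsCheck;
using FsCheck.Fluent;

public static class QuantitySketch
{
    // Invented capacities standing in for restaurant.MaitreD.Tables.
    private static readonly int[] capacities = { 2, 2, 4, 6 };

    public static int MaxCapacity => capacities.Max();

    public static Gen<int> Quantity => Gen.Choose(1, MaxCapacity);

    // Throws if FsCheck finds a quantity outside the expected range.
    public static void CheckQuantityIsInRange()
    {
        Prop.ForAll(
            Quantity.ToArbitrary(),
            q => 1 <= q && q <= MaxCapacity)
            .QuickCheckThrowOnFailure();
    }
}</pre>
</p>
<p>
Such a check tests the generator rather than the system under test, so it's not something I'd keep in the test suite, but it can be a quick way to verify an assumption about an unfamiliar API.
</p>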
<p>
There are now no more literal values in the test. In a sense, the refactoring from parametrised test to property-based test is complete. It could do with a bit of cleanup, though.
</p>
<h3 id="53494d59981c4c32b0dbbd93aa857874">
Simplification <a href="#53494d59981c4c32b0dbbd93aa857874">#</a>
</h3>
<p>
There's no longer any need to pass along the <code>offset</code> variable, and the explicit <code>QuickCheckThrowOnFailure</code> also seems a bit redundant. I can use the <code>[Property]</code> attribute from FsCheck.Xunit instead.
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">maxCapacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
(<span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> at = today + TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, maxCapacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> r.WithQuantity(quantity).WithDate(at)).ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">expected</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);
});
}</pre>
</p>
<p>
Compared to the initial version of the test, it has become more top-heavy. It's about the same size, though. The original version was 30 lines of code. This version is only 26 lines of code, but it is admittedly more information-dense. The original version had more 'noise' interleaved with the 'signal'. The new variation actually has a better separation of data generation and the test itself. Consider the 'actual' test code:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);</pre>
</p>
<p>
If we could somehow separate the data generation from the test itself, we might have something that was quite readable.
</p>
<h3 id="a23c68b065d140c588e30bd1db228879">
Extract test data generator <a href="#a23c68b065d140c588e30bd1db228879">#</a>
</h3>
<p>
The above data generation consists of a bit of initialisation and a query expression. Like all <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> it's easy to extract:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Gen<(Restaurant, Reservation)>
<span style="color:#74531f;">GenValidReservationForEmptyDatabase</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> at = today + TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, capacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (restaurant, r.WithQuantity(quantity).WithDate(at));
}</pre>
</p>
<p>
While it's quite specialised, it leaves the test itself small and readable:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
GenValidReservationForEmptyDatabase().ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">t</span> =>
{
var (<span style="font-weight:bold;color:#1f377f;">restaurant</span>, <span style="font-weight:bold;color:#1f377f;">expected</span>) = t;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db[restaurant.Id]);
});
}</pre>
</p>
<p>
That's not the only way to separate test and data generation.
</p>
<h3 id="39343db1a22c4d0c93dbfb74e3af6689">
Test as implementation detail <a href="#39343db1a22c4d0c93dbfb74e3af6689">#</a>
</h3>
<p>
The above separation refactors the data-generating expression to a private helper function. Alternatively you can keep all that FsCheck infrastructure code in the public test method and extract the test body itself to a private helper method:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">g</span> = <span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> at = today + TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, capacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (restaurant, r.WithQuantity(quantity).WithDate(at));
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
g.ToArbitrary(),
<span style="font-weight:bold;color:#1f377f;">t</span> => PostValidReservationWhenDatabaseIsEmptyImp(
t.restaurant,
t.Item2));
}</pre>
</p>
<p>
At first glance, that doesn't look like an improvement, but it has the advantage that the actual test method is now devoid of FsCheck details. If we use that as a yardstick for how decoupled the test is from FsCheck, this seems cleaner.
</p>
<p>
<pre>